Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion


Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Part of their appeal is that they form a general class of methods that can learn a solution from a reasonably chosen reward and minimal prior knowledge, even in situations where designing a controller by hand is difficult or expensive for a human expert. For RL to truly make good on this promise, however, we need algorithms and learning setups that work across a broad range of problems with minimal problem-specific adjustments or engineering.

In this paper, we study this idea of generality in the locomotion domain. We develop a learning framework that can learn sophisticated locomotion behavior for a wide spectrum of legged robots, such as bipeds, tripeds, quadrupeds and hexapods, including wheeled variants. Our learning framework relies on a data-efficient, off-policy multi-task RL algorithm and a small set of reward functions that are semantically identical across robots.

To underline the general applicability of the method, we keep the hyper-parameter settings and reward definitions constant across experiments and rely exclusively on on-board sensing. For nine different types of robots, including a real-world quadruped robot, we demonstrate that the same algorithm can rapidly learn diverse and reusable locomotion skills without any platform-specific adjustments or additional instrumentation of the learning setup.

Authors' Notes
Our controller is applied to nine different locomotion platforms (including a real quadruped), ranging from six-legged robots to bipedal robots with skates. Across all platforms, the agent used the same hyperparameters and reward functions. Controller inputs and rewards are derived solely from on-board sensing. Our data-efficient learning framework learned controllers with multiple skills very quickly, using data equivalent to a couple of hours of real time.
Controller module switching between various core skills learned from scratch. The multi-task learning framework Scheduled Auxiliary Control (SAC-X) learns various skills concurrently while keeping training time low by sharing data between them. It transitions smoothly between the skills.
The four-legged robot ‘Daisy-4’. The agent learned to walk forward in about 2 hours (including time for resets). This corresponds roughly to the training time required by its simulated counterpart. Multi-skill controllers that walk forward, walk backward and lift single legs were also trained successfully in a couple of hours.
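
The data sharing between skills mentioned in the notes above can be illustrated with a minimal sketch. It is not the authors' implementation: the task names, the observation keys (`forward_velocity`, `leg_contact`), the toy reward functions and the uniform random scheduler are all hypothetical and chosen only for illustration. The idea shown is that each transition is stored once with the rewards of every task, so experience collected while executing one skill can also train the others off-policy.

```python
import random
from collections import deque

# Hypothetical skill names; the source only lists walking forward,
# walking backward and lifting single legs as examples.
TASKS = ("walk_forward", "walk_backward", "lift_leg")


def compute_rewards(obs):
    """Toy rewards computed from on-board sensing (observation keys are assumed)."""
    return {
        "walk_forward": obs["forward_velocity"],
        "walk_backward": -obs["forward_velocity"],
        "lift_leg": 1.0 if obs["leg_contact"] == 0 else 0.0,
    }


class SharedReplay:
    """Replay buffer shared by all skills."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, next_obs):
        # Store the reward of every task, whichever skill produced the action.
        self.buffer.append((obs, action, next_obs, compute_rewards(next_obs)))

    def sample(self, task, batch_size, rng):
        # Any stored transition can be used to train any skill off-policy.
        items = list(self.buffer)
        batch = rng.sample(items, min(batch_size, len(items)))
        return [(o, a, n, rewards[task]) for (o, a, n, rewards) in batch]


def schedule(num_periods, rng):
    """Uniform random scheduler (an assumption): one skill per period."""
    return [rng.choice(TASKS) for _ in range(num_periods)]


rng = random.Random(0)
replay = SharedReplay()
replay.add(
    {"forward_velocity": 0.0, "leg_contact": 1},
    [0.1],
    {"forward_velocity": 0.5, "leg_contact": 0},
)
for task in TASKS:
    print(task, replay.sample(task, 1, rng)[0][3])
```

In this sketch a single transition yields a different reward for each skill when sampled, which is one simple way to picture how concurrent skill learning can keep the amount of required robot data low.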

Creature performance