Towards Real Robot Learning in the Wild: A Case Study in Bipedal Locomotion


Algorithms for self-learning systems have made considerable progress in recent years, yet safety concerns and the need for additional instrumentation have so far largely limited learning experiments with real robots to well-controlled lab settings. In this paper we demonstrate how a bipedal robot can autonomously learn to walk with minimal human intervention and minimal instrumentation of the environment. We employ data-efficient off-policy reinforcement learning to learn to walk end-to-end, from scratch, using rewards computed exclusively from proprioceptive sensing. To allow the robot to autonomously adapt to its environment, we provide the learning agent with raw RGB camera images. By training multiple robots in different geographic locations while sharing data in a distributed learning setup, we achieve higher throughput and greater diversity of training data. This leads to faster learning and more robust policies. Our learning experiments constitute a step towards the long-term vision of learning ``in the wild'' for legged robots.
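The distributed setup described above, in which several robots share experience through a common replay store that an off-policy learner samples from, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual system; the `ReplayBuffer` class and all names in it are hypothetical.

```python
import random
from collections import deque

class ReplayBuffer:
    """Hypothetical shared experience store fed by multiple robots."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Off-policy learning permits sampling uniformly from data
        # gathered by any robot, regardless of which (older) policy
        # produced it -- the key to reusing experience across sites.
        return random.sample(self.storage, batch_size)

buffer = ReplayBuffer(capacity=10_000)

# Two robots in different locations each contribute transitions,
# here tagged with a source id for illustration.
for robot_id in ("robot_a", "robot_b"):
    for step in range(50):
        transition = (robot_id, step, 0.0, step + 1)  # (src, s, r, s')
        buffer.add(transition)

batch = buffer.sample(16)  # mixed-source batch for one learner update
print(len(buffer.storage), len(batch))
```

Because the learner is off-policy, batches drawn this way can mix experience from robots running slightly different policy versions, which is what makes the higher throughput and diversity of multi-site data collection usable.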