
Learning Interactive Real-World Simulators

Abstract

Generative models trained on internet data have revolutionized how text, image, and video content can be created. Perhaps the next milestone for generative models is to simulate the real world in response to actions carried out by humans, robots, and other interactive agents. Applications of a real-world simulator range from controllable content creation in games and movies to training embodied agents purely in simulation that can be directly deployed in the real world. In this paper, we explore the possibility of learning a universal simulator (UniSim) of real-world interaction through generative modeling. We first make the important observation that natural datasets available for learning a real-world simulator are often rich along different axes (e.g., abundant labeled objects in image data, dense actions in robotics data, and diverse movements in navigation data). With careful orchestration of diverse datasets, each serving a different piece of the puzzle, UniSim can emulate how humans and agents interact with the world by simulating the visual outcome of both high-level instructions such as “open the drawer” and low-level controls such as “move to x, y location” from otherwise static scenes and objects. The use cases for a real-world simulator are vast. As an example, we use UniSim to simulate interactive experiences for training high-level vision-language planners and low-level reinforcement learning policies, both of which exhibit significant real-world transfer after training purely in a real-world-like simulator. Lastly, we show that other types of intelligence, such as video captioning and detection models, can also benefit from the simulated experience of UniSim, opening up even wider applications of a real-world simulator.
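The interaction loop the abstract describes can be made concrete: condition a video model on the current frame and an action, sample the predicted outcome, and feed the last predicted frame back in as the next observation. The Python sketch below illustrates this loop under stated assumptions; it is not the paper's implementation, and the `LearnedSimulator`, `StubVideoModel`, and `rollout` names, along with the policy and `sample` signatures, are hypothetical placeholders.

```python
# Hypothetical sketch of wrapping an action-conditioned video model as an
# interactive environment, in the spirit of UniSim. All class and method
# names here are illustrative placeholders, not the paper's actual API.

from dataclasses import dataclass
import numpy as np


@dataclass
class SimStep:
    frames: np.ndarray       # predicted video chunk, shape (T, H, W, 3)
    observation: np.ndarray  # last predicted frame; conditions the next step


class LearnedSimulator:
    """Treats a learned video model as a step-able environment."""

    def __init__(self, video_model):
        self.video_model = video_model

    def step(self, observation: np.ndarray, action: str) -> SimStep:
        # The model predicts the visual outcome of an action, whether a
        # high-level instruction ("open the drawer") or a low-level control
        # serialized as text ("move to x, y location").
        frames = self.video_model.sample(observation, action)
        return SimStep(frames=frames, observation=frames[-1])


class StubVideoModel:
    """Stand-in that just repeats the conditioning frame so the sketch runs;
    a real system would substitute a trained generative video model."""

    def sample(self, frame: np.ndarray, action: str, num_frames: int = 8) -> np.ndarray:
        return np.repeat(frame[None], num_frames, axis=0)


def rollout(sim: LearnedSimulator, policy, first_frame: np.ndarray, horizon: int):
    """Collect one simulated trajectory, e.g., for training an RL policy."""
    obs, trajectory = first_frame, []
    for _ in range(horizon):
        action = policy(obs)  # e.g., a vision-language planner choosing instructions
        step = sim.step(obs, action)
        trajectory.append((obs, action, step.frames))
        obs = step.observation
    return trajectory


if __name__ == "__main__":
    sim = LearnedSimulator(StubVideoModel())
    frame0 = np.zeros((64, 64, 3), dtype=np.uint8)
    traj = rollout(sim, policy=lambda obs: "open the drawer", first_frame=frame0, horizon=4)
    print(f"collected {len(traj)} simulated steps")
```

Because the simulator only ever consumes the previous frame and an action, trajectories of arbitrary length can be generated without real-world interaction, which is what allows policies trained this way to be evaluated for transfer afterwards.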

Authors

Sherry Yang, Kamyar Ghasemipour, Jonathan Tompson, Dale Schuurmans, Pieter Abbeel*, Yilun Du*

Venue

ICLR 2024