On the role of planning in model-based deep reinforcement learning


Planning and model-based reasoning are often thought to support deep, careful reasoning and generalization in artificial agents. Yet, with the proliferation of approaches in model-based reinforcement learning (MBRL), it is unclear which components of these algorithms drive behavior. In this paper, we ask three questions: why is planning useful for RL agents, which design choices contribute most to performance, and to what extent does planning assist in generalization? To answer these questions, we evaluate a recent state-of-the-art algorithm, MuZero (Schrittwieser et al., 2019), across a wide range of environments, including continuous control tasks, Atari, and strategic games. Our results suggest that planning contributes primarily by driving the learning of a policy; that it may be sufficient to plan using shallow trees and simple rollouts; and that while planning can also make up for some function approximation error, further research is needed to better understand its relationship to generalization. These results have practical implications regarding when planning is likely to help in RL, as well as theoretical implications about the importance of planning in generalization.