Temporal Abstraction in Reinforcement Learning with the Successor Representation


Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of action called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand. When this is not the case, there is no definitive answer as to which options one should consider. In this paper, we argue that the successor representation, which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we discuss existing methods based on the successor representation, providing a big-picture view of how it can be used in the options framework and connecting the main results in the field. After formally discussing the successor representation for reinforcement learning, we discuss eigenoptions and covering options, which leverage this type of representation to perform temporally extended exploration. Every discovery method, however effective, is subject to a trade-off that is inherent to the use of options: on the one hand, more options yield a more expressive set of behaviours; on the other hand, they generally make the options more difficult to learn and use. To address this point, we also discuss the option keyboard, which uses the successor representation to extend a finite set of options to a combinatorially large counterpart without additional learning. Besides helping to deal with the trade-off mentioned above, the option keyboard also lifts a well-known limitation of option-based methods, namely, the inability to handle more than one skill at a time.
We then demonstrate the natural synergy between eigenoptions and the option keyboard. This combination drastically reduces the number of options an agent needs to have access to in order to generate diverse and effective behaviours. We conclude by discussing how the successor representation can also be used to discover options that are effective for planning, and we outline recent results suggesting that the successor representation is also encoded in the brain.
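As a rough illustration of the successor representation described above, the sketch below computes it in closed form for a tabular Markov chain, Psi = sum_t gamma^t P^t = (I - gamma P)^{-1}, so that entry (s, s') is the expected discounted number of future visits to s' starting from s. It then takes eigenvectors of Psi as "eigenpurposes" and turns one into an intrinsic reward r(s, s') = e[s'] - e[s], in the spirit of eigenoptions. This is a minimal sketch under stated assumptions, not the paper's implementation: the toy chain environment, the uniform random policy, and the helper names (`random_walk_chain`, `successor_representation`, `eigenpurposes`, `intrinsic_reward`) are hypothetical, and the use of `numpy.linalg.eigh` relies on Psi being symmetric, which holds for this particular chain but not in general.

```python
import numpy as np


def random_walk_chain(n):
    """Hypothetical toy environment: transition matrix of a uniform random
    policy on an n-state chain. Each step moves left or right with
    probability 0.5; a move off either end leaves the agent in place."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, max(s - 1, 0)] += 0.5
        P[s, min(s + 1, n - 1)] += 0.5
    return P


def successor_representation(P, gamma):
    """Closed-form tabular SR: Psi = (I - gamma * P)^{-1}.
    Row s holds the expected discounted visitation counts of every state
    when starting from s and following the policy that induced P."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, np.eye(n))


def eigenpurposes(psi):
    """Eigenvalues/eigenvectors of the SR, sorted by decreasing eigenvalue.
    Assumes psi is symmetric (true for the chain above), so eigh applies."""
    vals, vecs = np.linalg.eigh(psi)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def intrinsic_reward(e, s, s_next):
    """Eigenoption-style intrinsic reward for the transition s -> s_next:
    the change in the eigenvector's value along that transition."""
    return e[s_next] - e[s]


if __name__ == "__main__":
    gamma = 0.9
    P = random_walk_chain(5)
    psi = successor_representation(P, gamma)
    vals, vecs = eigenpurposes(psi)
    # The top eigenvector is constant (eigenvalue 1 / (1 - gamma)), so the
    # first informative eigenpurpose is the second one.
    e = vecs[:, 1]
    print(np.round(psi, 3))
    print("reward for moving 0 -> 1:", intrinsic_reward(e, 0, 1))
```

Skipping the constant top eigenvector matters because its intrinsic reward is zero for every transition; the second eigenvector varies monotonically along the chain, so maximizing its reward drives the agent toward one end, which is the kind of temporally extended exploratory behaviour eigenoptions aim to capture.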