Sparse reward tasks to learn behaviour priors
This is a set of sparse-reward, multi-task domains that require a common set of behaviors to solve. The tasks are set up as 'True/False' predicates, which are used to provide the reward signal. For instance, going to a target or moving a box to a target can be used to encourage certain behaviors within agents. These tasks have been used in "Behavior Priors for Efficient Reinforcement Learning" (https://arxiv.org/abs/2010.14274), "Exploiting Hierarchy for Learning and Transfer in KL-Regularized RL" (https://arxiv.org/abs/1903.07438) and "Information asymmetry in KL-regularized RL" (https://arxiv.org/abs/1905.01240).
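
As a rough illustration of the predicate idea described above, the following minimal Python sketch defines tasks whose reward depends only on whether a boolean predicate holds. It is not taken from this repository: the names `near` and `SparseTask`, the dictionary-based state, the 0.5 distance threshold, and the 0/1 reward values are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Position = Tuple[float, float]
State = Dict[str, Position]          # hypothetical state: entity name -> (x, y)
Predicate = Callable[[State], bool]


def near(entity: str, target: str, radius: float = 0.5) -> Predicate:
    """Build a predicate that is True when `entity` is within `radius` of `target`."""
    def predicate(state: State) -> bool:
        return math.dist(state[entity], state[target]) <= radius
    return predicate


@dataclass
class SparseTask:
    """Hypothetical task wrapper: reward is nonzero only when the predicate holds."""
    name: str
    predicate: Predicate

    def reward(self, state: State) -> float:
        # Assumed convention: 1.0 on success, 0.0 otherwise.
        return 1.0 if self.predicate(state) else 0.0


# Two tasks mirroring the examples in the text.
go_to_target = SparseTask("go_to_target", near("agent", "target"))
move_box = SparseTask("move_box_to_target", near("box", "target"))

# A made-up state in which the box is at the target but the agent is not.
state = {"agent": (0.0, 0.0), "box": (2.0, 2.0), "target": (2.2, 2.1)}
rewards = {task.name: task.reward(state) for task in (go_to_target, move_box)}
print(rewards)
```

Because both tasks are built from the same `near` predicate, solving either one relies on the same underlying skill of moving toward a location, which is the kind of shared behavior these domains are meant to require.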