Reinforcement Learning Lecture Series 2021

DeepMind x UCL

Taught by DeepMind researchers, this series was created in collaboration with University College London (UCL) to offer students a comprehensive introduction to modern reinforcement learning.

Comprising 13 lectures, the series covers the fundamentals of reinforcement learning and planning in sequential decision problems before progressing to more advanced topics and modern deep RL algorithms. It gives students a detailed understanding of topics including Markov Decision Processes, sample-based learning algorithms such as Q-learning, double Q-learning, and SARSA, and deep reinforcement learning. It also explores more advanced topics such as off-policy learning, multi-step updates, and eligibility traces, as well as conceptual and practical considerations in implementing deep reinforcement learning algorithms such as Rainbow DQN.

DeepMind Research Scientists Hado van Hasselt and Diana Borsa and Research Engineer Matteo Hessel lead a 13-part, self-contained introduction to RL and deep RL, aimed at Master's students and above.

Lecture 1: Introduction to Reinforcement Learning

Research Scientist Hado van Hasselt introduces the reinforcement learning course and explains how reinforcement learning relates to AI.

Lecture 2: Exploration & Control

Research Scientist Hado van Hasselt looks at why learning agents must balance exploring their environment with exploiting the knowledge they have already acquired.
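The exploration-exploitation trade-off from this lecture is often introduced with the epsilon-greedy rule on a multi-armed bandit. The sketch below is illustrative (the function name, Gaussian rewards, and parameter values are assumptions, not taken from the lecture):

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Illustrative epsilon-greedy agent on a multi-armed bandit.

    With probability epsilon the agent explores (random arm);
    otherwise it exploits the arm with the highest value estimate.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # sample-average value estimates
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward (assumed Gaussian)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return values
```

With enough steps, the value estimates track the true means and the best arm is exploited most of the time.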

Lecture 3: MDPs & Dynamic Programming

Research Scientist Diana Borsa explains how to solve MDPs with dynamic programming to extract accurate predictions and good control policies.
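A standard dynamic programming method covered in this setting is value iteration, which repeatedly applies the Bellman optimality backup until the values converge. This is a minimal tabular sketch; the data layout for `P` and `R` is an assumption chosen for illustration, not the lecture's notation:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Illustrative value iteration on a small tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a]
    is the expected immediate reward for taking action a in state s.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: max over action values
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            new_v = max(q)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    # Greedy policy with respect to the converged value estimates
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                      for p, s2 in P[s][a]))
              for s in range(n_states)]
    return V, policy
```

Running this on a two-state MDP where one state pays reward 1 for staying recovers both the optimal values and the greedy control policy.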

Lecture 4: Theoretical Fundamentals of Dynamic Programming Algorithms

Research Scientist Diana Borsa explores dynamic programming algorithms as contraction mappings, looking at when and how they converge to the right solutions.

Lecture 5: Model-free Prediction

Research Scientist Hado van Hasselt takes a closer look at model-free prediction and its relation to Monte Carlo and temporal difference algorithms.
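The key contrast in this lecture is the target each method learns toward: Monte Carlo uses the full observed return, while temporal difference learning bootstraps from the current value estimate of the next state. A minimal TD(0) update, written as an illustrative sketch (names and step size are assumptions):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]  # temporal-difference error
    V[s] += alpha * td_error
    return V
```

Unlike a Monte Carlo update, this can be applied after every single transition rather than waiting for an episode to end.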

Lecture 6: Model-free Control

Research Scientist Hado van Hasselt covers prediction algorithms for policy improvement, leading to algorithms that can learn good behaviour policies from sampled experience.
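Two of the sample-based control algorithms named in the course overview, Q-learning and SARSA, differ only in their bootstrap target. The following single-step updates are an illustrative sketch (tabular `Q` indexed as `Q[state][action]` is an assumed layout):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-learning: off-policy TD control, bootstrapping from the best next action."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """SARSA: on-policy TD control, bootstrapping from the action actually taken next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

Q-learning evaluates the greedy policy regardless of how actions are selected, whereas SARSA evaluates the behaviour policy itself.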

Lecture 7: Function Approximation

Research Scientist Hado van Hasselt explains how to combine deep learning with reinforcement learning for "deep reinforcement learning".

Lecture 8: Planning & Models

Research Engineer Matteo Hessel explains how to learn and use models, including algorithms like Dyna and Monte-Carlo tree search (MCTS).

Lecture 9: Policy-Gradient & Actor-Critic Methods

Research Scientist Hado van Hasselt covers policy-gradient algorithms, which learn policies directly, and actor-critic algorithms, which combine learned value predictions with policy updates for more efficient learning.
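The core of the policy-gradient approach is adjusting policy parameters in the direction of reward times the score function, grad log pi(a). As an illustrative sketch for a tabular softmax policy (function names and the absence of a baseline are assumptions, not the lecture's exact formulation):

```python
import math

def softmax_policy(theta):
    """Action probabilities for a tabular softmax policy (one preference per action)."""
    exps = [math.exp(t) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(theta, action, reward, alpha=0.1):
    """One REINFORCE step: theta += alpha * reward * grad log pi(action).

    For a softmax policy, grad log pi(a) is (1 - pi(a)) for the chosen
    action and -pi(b) for every other action b.
    """
    probs = softmax_policy(theta)
    for a in range(len(theta)):
        grad_log = (1.0 if a == action else 0.0) - probs[a]
        theta[a] += alpha * reward * grad_log
    return theta
```

An actor-critic method replaces the raw reward here with a learned value estimate (e.g. a TD error), which reduces the variance of the update.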

Lecture 10: Approximate Dynamic Programming

Research Scientist Diana Borsa introduces approximate dynamic programming, exploring what we can say theoretically about the performance of approximate algorithms.

Lecture 11: Multi-step & Off-Policy

Research Scientist Hado van Hasselt discusses multi-step and off-policy algorithms, including various techniques for variance reduction.

Lecture 12: Deep Reinforcement Learning #1

Research Engineer Matteo Hessel discusses practical considerations and algorithms for deep RL, including how to implement them using auto-differentiation frameworks such as JAX.

Lecture 13: Deep Reinforcement Learning #2

Research Engineer Matteo Hessel covers general value functions (GVFs), their use as auxiliary tasks, and how to deal with scaling issues in deep RL algorithms.