Reinforcement Learning Lecture Series 2018

DeepMind x UCL

This lecture series, taught by DeepMind Research Scientist Hado van Hasselt and developed in collaboration with University College London (UCL), offers students a comprehensive introduction to modern reinforcement learning.

Comprising 10 lectures, it covers fundamentals such as learning and planning in sequential decision problems before progressing to more advanced topics and modern deep RL algorithms. The series gives students a detailed understanding of topics including Markov Decision Processes, sample-based learning algorithms (e.g. Q-learning, SARSA), deep reinforcement learning, model-based reinforcement learning and planning (including Dyna), policy gradient algorithms and actor-critic methods. It also explores more advanced topics such as multi-step updates, double Q-learning and recent algorithms such as Rainbow DQN.

The course concludes with two guest lectures given by DeepMind Research Scientists Volodymyr Mnih and David Silver. Students might also enjoy the Deep Learning lecture series or the Coursera Specialisation on Reinforcement Learning, taught by Martha White of the University of Alberta and her colleague Adam White, a DeepMind Research Scientist.

Suggested further reading: Reinforcement Learning: An Introduction by Sutton and Barto.


Research Scientist Hado van Hasselt leads a 10-part self-contained introduction to RL and deep RL, aimed at Master's students and above.

Lecture 1: Introduction to Reinforcement Learning

Hado shares an introduction to reinforcement learning, including an overview of core concepts and agent components.

Lecture 2: Exploration and Exploitation

Discusses the trade-off between exploration and exploitation and introduces key concepts such as multi-armed bandits.

Lecture 3: Markov Decision Processes and Dynamic Programming

Explores Markov Decision Processes (MDPs), the formal framework for how agents interact with their environment, and dynamic programming methods for solving them.

Lecture 4: Model-Free Prediction and Control

A deep dive into how model-free prediction and control can be used to estimate and optimise values in MDPs.

Lecture 5: Function Approximation and Deep Reinforcement Learning

A deep dive into how agents can use function approximation to represent and learn a policy, value function or model.

Lecture 6: Policy Gradients and Actor Critics

Examines policy-based RL and how an agent can learn a policy directly from experience.

Lecture 7: Planning and Models

Explores how agents can learn a model directly from experience and then use it to plan and construct a value function or policy.

Lecture 8: Advanced Topics in Deep RL

An overview of open research questions, including exploration, credit assignment and sample-efficient learning.

Lecture 9: A Brief Tour of Deep RL Agents

Research Scientist Volodymyr Mnih, one of the researchers behind DQN, gives an overview of deep reinforcement learning agents.

Lecture 10: Classic Games Case Study

David Silver, the co-creator of AlphaZero and AlphaStar, gives an overview of RL and its application to classic games.