Introduction to Reinforcement Learning with David Silver

DeepMind x UCL

This classic 10-part course, taught by reinforcement learning (RL) pioneer David Silver, was recorded in 2015 and remains a popular resource for anyone wanting to understand the fundamentals of RL.

Reinforcement learning has emerged as a powerful technique in modern machine learning, allowing a system to learn through a process of trial and error. It has been successfully applied in many domains, including systems such as AlphaZero, which learnt to master the games of chess, Go and shogi.

This lecture series, taught at University College London by David Silver (DeepMind Principal Scientist, UCL professor and co-creator of AlphaZero), will introduce students to the main methods and techniques used in RL. Students will also find Sutton and Barto’s classic book, Reinforcement Learning: An Introduction, a helpful companion.


Explore the concepts and methods used in modern reinforcement learning research.

Lecture 1: Introduction to Reinforcement Learning

Introduces reinforcement learning (RL), with an overview of agents and some classic RL problems.

Lecture 2: Markov Decision Processes

Explores Markov processes, including Markov reward processes, Markov decision processes and their extensions.

Lecture 3: Planning by Dynamic Programming

Introduces policy evaluation, policy iteration and value iteration, along with extensions to dynamic programming and contraction mapping.

Lecture 4: Model-Free Prediction

An introduction to Monte-Carlo Learning and Temporal Difference Learning.

Lecture 5: Model-Free Control

Dives into On-Policy Monte-Carlo Control and Temporal Difference Learning, as well as Off-Policy Learning.

Lecture 6: Value Function Approximation

A deep dive into incremental methods and batch methods of value function approximation.

Lecture 7: Policy Gradient Methods

Looks at different policy gradient methods, including Finite Difference, Monte-Carlo and Actor-Critic.

Lecture 8: Integrating Learning and Planning

Introduces model-based RL, along with integrated architectures and simulation-based search.

Lecture 9: Exploration and Exploitation

An overview of exploration and exploitation in multi-armed bandits, contextual bandits and Markov Decision Processes.

Lecture 10: Case Study: RL in Classic Games

An overview of game theory, minimax search, self-play and imperfect-information games.