DeepMind x UCL
This classic 10-part course, taught by Reinforcement Learning (RL) pioneer David Silver, was recorded in 2015 and remains a popular resource for anyone wanting to understand the fundamentals of RL.
Reinforcement Learning has emerged as a powerful technique in modern machine learning, allowing a system to learn through a process of trial and error. It has been successfully applied in many domains, including in systems such as AlphaZero, which learnt to master the games of chess, Go and Shogi.
This lecture series, taught at University College London by David Silver - DeepMind Principal Scientist, UCL professor and the co-creator of AlphaZero - will introduce students to the main methods and techniques used in RL. Students will also find Sutton and Barto’s classic book, Reinforcement Learning: An Introduction, a helpful companion.
Explore the concepts and methods used in modern reinforcement learning research.
Lecture 1: Introduction to Reinforcement Learning
Introduces reinforcement learning (RL), with an overview of agents and some classic RL problems.
Lecture 2: Markov Decision Processes
Explores Markov Processes, including reward processes, decision processes and extensions.
Lecture 3: Planning by Dynamic Programming
Introduces policy evaluation and iteration, value iteration, extensions to dynamic programming and contraction mapping.
Lecture 4: Model-Free Prediction
An introduction to Monte-Carlo Learning and Temporal Difference Learning.
Lecture 5: Model-Free Control
Dives into On-Policy Monte-Carlo Control and Temporal Difference Learning, as well as Off-Policy Learning.
Lecture 6: Value Function Approximation
A deep dive into incremental methods and batch methods of value function approximation.
Lecture 7: Policy Gradient Methods
Looks at different policy gradient methods, including Finite Difference, Monte-Carlo and Actor-Critic.
Lecture 8: Integrating Learning and Planning
Introduces model-based RL, along with integrated architectures and simulation-based search.
Lecture 9: Exploration and Exploitation
An overview of multi-armed bandits, contextual bandits and Markov Decision Processes.
Lecture 10: Case Study: RL in Classic Games
An overview of Game Theory, minimax search, self-play and imperfect information games.