Mastering the strategy, tactical understanding, and team play involved in multiplayer video games represents a critical challenge for AI research. Now, through new developments in reinforcement learning, our agents have achieved human-level performance in Quake III Arena Capture the Flag, a complex multi-agent environment and one of the canonical 3D first-person multiplayer games. These agents demonstrate the ability to team up with both artificial agents and human players.
Above: four of our trained agents play together on an indoor and outdoor procedurally generated Capture the Flag level.
Billions of people inhabit the planet, each with their own individual goals and actions, but still capable of coming together through teams, organisations and societies in impressive displays of collective intelligence. This is a setting we call multi-agent learning: many individual agents must act independently, yet learn to interact and cooperate with other agents. This is an immensely difficult problem because, with co-adapting agents, the world is constantly changing.
To investigate this problem we look at 3D first-person multiplayer video games. These games represent the most popular genre of video game, and have captured the imagination of millions of gamers because of their immersive game play, as well as the challenges they pose in terms of strategy, tactics, hand-eye coordination, and team play. The challenge for our agents is to learn directly from raw pixels to produce actions. This complexity makes first-person multiplayer games a fruitful and active area of research within the AI community.
The game we focus on in this work is Quake III Arena (which we aesthetically modified, though all game mechanics remain the same). Quake III Arena has laid the foundations for many modern first-person video games, and has attracted a long-standing competitive esports scene. We train agents that learn and act as individuals, but which must be able to play on teams with and against any other agents, artificial or human.
The rules of Capture the Flag (CTF) are simple, but the dynamics are complex. Two teams of individual players compete on a given map with the goal of capturing the opposing team’s flag while protecting their own. To gain tactical advantage they can tag the opposing team’s members to send them back to their spawn points. The team with the most flag captures after five minutes wins.
From a multi-agent perspective, CTF requires players both to cooperate successfully with their teammates and to compete with the opposing team, while remaining robust to any playing style they might encounter.
To make things even more interesting, we consider a variant of CTF in which the map layout changes from match to match. As a consequence, our agents are forced to acquire general strategies rather than memorising the map layout. Additionally, to level the playing field, our learning agents experience the world of CTF in a similar way to humans: they observe a stream of pixel images and issue actions through an emulated game controller.
Our agents must learn from scratch how to see, act, cooperate, and compete in unseen environments, all from a single reinforcement signal per match: whether their team won or not. This is a challenging learning problem, and its solution is based on three general ideas for reinforcement learning:
- Rather than training a single agent, we train a population of agents, which learn by playing with each other, providing a diversity of teammates and opponents.
- Each agent in the population learns its own internal reward signal, which allows agents to generate their own internal goals, such as capturing a flag. A two-tier optimisation process optimises agents’ internal rewards directly for winning, and uses reinforcement learning on the internal rewards to learn the agents’ policies.
- Agents operate at two timescales, fast and slow, which improves their ability to use memory and generate consistent action sequences.
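The first two ideas can be illustrated with a toy sketch of the outer, population-level tier. This is not the paper's implementation: the event names, the `Agent` fields, the `fraction` and `noise` parameters, and the copy-and-perturb rule are all assumptions made for illustration, and the inner tier (reinforcement learning on the internal reward) is represented only by the reward function a policy would be trained against.

```python
import random
from dataclasses import dataclass, field

# Hypothetical game events an internal reward could be defined over.
EVENTS = ("flag_pickup", "flag_capture", "tagged_opponent", "was_tagged")


@dataclass
class Agent:
    name: str
    # Internal reward: one weight per game event (an assumption of this sketch).
    reward_weights: dict = field(default_factory=dict)
    # Outer-tier fitness: fraction of recent matches won.
    win_rate: float = 0.0


def internal_reward(agent, event_counts):
    """Inner tier: the dense signal a policy would be trained on with RL."""
    return sum(agent.reward_weights[e] * n for e, n in event_counts.items())


def exploit_and_explore(population, rng, fraction=0.25, noise=0.2):
    """Outer tier: weak agents copy, then perturb, a strong agent's reward weights.

    Assumes at least two agents, so the top and bottom groups do not overlap.
    """
    ranked = sorted(population, key=lambda a: a.win_rate, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:n], ranked[-n:]
    for weak in bottom:
        parent = rng.choice(top)
        # Multiplicative perturbation keeps each weight within +/- noise of the parent's.
        weak.reward_weights = {
            e: w * (1.0 + rng.uniform(-noise, noise))
            for e, w in parent.reward_weights.items()
        }
    return population
```

In this sketch only winning drives selection: the internal reward weights are never scored directly, they simply spread through the population when the agents that hold them win more often.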
The resulting agent, dubbed the For The Win (FTW) agent, learns to play CTF to a very high standard. Crucially, the learned agent policies are robust to the size of the maps, the number of teammates, and the other players on their team. Below, you can explore some games on the outdoor procedural environments, where FTW agents play against each other, as well as games in which humans and agents play together on indoor procedural environments.
We ran a tournament including 40 human players, in which humans and agents are randomly matched up in games - both as opponents and as teammates.
The FTW agents become much stronger than strong baseline methods and exceed the win rate of the human players. In fact, in a survey among participants, they were rated more collaborative than human participants.
Going beyond mere performance evaluation, it is important to understand the emergent complexity in the behaviours and internal representations of these agents.
To understand how agents represent game state, we look at activation patterns of the agents’ neural networks plotted on a plane. Dots in the figure below represent situations during play, with nearby dots representing similar activation patterns. These dots are coloured according to the high-level CTF game state in which the agent finds itself: In which room is the agent? What is the status of the flags? What teammates and opponents can be seen? We observe clusters of the same colour, indicating that the agent represents similar high-level game states in a similar manner.
The agents are never told anything about the rules of the game, yet learn about fundamental game concepts and effectively develop an intuition for CTF. In fact, we can find particular neurons that code directly for some of the most important game states, such as a neuron that activates when the agent’s flag is taken, or a neuron that activates when an agent’s teammate is holding a flag. The paper provides further analysis covering the agents’ use of memory and visual attention.
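One simple way to look for such a neuron is to correlate each unit's activation with a binary game-state flag over a recorded episode. The text here does not say how these neurons were identified, so the following is only a sketch of that idea; the function names and the data layout (one activation vector per timestep) are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def most_selective_neuron(activations, state_flags):
    """Return (unit index, correlation) for the unit that best tracks a game state.

    activations: list of per-timestep activation vectors (hypothetical recording).
    state_flags: 0/1 per timestep, e.g. 1 while the agent's flag is taken.
    """
    n_units = len(activations[0])
    scores = [
        pearson([step[u] for step in activations], state_flags)
        for u in range(n_units)
    ]
    best = max(range(n_units), key=lambda u: abs(scores[u]))
    return best, scores[best]
```

A high absolute correlation only suggests that a unit tracks the state; it does not show that the agent relies on that unit when acting.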
Aside from this rich representation, how do the agents actually behave? First, we noticed that the agents had very fast reaction times and were very accurate taggers, which could explain their performance. However, by artificially reducing their accuracy and slowing their reactions, we saw that this was only one factor in their success.
Using unsupervised learning, we identified the prototypical behaviours of agents and humans and discovered that agents in fact learn human-like behaviours, such as following teammates and camping in the opponent’s base.
These behaviours emerge in the course of training, through reinforcement learning and population-level evolution, with some behaviours, such as teammate following, falling out of favour as agents learn to cooperate in a more complementary manner.
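A handicap of this kind can be pictured as a wrapper around a trained policy. How it was actually applied is not described here, so the sketch below is an assumption throughout: the `HandicappedPolicy` class, the frame-delay buffer, the `"tag"` action key, and the rule for dropping tag attempts are all hypothetical.

```python
import random
from collections import deque


class HandicappedPolicy:
    """Wraps a policy so it acts on stale observations and tags less reliably."""

    def __init__(self, policy, delay_frames, tag_accuracy, rng=None):
        self.policy = policy  # callable: observation -> dict of action fields
        # Holds the last delay_frames + 1 observations; index 0 is the oldest.
        self.buffer = deque(maxlen=delay_frames + 1)
        self.tag_accuracy = tag_accuracy  # probability a tag attempt is kept
        self.rng = rng or random.Random()

    def act(self, observation):
        self.buffer.append(observation)
        # Until the buffer fills, the oldest available frame is used instead.
        stale = self.buffer[0]
        action = dict(self.policy(stale))
        # Drop a tag attempt with probability 1 - tag_accuracy.
        if action.get("tag") and self.rng.random() >= self.tag_accuracy:
            action["tag"] = False
        return action
```

For example, `HandicappedPolicy(policy, delay_frames=2, tag_accuracy=0.5)` would make the wrapped policy respond to what it saw two frames earlier and lose roughly half of its tag attempts.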
Above: the training progression of a population of FTW agents. Top right shows the 30 agents’ Elo ratings as they train and evolve from each other. Top left shows the genetic tree of these evolution events. The lower graph shows the progression of knowledge, some of the internal rewards, and behaviour probability throughout the training of the agents.
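For readers unfamiliar with Elo ratings, the standard two-player update is sketched below. The K-factor of 32 is an assumption, and this text does not say how ratings were computed for team matches, so this shows only the textbook form.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b); score_a is 1 for a win, 0.5 draw, 0 loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Two equally rated players each have an expected score of 0.5, so a win moves the winner up by half of `k` and the loser down by the same amount.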
The research community has recently done very impressive work in complex games like StarCraft II and Dota 2, and while this paper focuses on Capture the Flag, the research contributions are general and we are excited to see how others build upon our techniques in different complex environments. In the future, we also want to further improve on our current reinforcement learning and population-based training methods. In general, we think this work highlights the potential of multi-agent training to advance the development of artificial intelligence: exploiting the natural curriculum provided by multi-agent training, and forcing the development of robust agents that can even team up with humans.
This work was done by Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil Rabinowitz, Ari Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, and Thore Graepel.
Visualisations by Adam Cain, Damien Boudot, Doug Fritz, Jaume Sanchez Elias, Paul Lewis, and Rich Green.