The story of AlphaGo so far
AlphaGo is the first computer program to defeat a professional human Go player, the first program to defeat a Go world champion, and arguably the strongest Go player in history.
AlphaGo’s first formal match was against the reigning three-time European Champion, Mr Fan Hui, in October 2015. Its 5-0 win was the first ever against a Go professional, and the results were published in full technical detail in the international journal Nature. AlphaGo then went on to compete against legendary player Mr Lee Sedol, winner of 18 world titles and widely considered to be the greatest player of the past decade.
AlphaGo's 4-1 victory in Seoul, South Korea, in March 2016 was watched by over 200 million people worldwide. It was a landmark achievement that experts agreed was a decade ahead of its time, and earned AlphaGo a 9 dan professional ranking (the highest certification) - the first time a computer Go player had ever received the accolade.
During the games, AlphaGo played a handful of highly inventive winning moves, several of which - including move 37 in game two - were so surprising they overturned hundreds of years of received wisdom, and have since been examined extensively by players of all levels. In the course of winning, AlphaGo somehow taught the world completely new knowledge about perhaps the most studied and contemplated game in history.
Since then, AlphaGo has continued to surprise and amaze. In January 2017, an improved AlphaGo version was revealed as the online player "Master", which achieved 60 straight wins in online fast time-control games against top international Go players.
In May 2017, AlphaGo took part in The Future of Go Summit in the birthplace of Go, China, to delve deeper into the mysteries of Go in a spirit of mutual collaboration with the country’s top players. You can read more about the five-day summit here.
Five months later, we published another Nature paper, this time on AlphaGo Zero. Unlike the earlier versions of AlphaGo, which learnt how to play the game using thousands of human amateur and professional games, AlphaGo Zero learnt to play the game of Go simply by playing games against itself, starting from completely random play.
In doing so, it surpassed the performance of all previous versions, including those which beat the World Go Champions Lee Sedol and Ke Jie, becoming arguably the strongest Go player of all time.
We believe this new breakthrough has the potential to facilitate major scientific breakthroughs and, in doing so, drastically change the world for the better.
What is Go?
The game of Go originated in China 3,000 years ago. The rules of the game are simple: players take turns to place black or white stones on a board, trying to capture the opponent's stones or surround empty space to make points of territory. As simple as the rules are, Go is a game of profound complexity. There are an astonishing 10 to the power of 170 possible board configurations - more than the number of atoms in the known universe - making Go a googol times more complex than Chess.
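The scale quoted above can be sanity-checked with a little arithmetic. The sketch below assumes a standard 19 x 19 board and computes a loose upper bound: every intersection is empty, black, or white, so there are at most 3^361 arrangements. This bound also counts illegal arrangements, so it sits slightly above the 10 to the power of 170 figure for possible board configurations. The variable names are illustrative only.

```python
import math

BOARD_SIZE = 19                    # standard full-size board (assumption)
points = BOARD_SIZE * BOARD_SIZE   # number of intersections on the board

# Each intersection is empty, black, or white, so 3 ** points is an upper
# bound on board configurations that also counts illegal arrangements.
upper_bound = 3 ** points
upper_bound_exponent = math.log10(upper_bound)

print(f"{points} points, upper bound of roughly 10^{upper_bound_exponent:.1f}")
```

The bound lands a couple of orders of magnitude above 10^170, which is what you would expect once illegal arrangements are excluded.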
Go is played primarily through intuition and feel, and because of its beauty, subtlety and intellectual depth it has captured the human imagination for centuries.
Interested in discovering the game of Go for yourself, but not sure where to start? Head over to this interactive online training game!
Or, if you are looking for new and creative ways of playing Go, check out AlphaGo Teach, launched in December 2017. The tool provides analysis of thousands of the most popular opening sequences from the recent history of Go, demonstrating how AlphaGo analyses different moves and judges whether they are likely to lead to a win.
Mastering the game of Go
The complexity of Go means it has long been viewed as the most challenging of classical games for artificial intelligence. Despite decades of work, the strongest computer Go programs were only able to play at the level of human amateurs.
Traditional AI methods, which construct a search tree over all possible positions, don’t have a chance in Go. This is because of the sheer number of possible moves and the difficulty of evaluating the strength of each possible board position.
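To see why exhaustive search breaks down, it helps to estimate the size of a full game tree as roughly b^d, where b is the number of legal moves per position and d is the length of a game. The sketch below uses commonly cited rough estimates (about 35 moves and 80 turns for chess, about 250 moves and 150 turns for Go). These numbers are assumptions for illustration and do not come from this article.

```python
import math


def log10_tree_size(branching_factor: int, depth: int) -> float:
    """Return log10 of branching_factor ** depth, a rough game-tree size."""
    return depth * math.log10(branching_factor)


# Rough, commonly cited estimates (assumptions, not measured values).
chess_exponent = log10_tree_size(branching_factor=35, depth=80)
go_exponent = log10_tree_size(branching_factor=250, depth=150)

print(f"chess tree: about 10^{chess_exponent:.0f} move sequences")
print(f"Go tree:    about 10^{go_exponent:.0f} move sequences")
```

Under these assumptions the Go tree is larger than the chess tree by more than two hundred orders of magnitude, so a search has to be guided rather than exhaustive.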
In order to capture the intuitive aspect of the game, we knew that we would need to take a novel approach. AlphaGo therefore combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as an input and process it through a number of different network layers containing millions of neuron-like connections. One neural network, the “policy network”, selects the next move to play. The other neural network, the “value network”, predicts the winner of the game.
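The division of labour between the two networks can be sketched in a few lines. This is a toy illustration, not AlphaGo's architecture: it assumes a 3 x 3 board, a single layer of untrained random weights, and hypothetical class names (ToyPolicyNetwork, ToyValueNetwork). It only shows the two interfaces: one network maps a board to a probability for each candidate move, the other maps a board to a single number estimating who is winning.

```python
import math
import random

BOARD_POINTS = 9  # toy 3x3 board (assumption); a full Go board has 361 points


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPolicyNetwork:
    """Maps a board to a probability for each empty point (candidate move)."""

    def __init__(self, rng):
        self.weights = [[rng.uniform(-1, 1) for _ in range(BOARD_POINTS)]
                        for _ in range(BOARD_POINTS)]

    def move_probabilities(self, board):
        empty = [i for i, stone in enumerate(board) if stone == 0]
        if not empty:
            return {}
        scores = [sum(w * x for w, x in zip(self.weights[i], board))
                  for i in empty]
        return dict(zip(empty, softmax(scores)))


class ToyValueNetwork:
    """Maps a board to a number in [-1, 1]; positive favours black."""

    def __init__(self, rng):
        self.weights = [rng.uniform(-1, 1) for _ in range(BOARD_POINTS)]

    def predict_winner(self, board):
        return math.tanh(sum(w * x for w, x in zip(self.weights, board)))


rng = random.Random(0)
policy, value = ToyPolicyNetwork(rng), ToyValueNetwork(rng)
board = [1, -1, 0, 0, 1, 0, -1, 0, 0]  # +1 black, -1 white, 0 empty
probs = policy.move_probabilities(board)
best_move = max(probs, key=probs.get)   # move the policy rates highest
estimate = value.predict_winner(board)  # predicted outcome of the game
```

Because the weights are random, the outputs here are meaningless as Go advice. Training is what would turn these interfaces into useful move suggestions and position evaluations.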
We showed AlphaGo a large number of strong amateur games to help it develop its own understanding of what reasonable human play looks like. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes and incrementally improving until it became immensely strong, through a process known as reinforcement learning.
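The self-play idea can be illustrated on a far smaller game. The sketch below is not AlphaGo's training procedure: it swaps in a simple tabular Monte Carlo method on a Nim-like game (players alternately take one to three stones, and whoever takes the last stone wins). One table of move values plays both sides, and after each game every move is nudged towards the result its player eventually got. The function names, the game, and the settings (20,000 games, 10% random exploration) are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # a player may take 1, 2 or 3 stones per turn


def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)


def choose_move(q, stones, rng, epsilon):
    """Mostly pick the best-known move, sometimes explore at random."""
    moves = list(legal_moves(stones))
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q[(stones, m)])


def self_play_episode(q, counts, rng, epsilon, start_stones):
    """Play one game against itself, then learn from the outcome."""
    history = []  # (stones, move) per turn; the two players alternate
    stones = start_stones
    while stones > 0:
        move = choose_move(q, stones, rng, epsilon)
        history.append((stones, move))
        stones -= move
    outcome = 1.0  # the player who made the last move took the final stone
    for state_action in reversed(history):
        counts[state_action] += 1
        # Running average of results seen after this move.
        q[state_action] += (outcome - q[state_action]) / counts[state_action]
        outcome = -outcome  # the previous move belonged to the opponent


def train(episodes=20000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q, counts = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        self_play_episode(q, counts, rng, epsilon, rng.randint(1, 10))
    return q


q = train()
greedy = {s: max(legal_moves(s), key=lambda m: q[(s, m)]) for s in range(1, 11)}
print(greedy)
```

With these settings the learnt policy should tend towards the known winning strategy for this game, which is to leave the opponent a multiple of four stones, even though that rule is never written into the program.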
Our Nature paper, published on 28th January 2016, describes this original approach in full technical detail.
Read more about how AlphaGo uses machine learning to master the game of Go in our blog post.
AlphaGo Zero: starting from scratch
In October 2017, our AlphaGo Zero paper was published in the journal Nature. Unlike the earlier versions of AlphaGo, which trained on thousands of human amateur and professional games to learn how to play the game, AlphaGo Zero bypasses this process and learns to play the game of Go without human data, simply by playing games against itself.
Experts described the paper as “a significant step towards pure reinforcement learning in complex domains”. We made this progress by streamlining the architecture behind Zero: we unite the policy and value networks into a single neural network and incorporate a simpler tree search that relies on this single neural network to evaluate positions and sample moves, without performing rollouts of the games. This can be thought of as using a single top-level professional to advise the system on its next move, rather than taking a crowdsourced answer from hundreds of amateur players. The simplicity of AlphaGo Zero’s architecture also dramatically speeds up the system while lowering the amount of compute power it needs.
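One way to picture a tree search that leans on a network instead of rollouts is a selection rule that mixes two quantities per candidate move: the average value the network has reported for positions reached through that move, and an exploration bonus proportional to the network's prior for the move. The sketch below is a PUCT-style rule in that spirit. The Edge class, the c_puct constant, and the "+ 1" under the square root are assumptions made for this illustration, not details confirmed by this article.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Search statistics for one candidate move from a position."""
    prior: float              # move probability suggested by the network
    visit_count: int = 0      # how often the search has tried this move
    total_value: float = 0.0  # sum of network value estimates seen below it

    @property
    def mean_value(self) -> float:
        return self.total_value / self.visit_count if self.visit_count else 0.0


def select_move(edges, c_puct=1.0):
    """Pick the move with the best mix of observed value and prior-driven bonus."""
    total_visits = sum(edge.visit_count for edge in edges.values())

    def score(move):
        edge = edges[move]
        bonus = c_puct * edge.prior * math.sqrt(total_visits + 1) / (1 + edge.visit_count)
        return edge.mean_value + bonus

    return max(edges, key=score)


def backup(edge, value):
    """Record a network evaluation instead of the result of a random rollout."""
    edge.visit_count += 1
    edge.total_value += value


edges = {"a": Edge(prior=0.7), "b": Edge(prior=0.3)}
first_choice = select_move(edges)   # nothing visited yet, so the prior decides
backup(edges["a"], -1.0)            # the network judged the resulting position bad
second_choice = select_move(edges)  # the search now prefers the other move
```

In this small example the search first follows the higher prior, then switches once the value estimate for that move comes back poor. That feedback loop stands in for the work rollouts did in the earlier versions.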
We believe this approach may be generalisable to a wide set of structured problems that share properties with a game like Go, such as planning tasks or problems where a series of actions has to be taken in the correct sequence. Examples could include protein folding, reducing energy consumption or searching for revolutionary new materials.