Today, I’m incredibly proud and excited to announce that DeepMind is making a significant contribution to humanity’s understanding of biology.
When we announced AlphaFold 2 last December, it was hailed as a solution to the 50-year-old protein folding problem. Last week, we published the scientific paper and source code explaining how we created this highly innovative system, and today we’re sharing high-quality predictions for the shape of every single protein in the human body, as well as for the proteins of 20 additional organisms that scientists rely on for their research.
As researchers seek cures for diseases and pursue solutions to other big problems facing humankind – including antibiotic resistance, microplastic pollution, and climate change – they will benefit from fresh insights into the structure of proteins. Proteins are like tiny, exquisite biological machines. Just as the structure of a machine tells you what it does, the structure of a protein helps us understand its function. Today, we are sharing a trove of information that doubles humanity’s understanding of the human proteome, and reveals the protein structures found in 20 other biologically significant organisms, from E. coli to yeast, and from the fruit fly to the mouse.
This will be one of the most important datasets since the mapping of the Human Genome.
We believe this is the most significant contribution AI has made to advancing scientific knowledge to date: a powerful tool that supports the efforts of researchers, and a great example of the benefits AI can bring to humanity. These insights will underpin many exciting future advances in our understanding of biology and medicine. Thanks to five tireless years of work and a lot of ingenuity from the AlphaFold team, and to close collaboration over the past few months with our partners at EMBL’s European Bioinformatics Institute (EMBL-EBI), we are able to share this huge and valuable resource with the world.
This latest work builds on announcements we made last December, at the CASP14 conference, when DeepMind unveiled a radical new version of our AlphaFold system, which was recognised by the organisers of the assessment as a solution to the 50-year-old grand challenge of understanding the 3D structure of proteins. Determining protein structures experimentally is a time-consuming and painstaking pursuit, but AlphaFold demonstrated that AI could accurately predict the shape of a protein, at scale and in minutes, down to atomic accuracy. At CASP, we pledged to share our methods and provide broad access to this body of knowledge.
This month, we’ve finished the enormous amount of hard work to deliver on that commitment. We published two peer-reviewed papers in Nature (1,2) and open-sourced AlphaFold’s code. Today, in partnership with EMBL-EBI, we’re incredibly proud to be launching the AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome to date, more than doubling humanity’s accumulated knowledge of high-accuracy human protein structures.
AlphaFold Protein Structure Database
In addition to the human proteome (all the ~20,000 proteins expressed by the human genome), we’re providing open access to the proteomes of 20 other biologically significant organisms, totalling over 350,000 protein structures. These organisms have been the subject of countless research papers and numerous major breakthroughs, and studying them has resulted in a deeper understanding of life itself. In the coming months we plan to vastly expand the coverage to almost every sequenced protein known to science: over 100 million structures covering most of the UniProt reference database. It’s a veritable protein almanac of the world. The system and database will be updated periodically as we continue to invest in future improvements to AlphaFold.
Most excitingly, in the hands of scientists around the world, this new protein almanac will enable and accelerate research that will advance our understanding of these building blocks of life. Already, through our early collaborations, we’ve seen promising signals from researchers using AlphaFold in their own work. For instance, the Drugs for Neglected Diseases Initiative (DNDi) has advanced their research into life-saving cures for diseases that disproportionately affect the poorer parts of the world, and the Centre for Enzyme Innovation at the University of Portsmouth (CEI) is using AlphaFold to help engineer faster enzymes for recycling some of our most polluting single-use plastics. For those scientists who rely on experimental protein structure determination, AlphaFold's predictions have helped accelerate their research. As another example, a team at the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group at the University of California San Francisco has used them to increase their understanding of SARS-CoV-2 biology. And this is just the start of what we hope will be a revolution in structural bioinformatics. With AlphaFold out in the world, there is a treasure trove of data now waiting to be transformed into future advances.
AlphaFold opens new research horizons, and it is inspiring to see powerful cutting-edge AI enabling work on diseases which are concentrated almost exclusively in impoverished populations.
For the AlphaFold team at DeepMind, this work represents the culmination of five years of enormous effort, including having to creatively overcome many challenging setbacks, resulting in a host of new sophisticated algorithmic innovations that were all needed to finally crack the problem. It builds on the discoveries of generations of scientists, from the early pioneers of protein imaging and crystallography, to the thousands of prediction specialists and structural biologists who’ve spent years experimenting with proteins since. Our dream is that AlphaFold, by providing this foundational understanding, will aid countless more scientists in their work and open up completely new avenues of scientific discovery.
What took us months and years to do, AlphaFold was able to do in a weekend.
At DeepMind, our thesis has always been that artificial intelligence can dramatically accelerate breakthroughs in many fields of science, and in turn advance humanity. We built AlphaFold and the AlphaFold Protein Structure Database to support and elevate the efforts of scientists around the world in the important work they do. We believe AI has the potential to revolutionise how science is done in the 21st century, and we eagerly await the discoveries that AlphaFold might help the scientific community to unlock next.
To learn more, head over to Nature to read our peer-reviewed papers describing our full method and the human proteome. You can read more about them in our Authors' Notes. If you want to explore our system, here’s the open-source code to AlphaFold and a Colab notebook to run individual sequences. Lastly, to explore our structures, EMBL-EBI, the world leader in biological data, is hosting them in a searchable database that is open and free to all.