Learn what protein folding is, why it's important and how our AI system AlphaFold is working to solve this grand scientific challenge.
Building blocks of life
Inside every cell in your body, billions of tiny molecular machines are hard at work. They’re what allow your eyes to detect light, your neurons to fire, and the ‘instructions’ in your DNA to be read, which make you the unique person you are. These molecular machines are proteins.
Currently, there are around 200 million known proteins, with another 30 million found every year. Each one has a unique 3D shape that determines how it works and what it does.
But figuring out the exact structure of a protein remains an expensive and often time-consuming process, meaning we only know the exact 3D structure of a tiny fraction of the proteins known to science.
Finding a way to close this rapidly expanding gap and predict the structure of millions of unknown proteins could not only help us tackle disease and more quickly find new medicines, but perhaps also unlock the mysteries of how life itself works.
The protein folding problem
If you could unravel a protein you would see that it’s like a string of beads made of a sequence of different chemicals known as amino acids.
These sequences are assembled according to the genetic instructions of an organism's DNA.
Attraction and repulsion between the 20 different types of amino acids cause the string to fold in a feat of ‘spontaneous origami’, forming the intricate curls, loops, and pleats of a protein’s 3D structure.
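To make the ‘string of beads’ picture concrete, here is a minimal Python sketch that treats a protein as a string over the 20 standard one-letter amino acid codes and counts how often each one appears. The helper names and the short example peptide are hypothetical and chosen only for illustration; this is not how AlphaFold represents its input.

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def validate_sequence(sequence: str) -> str:
    """Return the upper-cased sequence, or raise if it contains other letters."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"Unknown amino acid codes: {sorted(unknown)}")
    return sequence


def composition(sequence: str) -> Counter:
    """Count how many times each amino acid occurs in the sequence."""
    return Counter(validate_sequence(sequence))


# Hypothetical 10-residue peptide, for illustration only.
example = "MKTAYIAKQR"
print(len(example), dict(composition(example)))
```

The sequence alone is just this one-dimensional string; the folding problem is about recovering the 3D arrangement that the string adopts.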
For decades, scientists have been trying to find a method to reliably determine a protein’s structure just from its sequence of amino acids.
This grand scientific challenge is known as the protein folding problem.
What is AlphaFold?
We started working on this challenge in 2016 and have since created an AI system known as AlphaFold.
We trained it on the sequences and structures of around 100,000 known proteins. Today, AlphaFold can make accurate predictions of what protein shapes could look like based on their sequences.
Joining a global research community
In 1994, scientists interested in protein folding formed the Critical Assessment of protein Structure Prediction (CASP).
CASP is a community forum that allows researchers to share progress on the protein folding problem. The community also organises a biennial challenge for research groups to test the accuracy of their predictions against real experimental data.
Teams are given a selection of amino acid sequences for proteins which have had their exact 3D shape mapped but have not yet been released into the public domain. Groups must submit their best predictions to see how close they are to the subsequently revealed structures.
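As a rough illustration of what ‘how close’ can mean, the sketch below computes the root-mean-square deviation (RMSD) between predicted and experimentally determined atom positions. This is a generic distance measure, swapped in here for simplicity; this article does not describe CASP's own scoring. The sketch assumes both structures list the same atoms in the same order and have already been superimposed, and the coordinates are made up.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coordinate], experimental: Sequence[Coordinate]) -> float:
    """Root-mean-square deviation between two equally sized sets of 3D points.

    Assumes the two structures are already aligned (superimposed).
    """
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("Both structures need the same non-zero number of points")
    squared_error = sum(
        (px - ex) ** 2 + (py - ey) ** 2 + (pz - ez) ** 2
        for (px, py, pz), (ex, ey, ez) in zip(predicted, experimental)
    )
    return math.sqrt(squared_error / len(predicted))


# Made-up coordinates for three atoms, for illustration only.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 3.0)]
print(rmsd(predicted, experimental))
```

A lower value means the prediction sits closer to the experimental structure, and identical structures give zero.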
Among the teams that participated in CASP13 (2018), AlphaFold placed first in the protein folding challenge, and we continue to benchmark progress through this gold standard assessment and expert community.
Our work builds upon decades of research by CASP’s organisers and the protein folding community, and we’re indebted to the countless people who have conducted experiments on protein structures over the years, making such rigorous evaluations possible.
When Covid-19 emerged, very little was known about it. But scientists around the world came together to find ways to tackle it.
SARS-CoV-2, the virus that causes Covid-19, is composed of about 30 kinds of proteins, and about ten of these were poorly understood.
Our research team used AlphaFold to predict the structures of six understudied proteins in the SARS-CoV-2 virus genome, in the hope that they might advance our understanding of the virus.
The structure of one of these proteins, known as ORF3a, was subsequently determined experimentally, and the result confirmed the accuracy of AlphaFold’s prediction. This offers a glimpse of how tools like AlphaFold could better prepare us for a future pandemic.
Accelerating scientific discovery
A system like AlphaFold that is able to accurately predict the structure of proteins could accelerate progress in many areas of research that are important for society.
For example, limited information on protein structures has been a major barrier to increasing our understanding of neglected tropical diseases like sleeping sickness (trypanosomiasis) and leishmaniasis, which impact the lives of millions of people and cause tens of thousands of deaths every year.
This lack of structural information also holds back many fundamental research efforts. Drug development is one example: it can take over $2bn and more than 10 years to develop a new drug. AlphaFold could contribute to better and more efficient drug discovery by identifying the structure of many human proteins involved in disease.
It could also help unlock new possibilities such as finding proteins and enzymes that break down industrial and plastic waste or efficiently capture carbon from the atmosphere.
There’s a lot more work to do before we’re able to have a real impact in these areas and more, but the potential is enormous.
Looking to the future
Our research on AlphaFold continues, but our work so far – and the independent assessments from organisations like CASP – strengthens our hope that its predictions will soon help unlock new possibilities in biological research that will benefit society.
We’re looking forward to the results of CASP14 later this year and to continuing to work with the global scientific community to advance our understanding and unlock the potential of these building blocks of life.