AlphaFold: Improved protein structure prediction using potentials from deep learning

Abstract

Protein structure prediction aims to determine the three-dimensional shape of a protein from its amino acid sequence. This problem is of fundamental importance to biology because the structure of a protein largely determines its function, yet structures can be hard to determine experimentally. In recent years, considerable progress has been made by leveraging genetic information: analysing the co-variation of homologous sequences can allow one to infer which amino acid residues are in contact, which in turn can aid structure prediction. In this work, we show that we can train a neural network to accurately predict the distances between pairs of residues in a protein, which convey more about structure than contact predictions. With this information we construct a potential of mean force that can accurately describe the shape of a protein. We find that the resulting potential can be optimised by a simple gradient descent algorithm to realise structures without the need for complex sampling procedures. The resulting system, named AlphaFold, has been shown to achieve high accuracy, even for sequences with relatively few homologous sequences. In the most recent Critical Assessment of Protein Structure Prediction (CASP13), a blind assessment of the state of the field of protein structure prediction, AlphaFold created high-accuracy structures (with TM-scores of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, using sampling and contact information, achieved such accuracy for only 14 out of 43 domains. This significant advance in protein structure prediction could contribute to progress in understanding the function and malfunction of proteins throughout biology.
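
The idea of turning pairwise distances into a potential and minimising it by gradient descent can be illustrated with a small sketch. This is not the AlphaFold implementation: it assumes a simple harmonic penalty on distances as a stand-in for the learned potential of mean force, optimises raw 3D coordinates directly, and uses a synthetic helix-like set of points in place of network predictions. The names `pairwise`, `potential`, `gradient` and `descend` are hypothetical and chosen only for this example.

```python
import numpy as np


def pairwise(coords):
    """Return (distances, displacement vectors) for all pairs of points."""
    diff = coords[:, None, :] - coords[None, :, :]      # shape (N, N, 3)
    dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)    # epsilon avoids division by zero
    return dist, diff


def potential(coords, target):
    """Toy harmonic potential: sum over pairs i < j of (d_ij - target_ij)^2."""
    dist, _ = pairwise(coords)
    upper = np.triu_indices(len(coords), k=1)
    return float(((dist[upper] - target[upper]) ** 2).sum())


def gradient(coords, target):
    """Analytic gradient of `potential` with respect to every coordinate."""
    dist, diff = pairwise(coords)
    # dV/dx_i = sum over j != i of 2 * (d_ij - target_ij) * (x_i - x_j) / d_ij
    coeff = 2.0 * (dist - target) / dist
    np.fill_diagonal(coeff, 0.0)
    return (coeff[:, :, None] * diff).sum(axis=1)


def descend(coords, target, steps=2000, lr=0.01):
    """Plain gradient descent on the coordinates; no sampling is involved."""
    coords = coords.copy()
    for _ in range(steps):
        coords -= lr * gradient(coords, target)
    return coords


# Synthetic stand-in for predicted distances: a helix-like chain of 10 points.
# (Assumption for illustration only; these are not protein data.)
n = 10
t = np.arange(n, dtype=float)
reference = np.stack([np.cos(t), np.sin(t), 0.5 * t], axis=1)
target, _ = pairwise(reference)

rng = np.random.default_rng(0)
start = rng.normal(size=(n, 3))     # random initial structure
final = descend(start, target)

print("potential before:", potential(start, target))
print("potential after: ", potential(final, target))
```

Distances alone do not fix the overall position, orientation or handedness of the structure, so the recovered coordinates can match the reference only up to a rigid motion or mirror image, and gradient descent on such a potential may stop in a local minimum.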
