The recipe for proteins, large molecules made up of amino acids that are the basic building blocks of tissue, muscle, hair, enzymes, antibodies, and other essential parts of living organisms, is encoded in DNA. These genetic definitions circumscribe proteins’ three-dimensional structures, which in turn determine their capabilities. But protein “folding,” as it’s called, is notoriously difficult to work out from the corresponding genetic sequence alone. DNA contains information only about the chains of amino acid residues, not about the final shape of those chains.
In December 2018, DeepMind took on the challenge of protein folding with a machine learning system called AlphaFold. The Alphabet subsidiary said at the time that AlphaFold, the product of two years of work, could predict structures more precisely than previous solutions. Lending credence to this claim, the system beat 98 competitors in the Critical Assessment of Structure Prediction (CASP) protein-folding competition in Cancun, where it successfully predicted the structure of 25 of the 43 proteins.
DeepMind now claims that AlphaFold has for the second time surpassed competing methods of predicting protein folding. In the results of the 14th CASP assessment, a more recent version of AlphaFold – AlphaFold 2 – has an average error comparable to the width of an atom (or 0.1 nanometers), competitive with the results of experimental methods.
“We’ve been stuck on this one problem – how proteins fold – for almost 50 years,” University of Maryland professor John Moult, CASP co-founder and president, told reporters in a briefing last week. “Seeing DeepMind fix this, having personally worked on this problem for so long and after so many stops and starts, wondering if we would ever get there, is a very special moment.”
Solutions to many global challenges, such as developing treatments for disease, can ultimately be traced back to proteins. Antibody proteins are shaped like a “Y,” for example, allowing them to latch onto viruses and bacteria, and collagen proteins are shaped like cords, which transmit tension between cartilage, bones, skin, and ligaments. In SARS-CoV-2, the novel coronavirus, a spike-shaped protein changes shape to interact with another protein on the surface of human cells, allowing the virus to force entry.
In 1972, biochemist Christian Anfinsen hypothesized that a protein’s amino acid sequence could determine its structure. This laid the groundwork for attempts to predict a protein’s structure from its amino acid sequence as an alternative to expensive and time-consuming experimental methods such as nuclear magnetic resonance, X-ray crystallography, and cryo-electron microscopy. However, the sheer complexity of protein folding complicates matters. Scientists estimate that because of the enormous number of possible interactions between amino acids, it would take longer than 13.8 billion years to work through all the possible configurations of a typical protein before identifying the right structure.
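To get a feel for why exhaustive search is hopeless, here is a back-of-the-envelope sketch. The article does not say how the estimate was derived, so every input below (three conformations per residue, a 100-residue chain, ten trillion conformations sampled per second) is an illustrative assumption rather than a figure from DeepMind or CASP.

```python
# Back-of-the-envelope estimate of exhaustive conformational search.
# All three inputs are illustrative assumptions, not figures from the article.
STATES_PER_RESIDUE = 3      # assumed number of conformations per residue
CHAIN_LENGTH = 100          # assumed length of a small protein, in residues
SAMPLES_PER_SECOND = 1e13   # assumed rate at which conformations could be tried

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9


def exhaustive_search_years(states: int, length: int, rate: float) -> float:
    """Years needed to try every conformation once at the given rate."""
    total_conformations = states ** length
    return total_conformations / rate / SECONDS_PER_YEAR


years = exhaustive_search_years(STATES_PER_RESIDUE, CHAIN_LENGTH, SAMPLES_PER_SECOND)
ratio = years / AGE_OF_UNIVERSE_YEARS
print(f"{years:.2e} years ({ratio:.2e} times the age of the universe)")
```

Under these assumptions the search takes on the order of 10^27 years, vastly longer than 13.8 billion. Changing the assumed inputs shifts the number by many orders of magnitude without changing the conclusion.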
DeepMind says its approach with AlphaFold draws inspiration from biology, physics, and machine learning, as well as the work of scientists over the past half-century. A folded protein can be thought of as a “spatial graph,” in which amino acid residues (the amino acids contained in a peptide or protein) are nodes and edges connect residues in close proximity. Taking advantage of this, AlphaFold uses an AI algorithm that attempts to interpret the structure of this graph while reasoning over the implicit graph it is building, using evolutionarily related sequences, multiple sequence alignment, and a representation of amino acid residue pairs.
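The “spatial graph” idea can be illustrated with a small sketch. This is not AlphaFold’s code or data format: the `contact_graph` helper, the one-point-per-residue representation, and the 8-angstrom cutoff are assumptions chosen for illustration, and the coordinates are made up.

```python
import math
from itertools import combinations

Coordinate = tuple[float, float, float]


def contact_graph(coords: list[Coordinate], cutoff: float = 8.0) -> dict[int, set[int]]:
    """Build adjacency sets: residues i and j share an edge when the
    distance between them is at most `cutoff` (assumed to be angstroms)."""
    graph: dict[int, set[int]] = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy coordinates for five residues (made up for illustration).
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (20.0, 0.0, 0.0),
]
graph = contact_graph(toy_coords)
for residue, neighbors in graph.items():
    print(residue, sorted(neighbors))
```

Here each residue is a node and each pair closer than the cutoff gets an edge; the last residue sits far from the rest and ends up with no neighbors. A prediction system works in the opposite direction, inferring which residues end up close together from sequence information alone.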
By iterating through this process, AlphaFold can learn to predict a protein’s underlying structure and determine its shape within days, according to DeepMind. Additionally, the system can self-assess which parts of each protein structure are reliable using an internal confidence measure.
DeepMind says the latest version of AlphaFold, which will be detailed in a forthcoming paper, was trained on approximately 170,000 protein structures from the Protein Data Bank, an open-access database of structural data for large biological molecules. The company tapped 128 of Google’s third-generation tensor processing units (TPUs), special-purpose AI accelerator chips available through Google Cloud, for compute resources roughly equivalent to 100 to 200 graphics cards. Training took a few weeks. For comparison, it took DeepMind 44 days to train a single agent within its StarCraft 2-playing AlphaStar system using 32 third-generation TPUs.
DeepMind declined to disclose the cost of training AlphaFold. But Google charges Google Cloud customers $32 per hour per third-generation TPU, which for 128 TPUs works out to about $688,128 per week.
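The weekly figure follows directly from the numbers in the article, as this short sketch shows. It assumes all 128 TPUs run around the clock at the quoted list price; the `training_cost` helper and its `weeks` parameter are illustrative, and the article gives no exact training duration beyond “a few weeks.”

```python
TPU_COUNT = 128            # TPUs used for training, per the article
PRICE_PER_TPU_HOUR = 32    # quoted Google Cloud list price, in US dollars
HOURS_PER_WEEK = 24 * 7


def training_cost(weeks: float, tpus: int = TPU_COUNT,
                  hourly_price: float = PRICE_PER_TPU_HOUR) -> float:
    """List-price cost in dollars, assuming every TPU runs continuously."""
    return tpus * hourly_price * HOURS_PER_WEEK * weeks


weekly = training_cost(weeks=1)
print(f"${weekly:,.0f} per week")
```

Multiplying by an assumed number of weeks gives a rough ceiling on list-price cost, not what DeepMind actually paid.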
In 1994, Moult and University of California, Davis professor Krzysztof Fidelis founded CASP as a biennial blind assessment to catalyze research, track progress, and establish the state of the art in protein structure prediction. It is considered the gold standard for benchmarking predictive techniques, because CASP chooses as targets structures that have only recently been determined experimentally, and teams test their prediction methods against them. Some were still awaiting determination at the time of AlphaFold’s assessment.
Since the target structures are not published in advance, CASP participants must blindly predict the structure of each protein. These predictions are then compared to experimental ground truth data when that data becomes available.
The main metric CASP uses to measure prediction accuracy is the Global Distance Test (GDT), which ranges from 0 to 100. It is essentially the percentage of amino acid residues that fall within a threshold distance of the correct position. A score of around 90 is informally considered competitive with results obtained from experimental methods; AlphaFold achieved a median score of 92.4 overall and a median score of 87 for proteins in the free-modeling category (i.e., those without templates).
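A simplified version of this kind of score can be sketched as follows. This is not CASP’s scoring code: it assumes the predicted and reference structures are already superimposed and represented by one point per residue, whereas the real procedure also searches for the best superposition. The function names and toy coordinates are illustrative; the 1, 2, 4, and 8 angstrom cutoffs are those of the commonly used GDT_TS variant, which averages over several thresholds rather than one.

```python
import math

Coordinate = tuple[float, float, float]


def fraction_within(predicted: list[Coordinate], reference: list[Coordinate],
                    threshold: float) -> float:
    """Fraction of residues whose predicted position lies within
    `threshold` of the reference position."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and of equal length")
    hits = sum(math.dist(p, r) <= threshold for p, r in zip(predicted, reference))
    return hits / len(reference)


def gdt_like_score(predicted: list[Coordinate], reference: list[Coordinate],
                   thresholds: tuple[float, ...] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of residues within each threshold, on a 0-100 scale.
    Assumes the two structures are already superimposed."""
    fractions = [fraction_within(predicted, reference, t) for t in thresholds]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues displaced by 0.5, 1.5, 3.0, and 10.0 units.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
score = gdt_like_score(predicted, reference)
print(score)  # 56.25
```

In the toy example, one, two, three, and three of the four residues fall within the successive cutoffs, giving 56.25. A perfect prediction would score 100.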
“What we saw in CASP14 was a group offering atomic precision right off the bat,” Moult said. “This [progress] gives you such excitement about how science works – about how you can never see exactly, or even roughly, what’s going to happen next. There are always these surprises. And that’s really what keeps you going as a scientist. What will be the next surprise?”
Real-world applications
DeepMind argues that AlphaFold, if further refined, could be applied to previously intractable problems in the field of protein folding, including those related to epidemiological efforts. Earlier this year, the company predicted several protein structures of SARS-CoV-2, including ORF3a, whose composition was once a mystery. At CASP14, DeepMind predicted the structure of another coronavirus protein, ORF8, which has since been confirmed by experimenters.
Beyond responding to the pandemic, DeepMind expects AlphaFold to be used to explore the hundreds of millions of proteins for which science currently lacks models. Since DNA specifies the amino acid sequences that make up protein structures, advances in genomics have made it possible to read protein sequences from the natural world, with 180 million protein sequences and counting in the publicly available Universal Protein database (UniProt). By contrast, given the experimental work required to go from sequence to structure, only about 170,000 protein structures are in the Protein Data Bank.
DeepMind is committed to making AlphaFold available “at scale” and to working with partners to explore new frontiers, such as how multiple proteins form complexes and interact with DNA, RNA, and small molecules. Improving the scientific community’s understanding of protein folding could lead to more effective diagnosis and treatment of diseases such as Parkinson’s and Alzheimer’s, which are believed to be caused by misfolded proteins. It could also aid in protein design, leading to protein-secreting bacteria that make wastewater biodegradable, for example, and enzymes that can help manage pollutants such as plastic and oil.
In any case, this is an important step for DeepMind, whose work has mainly focused on games. Its AlphaStar system beat professional players at StarCraft 2, following AlphaZero’s victories at Go, chess, and shogi. While some of DeepMind’s work has found real-world application, mainly in data centers, Waymo’s self-driving cars, and the Google Play Store’s recommendation algorithms, DeepMind had yet to achieve a significant AI breakthrough in a scientific field such as protein folding or glass dynamics modeling. These new results could mark a change in the company’s fortunes.
“AlphaFold represents a huge leap forward that I hope will really accelerate drug discovery and help us better understand disease. It’s pretty mind-blowing,” DeepMind CEO Demis Hassabis said during last week’s briefing. “We have advanced the state of the art in the field, so it’s fantastic, but there’s still a long way to go before we’ve solved it.”