DeepMind, a subsidiary of Google’s parent company Alphabet, has shown that machine learning (ML) can predict the shape of protein machinery with unprecedented accuracy, paving the way for researchers to discover new antibodies, enzymes, and foods.
A protein’s shape provides strong clues about how that machinery can be used, but it does not fully answer the question.
“So we asked ourselves: can we predict what function a protein performs?” said Max Bileschi, staff software engineer, Google Research, Brain Team.
In a Nature Biotechnology article, Google described how neural networks can reliably reveal the function of the “dark matter” of the protein universe, outperforming state-of-the-art methods.
Google worked closely with internationally recognized experts at EMBL’s European Bioinformatics Institute (EMBL-EBI) to annotate 6.8 million more protein regions in the Pfam v34.0 release. Pfam is a global repository for protein families and their functions.
These annotations exceed the expansion of the database over the last decade and will enable the 2.5 million life-science researchers around the world to discover new antibodies, enzymes, foods, and therapeutics.
The function of about a third of all proteins that organisms produce is still unknown.
“It’s kind of like we’re in a factory where everything’s buzzing, and we’re surrounded by all these impressive tools, but we have only a vague idea of what’s going on. Understanding how these tools operate, and how we can use them, is where we think machine learning can make a big difference,” said Lucy Colwell, senior staff research scientist, Google Research, Brain Team.
The Pfam database is a large collection of protein families and their sequences.
“Our ML models helped annotate 6.8 million more protein regions in the database,” said the researchers.
The company has also launched an interactive scientific article where “you can play with our ML models — getting results in real-time, all in your web browser, with no setup required.”
According to researchers, combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information.
This approach extends the coverage of Pfam by more than 9.5 percent, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation.
“The results suggest that deep learning models will be a core component of future protein annotation tools,” the researchers wrote.