The AI tool AlphaFold has been enhanced to accurately predict the shapes of large and complex protein structures. Researchers from Linköping University have also successfully incorporated experimental data into the tool. The findings, reported in Nature Communications, mark progress towards more efficient development of new proteins for uses such as medical drugs.
Proteins play a crucial role in every living organism, overseeing numerous cellular functions. Essentially, proteins are involved in everything from muscle control and hair formation to oxygen transport in the blood and food digestion. They are also prevalent outside the body, found in products like detergents and pharmaceutical drugs.
These large molecules are made up of 20 different amino acids linked together in long sequences, resembling beads on a string. A sequence can range from 50 to several thousand amino acids, so the number of possible combinations is vast. The sequence dictates each protein’s three-dimensional structure, and the specific shape of the folded chain determines the protein’s function.
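To give a sense of scale, the number of possible chains can be worked out directly: with 20 amino acids to choose from at each position, a chain of length n has 20 to the power n possible sequences. The short Python sketch below is purely illustrative; the function names are made up for this example, and it counts sequences only, without saying anything about which of them fold into working proteins.

```python
AMINO_ACIDS = 20  # number of different amino acids, as stated in the text


def count_sequences(length: int) -> int:
    """Number of distinct chains of the given length: 20 ** length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return AMINO_ACIDS ** length


def order_of_magnitude(length: int) -> int:
    """Exponent of the largest power of ten not exceeding the count."""
    return len(str(count_sequences(length))) - 1


# Illustrative lengths: a very short chain, the low end of the range
# mentioned in the text, and a chain of a thousand amino acids.
for n in (7, 50, 1000):
    print(f"length {n}: about 10^{order_of_magnitude(n)} possible sequences")
```

A chain of only seven amino acids already allows more than a billion combinations, and at 50 amino acids the count is on the order of 10^65, which is why trying out sequences one by one is not feasible.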
For over five decades, scientists have sought to predict and design protein structures in order to better understand bodily functions and diseases and to develop new medical treatments. This process has traditionally been painstaking and costly, requiring extensive manual effort.
In 2020, however, the company DeepMind introduced open-source software called AlphaFold. Using artificial intelligence and neural networks, AlphaFold can accurately predict how proteins fold. The advance was significant enough to earn its developers a share of the Nobel Prize in Chemistry in 2024.
Despite its successes, AlphaFold had limitations, particularly in predicting very large protein complexes and in making use of experimental or incomplete data.
Researchers at Linköping University have made strides in addressing these challenges by advancing AlphaFold into a new version they call AF_unmasked. This iteration can process experimental information and incomplete data while also predicting the shapes of large, intricate protein structures.
“We’re providing new types of input for AlphaFold. Our aim is to combine insights from experimental work and neural networks to construct larger protein structures. Additionally, if you have a preliminary version of a structure, you can input that into AlphaFold for a fairly precise prediction,” explains Claudio Mirabello, an associate professor at Linköping University’s Department of Physics, Chemistry, and Biology.
AF_unmasked is intended to help researchers refine their experiments and to offer suggestions on protein design, which should improve understanding of protein functions and support the creation of new protein-based drugs.
The AlphaFold breakthrough rests on data that researchers worldwide have collected since the 1970s on the structures of around 200,000 different proteins, stored in a comprehensive database. This data served as the foundation for training AlphaFold. Operating at that scale was made possible by advances in supercomputing technology, particularly the use of GPUs for intensive calculations.
Björn Wallner, a bioinformatics professor at Linköping University, has collaborated with one of the Nobel Prize winners.
“The potential for protein design is limitless. Only your imagination can hold it back. We can create proteins for various applications, both inside and outside the body. It is always essential to tackle more complex challenges as we solve existing ones. Fortunately, finding new problems is not an issue in our field,” says Björn Wallner.
He, along with Claudio Mirabello, played a significant role in creating a forerunner of AlphaFold, which inspired DeepMind during its own development work. With the resources of the Google-affiliated company behind it, DeepMind was able to advance what is now an essential tool for protein scientists globally.
“Although AlphaFold was not the first tool to utilize deep neural networks for this challenge, one of its distinctive features is that it encodes a protein’s evolutionary history within the neural network—a concept that originated here at LiU and was published by Björn and me in 2019. Therefore, you could argue that AlphaFold was built upon our concept, and now we are building upon AlphaFold,” concludes Claudio Mirabello.