The three-dimensional shapes of proteins could reveal ancient evolutionary links within the tree of life, according to a recent study. The research is the first to combine protein shape data with genomic sequence information to improve the accuracy of evolutionary trees, which scientists rely on to study the history of life, monitor the spread of pathogens, and develop new medical treatments. Notably, the approach also works with predicted protein structures that have not yet been experimentally confirmed. This opens the door to using the extensive structural data produced by technologies such as AlphaFold 2, which could provide new perspectives on life’s ancient history on Earth.
A study published in Nature Communications indicates that the three-dimensional shapes of proteins can shed light on profound evolutionary connections in the tree of life.
This pioneering study combines data on protein shapes with genomic sequences, thus boosting the reliability of evolutionary trees—essential instruments for scientists to decipher the history of life, follow disease transmission, and create novel therapies.
Importantly, this new method can also incorporate predicted protein structures that have not been experimentally tested. This has substantial implications for utilizing the extensive structural data generated by tools such as AlphaFold 2, providing deeper insights into the ancient biological heritage of our planet.
Currently, researchers have around 210,000 experimentally validated protein structures, compared with approximately 250 million known protein sequences. Initiatives like the Earth BioGenome Project are expected to generate billions of additional protein sequences, paving the way for unprecedented research opportunities.
For many years, biologists have been reconstructing evolutionary history by tracing the divergence of species and genes from their common ancestors. Typically, these phylogenetic trees are created by comparing DNA or protein sequences to assess their similarities and differences, thereby inferring relationships.
However, researchers face a substantial challenge known as saturation. Over long periods, genomic sequences can change so much that they bear little resemblance to their ancestral forms, making it hard to detect signs of common ancestry.
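Saturation can be illustrated with a toy simulation. The sketch below is not the study's analysis: it assumes a deliberately simplified model in which every substitution hits a random site and picks a different amino acid uniformly at random, and all function names are hypothetical. It compares the true number of substitutions per site with the difference that can actually be observed between ancestor and descendant.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, n_substitutions, rng):
    """Apply n random single-site substitutions; a site may be hit repeatedly."""
    seq = list(seq)
    for _ in range(n_substitutions):
        pos = rng.randrange(len(seq))
        # Assumption: every other amino acid is equally likely (toy model).
        seq[pos] = rng.choice([a for a in AMINO_ACIDS if a != seq[pos]])
    return "".join(seq)


def observed_difference(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def saturation_curve(length=500, steps=(50, 250, 500, 1500, 5000), seed=0):
    """Return (true substitutions per site, observed difference) pairs."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    curve = []
    for n in steps:
        descendant = mutate(ancestor, n, rng)
        curve.append((n / length, observed_difference(ancestor, descendant)))
    return curve


if __name__ == "__main__":
    for true_per_site, observed in saturation_curve():
        print(f"{true_per_site:5.1f} substitutions/site -> "
              f"{observed:.2f} observed difference")
```

The observed difference can never exceed 1, and under this uniform model it levels off at roughly 0.95 on average, however many substitutions have really occurred. Once sequences reach that plateau, very different amounts of evolutionary time look almost the same, which is the loss of signal described above.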
“Saturation is a major challenge in phylogenetics and serves as a key obstacle to reconstructing ancient relationships,” explains Dr. Cedric Notredame, a researcher at the Centre for Genomic Regulation (CRG) and lead author of the study. “It’s comparable to the degradation of an ancient manuscript—eventually, the letters fade and the original message is lost.”
To address this issue, the research team turned to the physical shapes of proteins. Proteins fold into complex shapes that determine their functions in the cell. These shapes tend to change more slowly over evolutionary time than the underlying sequences, so they preserve ancestral features for longer.
The three-dimensional structure of a protein is dictated by its amino acid sequence. Although mutations can occur within these sequences, the overall shape generally remains consistent to preserve functionality. The researchers proposed that by measuring the distances between pairs of amino acids within a protein, called intra-molecular distances (IMDs), they could track how structures evolve over time.
The study compiled a broad dataset of proteins with known structures from various species and calculated the IMDs for each protein, which were then used to construct phylogenetic trees.
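The idea of comparing proteins through their intra-molecular distances can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's pipeline: it assumes each protein is given as one 3D coordinate per residue, that residues are already aligned one-to-one across proteins, that the mean absolute IMD difference is a reasonable dissimilarity score, and that a simple average-linkage (UPGMA) clustering stands in for the tree-building step, which the text above does not specify. The toy coordinates and all function names are hypothetical.

```python
import math
from itertools import combinations


def imd_matrix(coords):
    """Pairwise distances between residue positions within one protein."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def structural_distance(coords_a, coords_b):
    """Mean absolute IMD difference (an assumed score, not the study's metric)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must be aligned to the same residues")
    imd_a, imd_b = imd_matrix(coords_a), imd_matrix(coords_b)
    pairs = list(combinations(range(len(coords_a)), 2))
    return sum(abs(imd_a[i][j] - imd_b[i][j]) for i, j in pairs) / len(pairs)


def upgma(names, dist):
    """Average-linkage clustering; returns the topology as nested tuples."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted average distance from the merged cluster to the rest.
            d[frozenset((next_id, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


def build_tree(proteins):
    """Compute all pairwise structural distances, then cluster them."""
    names = list(proteins)
    dist = [[structural_distance(proteins[a], proteins[b]) for b in names]
            for a in names]
    return upgma(names, dist)


# Hypothetical four-residue toy structures: A and B are nearly identical
# folds, while C is fully extended.
PROTEINS = {
    "A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)],
    "B": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.5)],
    "C": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
}

if __name__ == "__main__":
    print(build_tree(PROTEINS))
```

Because IMDs are distances measured inside each molecule, they do not change when a structure is rotated or shifted in space, so two proteins can be compared this way without first superimposing them.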
The findings indicated that trees built on structural data corresponded closely with those created from genetic sequences, but with a vital advantage: the structural trees were less vulnerable to saturation, meaning they retained reliable signals even when genetic sequences had diverged significantly.
Recognizing that both sequences and structures offer valuable information, the team created a combined approach that not only improved the credibility of the tree branches but also helped differentiate between valid and invalid relationships.
“This is akin to having two eyewitnesses recount an incident from different angles,” states Dr. Leila Mansouri, a coauthor of the study. “Each provides unique details, but together they narrate a more complete and accurate account.”
One area where the integrated method could have a significant impact is in working out the relationships among kinases in the human genome. Kinases are proteins that play key roles in a variety of cellular processes.
“The genomes of most mammals, including humans, have about 500 protein kinases that regulate nearly all biological functions,” notes Dr. Notredame. “These kinases are primary targets in cancer treatments, like the drug imatinib for humans or toceranib for dogs.”
Human kinases emerged through duplications that occurred over the past billion years. “In the human genome, kinases that are the farthest apart genetically date back about a billion years,” adds Dr. Notredame. “They duplicated at the common ancestor of our most ancient predecessors.”
This long timescale makes it hard to build accurate gene trees showing how these kinases are related. “Nevertheless, despite its flaws, the kinase evolutionary tree is widely used to understand its interactions with different drugs. Enhancing this tree, or refining those of other important protein families, would be a significant advancement for human health,” concludes Dr. Notredame.
The potential implications of this research go well beyond cancer. More accurate evolutionary trees could also deepen our understanding of how diseases evolve, benefiting vaccine and therapy development. The approach may also provide insights into the origins of complex traits, assist in discovering new enzymes for biotechnological applications, and even help track the spread of species in response to climate change.