Nearly all living organisms share a common genetic code, a complex system through which genetic information is transformed into proteins, which are essential for life. Recent research indicates that our current understanding of the evolution of this code may be inaccurate.
Despite life's enormous diversity, almost every organism, from bacteria to blue whales, uses the same genetic code. How and when that code arose has long been debated among scientists.
Sawsan Wehbi, a doctoral candidate in the Genetics Graduate Interdisciplinary Program at the University of Arizona, has taken a novel perspective on this longstanding issue. Her findings provide compelling evidence that the accepted model for how the universal genetic code developed requires reassessment. Wehbi is the lead author of a study published in the journal PNAS that challenges widely held views on the order in which amino acids, the building blocks of the code, were added to it.
“The genetic code is a remarkable system where a sequence of DNA or RNA, made up of four nucleotides, is used to create protein sequences composed of 20 different amino acids,” explained Joanna Masel, the study’s senior author and a professor of ecology and evolutionary biology at the U of A. “It’s a bewilderingly complex process, and our code performs surprisingly well. It appears to have evolved through several stages.”
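The mapping Masel describes can be sketched in a few lines of Python. This is only an illustration of the modern standard code, not anything taken from the study: three-letter "codons" built from four nucleotides give 64 combinations, which map onto 20 amino acids plus stop signals. The names `BASES`, `CODON_TABLE` and `translate` are chosen here for illustration.

```python
from itertools import product

# The four DNA nucleotides, in the conventional T, C, A, G table order.
BASES = "TCAG"

# Standard genetic code written as one letter per codon, in the order
# TTT, TTC, TTA, TTG, TCT, ... GGG. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# 4 * 4 * 4 = 64 codons, each assigned an amino acid or a stop signal.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA (or RNA) string into one-letter amino acid codes."""
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Calling `translate("ATGTGGTAA")` reads the codons ATG, TGG and TAA in turn: methionine, tryptophan, then a stop.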
The research indicates that early forms of life favored smaller amino acid molecules over larger, more complex ones, which were incorporated later. Additionally, amino acids that bind to metals were found to have been added to the code much earlier than previously believed. The researchers also concluded that the current genetic code may have developed after several earlier codes that are no longer in use.
The authors contend that the current understanding of how the code evolved is flawed because it rests more on misleading laboratory experiments than on evolutionary evidence. A cornerstone of traditional theories about genetic code evolution is the well-known Urey-Miller experiment of 1952, which tried to simulate the conditions on early Earth under which life is thought to have originated.
Although the experiment demonstrated that life's building blocks, including amino acids, can form from nonliving matter through simple chemical reactions, the conclusions drawn from it have come under scrutiny. For instance, it produced no sulfur-containing amino acids, even though sulfur was abundant on early Earth. As a result, sulfur-containing amino acids have been assumed to have entered the code much later. That result is hardly surprising, however, given that sulfur was left out of the experiment's ingredients.
Co-author Dante Lauretta, a Regents Professor of Planetary Science and Cosmochemistry at the U of A Lunar and Planetary Laboratory, stated that early life’s sulfur-rich composition has implications for astrobiology, particularly in understanding the habitability and potential biosignatures of extraterrestrial environments.
“On worlds like Mars, Enceladus, and Europa, where sulfur compounds are common, these findings could guide our search for life by emphasizing similar biogeochemical cycles or microbial processes,” he said. “Such insights might enhance our criteria for biosignatures, facilitating the discovery of life forms that thrive in sulfur-rich environments beyond our planet.”
The research team used a new analytical approach to examine amino acid sequences across the tree of life, tracing them back to LUCA, the Last Universal Common Ancestor. LUCA is presumed to have existed around 4 billion years ago and represents the common ancestor of all current life forms on Earth. Unlike previous studies that investigated entire protein sequences, Wehbi and her colleagues concentrated on protein domains, which are shorter stretches of amino acids within a protein.
“Think of a protein like a car and a domain as a wheel,” Wehbi explained. “A domain is a functional part that can be utilized in many different ‘cars’, and wheels have existed for a longer time than the cars themselves.”
To determine when a specific amino acid was likely added to the genetic code, the researchers used statistical tools to compare how enriched each amino acid is in protein sequences dating back to LUCA and beyond. An amino acid that is enriched in ancient sequences was likely incorporated early. Conversely, LUCA’s sequences are depleted in amino acids that were added to the code later but had become available by the time less ancient protein sequences appeared.
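The logic of that comparison can be illustrated with a minimal Python sketch. It is not the study's actual statistical method, which the article does not detail. It simply computes how often each amino acid occurs in one set of sequences relative to another. The function names, the `pseudocount` parameter and the toy sequences are all assumptions made up for this example.

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(sequences):
    """Return the fraction of residues contributed by each amino acid."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no amino acid residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def enrichment(ancient, younger, pseudocount=1e-6):
    """Ratio of each amino acid's frequency in ancient vs. younger sequences.

    Values above 1 mean the amino acid is enriched in the ancient set;
    values below 1 mean it is depleted there. The pseudocount (a
    hypothetical smoothing choice) avoids division by zero.
    """
    freq_ancient = frequencies(ancient)
    freq_younger = frequencies(younger)
    return {
        aa: (freq_ancient[aa] + pseudocount) / (freq_younger[aa] + pseudocount)
        for aa in AMINO_ACIDS
    }


# Toy, made-up sequences purely for illustration (not real protein data).
toy_ancient = ["GAVGAG", "AGGV"]
toy_younger = ["GAWYKW", "AGWY"]
ratios = enrichment(toy_ancient, toy_younger)
```

In this toy data, glycine (`G`) has a ratio above 1 and tryptophan (`W`) a ratio below 1. Under the reasoning described above, that pattern would point to glycine as an early addition and tryptophan as a later one. A real analysis would also need to account for sequence ancestry and statistical uncertainty.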
The researchers identified over 400 families of sequences tracing back to LUCA, more than 100 of which originated even earlier and had already diversified before LUCA. Interestingly, these pre-LUCA sequences contained more amino acids with aromatic ring structures, such as tryptophan and tyrosine, even though these amino acids were late additions to our current code.
“This provides clues about earlier genetic codes that preceded ours and have since vanished into the depths of geological time,” Masel noted. “It appears that early life had a preference for ring structures.”