Scientists at the University of Bonn have trained an AI system to predict potential active ingredients with special properties. To do so, they developed a chemical language model, a kind of ChatGPT for molecules. After training, the AI was able to reproduce the chemical structures of compounds with known dual-target activity, which could be valuable for developing effective treatments. The results have been published in Cell Reports Physical Science.
Today, anyone wishing to surprise their grandmother with a poem for her 90th birthday doesn’t need to be a talented poet: a simple prompt in ChatGPT can quickly generate a list of words that rhyme with her name. It can even craft a sonnet if desired.
Researchers at the University of Bonn have applied a similar principle in their study, using what is known as a chemical language model. Instead of rhymes, however, this AI suggests structural formulas of chemical compounds that may have a particularly desirable property: the ability to bind to two different target proteins at the same time. Such a compound could, for example, inhibit two different enzymes in the body at once.
Seeking Active Ingredients with Dual Effects
“In pharmaceutical research, compounds with such dual action are extremely sought after because of their polypharmacology,” says Prof. Dr. Jürgen Bajorath, a computational chemist who leads the AI in Life Sciences initiative at the Lamarr Institute for Machine Learning and Artificial Intelligence as well as the Life Science Informatics program at b-it (Bonn-Aachen International Center for Information Technology) at the University of Bonn. “These compounds can influence several intracellular processes and signaling pathways at the same time, making them potentially more effective, especially against cancer.” The same dual effect could also be achieved by administering several drugs at once. However, this carries the risk of unwanted interactions, and because the drugs may be broken down at different rates in the body, coordinating their use is difficult.
Finding a molecule that acts specifically on a single target protein can already be difficult; designing compounds with a predefined dual action is harder still. Chemical language models may help with this in the future. Just as ChatGPT learns from billions of pages of text to produce coherent sentences, chemical language models learn from data, although from far less of it. Their input is text in the form of SMILES strings, which encode organic molecules and their structure as a sequence of letters and symbols. “We trained our chemical language model using pairs of such strings,” explains Sanjana Srinivasan from Bajorath’s research group. “One string described a molecule known to act against a single target protein, while the other represented a compound that acts on both this protein and an additional target.”
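To make the idea of SMILES strings and training pairs more concrete, here is a minimal Python sketch of how such pairs could be turned into model input. It assumes a regular-expression tokenizer of the kind commonly used for transformer models on SMILES; the article does not say how the Bonn team split strings into tokens, and the function names, special tokens, and the placeholder molecules (phenol and benzamide) are illustrative, not taken from the study.

```python
import re

# Assumed tokenizer regex of the kind widely used for transformer models on
# SMILES. Bracket atoms such as [nH] and two-letter elements such as Cl and
# Br each stay a single token.
SMILES_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

# Hypothetical special tokens for padding and sequence boundaries.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def build_vocab(pairs: list[tuple[str, str]]) -> dict[str, int]:
    """Assign an integer ID to every token seen in the (source, target) pairs."""
    vocab = {PAD: 0, BOS: 1, EOS: 2}
    for source, target in pairs:
        for token in tokenize(source) + tokenize(target):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """Turn one SMILES string into token IDs framed by <bos> and <eos>."""
    return [vocab[BOS]] + [vocab[t] for t in tokenize(smiles)] + [vocab[EOS]]


# Placeholder pair (phenol -> benzamide), NOT compounds from the study:
# source = single-target compound, target = dual-target compound.
pairs = [("c1ccccc1O", "NC(=O)c1ccccc1")]
vocab = build_vocab(pairs)
source_ids = encode(pairs[0][0], vocab)
target_ids = encode(pairs[0][1], vocab)
```

Each training example is then simply a source ID sequence and a target ID sequence; a sequence-to-sequence model learns to map the first onto the second.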
AI Understands Chemical Relationships
The model was trained on more than 70,000 such pairs, from which it implicitly learned how ordinary active compounds differ from those with dual effects. “When we then entered a compound targeting a specific protein, it proposed other molecules that would act not only on this protein but also on a second one,” Bajorath explains.
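The step in which the model "proposes other molecules" can be pictured as autoregressive sampling: the output SMILES string is built token by token, each token drawn from the model's predicted distribution given the source compound and the tokens generated so far. The sketch below assumes a hypothetical `next_token_logits` callable standing in for a trained model; it is not the published model or its decoding settings.

```python
import math
import random
from typing import Callable, Sequence

# Hypothetical model interface: given the encoded source compound and the
# tokens generated so far, return one logit per vocabulary entry.
NextTokenLogits = Callable[[Sequence[int], Sequence[int]], Sequence[float]]


def sample_candidates(
    next_token_logits: NextTokenLogits,
    source_ids: Sequence[int],
    id_to_token: Sequence[str],
    bos_id: int,
    eos_id: int,
    n_samples: int = 10,
    max_len: int = 100,
    temperature: float = 1.0,
    seed: int = 0,
) -> list[str]:
    """Sample SMILES strings token by token and return the distinct ones."""
    rng = random.Random(seed)
    candidates: list[str] = []
    for _ in range(n_samples):
        prefix = [bos_id]
        while len(prefix) < max_len:
            logits = next_token_logits(source_ids, prefix)
            # Temperature-scaled softmax weights; subtracting the max keeps exp() stable.
            top = max(logits)
            weights = [math.exp((x - top) / temperature) for x in logits]
            next_id = rng.choices(range(len(weights)), weights=weights, k=1)[0]
            if next_id == eos_id:
                break
            prefix.append(next_id)
        else:
            continue  # hit max_len without <eos>: discard the truncated string
        smiles = "".join(id_to_token[i] for i in prefix[1:])
        if smiles and smiles not in candidates:
            candidates.append(smiles)
    return candidates
```

Lower temperatures concentrate sampling on the most likely tokens, higher ones yield more varied proposals. Sampled strings would still have to be checked for chemical validity, for example with a cheminformatics toolkit, before being treated as candidate compounds.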
The dual-action compounds in the training data typically act on target proteins that are similar to each other and perform comparable functions in the body. However, the researchers are also interested in substances that act on entirely different types of enzymes or receptors. To prepare the AI for this harder task, the team added a fine-tuning phase after the initial training. In it, they used a number of specialized training pairs that taught the algorithm which classes of target proteins the proposed compounds should address. This is somewhat like asking ChatGPT to write a limerick instead of a sonnet.
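One way to picture the fine-tuning step is as selecting training pairs for a desired combination of protein classes and tagging each with a control token that names that combination, so that the model learns to associate the token with the kind of output wanted. This is an assumption for illustration only: the article does not describe the exact mechanism, and the `TrainingPair` record, the control-token format, and the class labels used in the comments are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPair:
    """Hypothetical record for one training pair."""
    source: str              # SMILES of a compound active against one target
    target: str              # SMILES of a compound active against that target and a second one
    classes: frozenset[str]  # protein classes of the two targets (placeholder labels, e.g. {"kinase", "gpcr"})


def finetuning_examples(
    pairs: list[TrainingPair], wanted_classes: list[str]
) -> list[tuple[str, str, str]]:
    """Keep pairs matching the wanted class combination; tag each with a control token."""
    wanted = frozenset(wanted_classes)
    # Sorting makes the token independent of the order in which classes are given.
    control = "<" + "+".join(sorted(wanted)) + ">"
    return [(control, p.source, p.target) for p in pairs if p.classes == wanted]
```

Comparing the classes as a set treats each pair as unordered, so the same combination of protein classes always maps to the same control token and the same fine-tuning subset.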
After fine-tuning, the model did indeed generate molecules that had already been shown to act against the intended combinations of target proteins. “This confirms that the method is effective,” says Bajorath. However, he believes the real value of the approach lies not in immediately discovering new compounds that outperform existing drugs. “What is particularly intriguing is that the AI often proposes chemical structures that are not immediately obvious to most chemists,” he explains. “In a way, it inspires ‘out of the box’ thinking, generating innovative solutions that can lead to new design hypotheses and research directions.”