Addressing Ongoing Challenges in AI-Driven Genomic Research

Experts from the University of Wisconsin-Madison warn that the growing use of artificial intelligence tools in genetics and medicine could lead to incorrect conclusions about the relationship between genes and physical traits, including risk factors for diseases like diabetes.

The faulty predictions stem from the way researchers use AI to assist genome-wide association studies. These studies scan numerous genetic variations across large groups of people to find links between genes and physical traits, particularly traits related to certain diseases.

The complexity of genetics and disease

While genetics contribute to many health issues, the connection is not straightforward. Some genetic changes correlate directly with a higher risk of conditions like cystic fibrosis, whereas the relationship between genetics and other traits is often more complex.

Genome-wide association studies have helped disentangle some of these complexities, often by drawing on vast databases that pair individuals’ genetic profiles with their health information, such as the NIH’s All of Us project and the UK Biobank. However, these databases often lack data on the specific health conditions researchers aim to study.

“Certain traits can be costly or time-consuming to measure, leaving researchers without enough samples to draw significant statistical conclusions about their genetic connections,” says Qiongshi Lu, an associate professor in the UW-Madison Department of Biostatistics and Medical Informatics and a specialist in genome-wide association studies.

The dangers of filling data gaps with AI

To work around this limitation, researchers are increasingly using sophisticated AI tools to fill the data gaps.

“In recent years, utilizing advances in machine learning has become quite popular, leading to the development of advanced AI models that researchers employ to predict complex traits and disease risks even when data is limited,” Lu explains.

Lu and his colleagues have highlighted the risks of relying solely on these models without addressing the biases they can introduce. In a paper published in the journal Nature Genetics, they show that a common type of machine learning algorithm used in genome-wide association studies can incorrectly link several genetic variations to an individual’s risk of developing Type 2 diabetes.

“If you mistakenly trust the machine learning-predicted diabetes risk as accurate, you may conclude that all these genetic variations are legitimately related to diabetes, even though they are not,” Lu warns.
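The mechanism Lu describes can be illustrated with a small simulation. The sketch below is a toy setup, not the analysis from the paper: every name, effect size, and sample size is an assumption chosen for illustration. A single variant has no effect on the true trait but does affect a correlated measurement that a simple stand-in "ML model" (a one-feature linear fit) uses to predict the trait. Testing the variant against the predicted trait then yields a strong association that the true trait does not support.

```python
import numpy as np


def assoc_z(genotype, trait):
    """Z-statistic for the slope in a simple linear regression of trait on genotype."""
    g = genotype - genotype.mean()
    y = trait - trait.mean()
    beta = (g @ y) / (g @ g)
    resid = y - beta * g
    se = np.sqrt((resid @ resid) / (len(y) - 2) / (g @ g))
    return beta / se


def simulate_false_positive(n=20_000, n_labeled=2_000, seed=0):
    """Return (z for the true trait, z for the ML-predicted trait) at one variant."""
    rng = np.random.default_rng(seed)
    genotype = rng.binomial(2, 0.3, size=n).astype(float)  # allele counts 0/1/2

    # Hypothetical data-generating model (an assumption for this sketch):
    shared = rng.normal(size=n)                      # hidden factor behind trait and predictor
    trait = shared + rng.normal(scale=0.5, size=n)   # the variant has NO effect on the trait
    predictor = shared + 0.2 * genotype + rng.normal(size=n)  # the variant DOES affect this

    # Stand-in "ML model": a linear fit trained on the small subset with measured trait.
    slope, intercept = np.polyfit(predictor[:n_labeled], trait[:n_labeled], 1)
    predicted_trait = slope * predictor + intercept

    return assoc_z(genotype, trait), assoc_z(genotype, predicted_trait)


if __name__ == "__main__":
    z_true, z_pred = simulate_false_positive()
    print(f"z against the true trait:      {z_true:.2f}")
    print(f"z against the predicted trait: {z_pred:.2f}")
```

Under these assumptions, the statistic for the predicted trait should land well past the conventional genome-wide significance threshold (roughly |z| > 5.45), while the statistic for the true trait behaves like noise. The variant is genuinely associated with the prediction, just not with the disease trait the prediction stands in for.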

Moreover, these “false positives” are not limited to diabetes risk; they reflect a pervasive bias in AI-assisted studies, Lu notes.

A new statistical approach to reduce false positives

In addition to identifying the problems that arise from overreliance on AI tools, Lu and his colleagues have proposed a new statistical method that researchers can use to ensure the reliability of their AI-assisted genome-wide association studies. The approach aims to remove the bias that machine learning algorithms can introduce when conclusions are drawn from incomplete data.

“This new method is statistically optimal,” Lu states, adding that the team applied it to more accurately identify genetic associations with individuals’ bone mineral density.
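The article does not spell out how the team's method works, so the sketch below does not attempt to reproduce it. It shows one generic idea for this kind of correction, offered purely as an assumption: estimate how far the prediction-based effect drifts from the measured-trait effect in the subset where both are available, then subtract that drift from the estimate obtained in the larger group that has predictions only. All names, effect sizes, and sample sizes are hypothetical.

```python
import numpy as np


def slope(genotype, trait):
    """Ordinary least-squares slope of trait on genotype."""
    g = genotype - genotype.mean()
    return (g @ (trait - trait.mean())) / (g @ g)


def naive_and_corrected(n=50_000, n_labeled=5_000, seed=0):
    """Return (naive effect from predictions only, bias-corrected effect)."""
    rng = np.random.default_rng(seed)
    genotype = rng.binomial(2, 0.3, size=n).astype(float)

    # Hypothetical data-generating model: the variant's true effect on the trait is zero.
    shared = rng.normal(size=n)
    trait = shared + rng.normal(scale=0.5, size=n)
    predictor = shared + 0.3 * genotype + rng.normal(size=n)

    labeled = slice(0, n_labeled)    # individuals with the trait actually measured
    unlabeled = slice(n_labeled, n)  # individuals with only a predicted trait

    # Stand-in "ML model". For brevity it is trained on the same labeled subset
    # used for the correction; a real analysis would typically train it elsewhere.
    a, b = np.polyfit(predictor[labeled], trait[labeled], 1)
    predicted = a * predictor + b

    # Naive estimate: treat the prediction as if it were the trait.
    naive = slope(genotype[unlabeled], predicted[unlabeled])

    # Drift between prediction-based and measured-trait effects, where both exist.
    drift = slope(genotype[labeled], predicted[labeled]) - slope(genotype[labeled], trait[labeled])

    return naive, naive - drift


if __name__ == "__main__":
    naive, corrected = naive_and_corrected()
    print(f"naive effect estimate:     {naive:.3f}")
    print(f"corrected effect estimate: {corrected:.3f}")
```

In this toy setting the naive estimate should sit clearly above zero even though the true effect is zero, while the corrected estimate should fall close to zero. The correction costs some precision because it leans on the smaller labeled subset, which is the kind of trade-off a carefully weighted method would aim to manage.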

AI is not the sole issue in some genome-wide association studies

While the proposed statistical method could enhance the accuracy of AI-based studies, Lu and his team have also found issues with similar studies that use proxy information to fill data gaps instead of relying on algorithms.

In another paper published in Nature Genetics, the researchers raised concerns about studies that rely too heavily on proxy data to draw connections between genetics and various diseases.

For example, large health databases such as the UK Biobank hold extensive genetic data on large populations, but they often lack detailed information on conditions that typically appear later in life, such as many neurodegenerative diseases.

For Alzheimer’s disease, some researchers have tried to bridge this gap with proxy data gathered through family health history surveys, in which participants report whether a parent was diagnosed with Alzheimer’s.

The UW-Madison team found that studies relying on such proxy information can produce “highly misleading genetic correlations,” such as an apparent link between Alzheimer’s risk and higher cognitive abilities.

“While genomic scientists now frequently work with biobank datasets containing hundreds of thousands of individuals, the increase in statistical power also amplifies biases and error probabilities within these comprehensive datasets,” Lu explains. “Our group’s recent research serves as a humbling reminder of the need for statistical rigor in studies conducted at biobank scale.”