Researchers Caution Against Overdependence on AI-Driven Genetic Disease Predictions
Researchers at the University of Wisconsin-Madison warn that the increasingly common use of artificial intelligence (AI) in genetics and medical research can produce incorrect results when linking genes to physical traits or to disease risks such as diabetes. As AI gains popularity in genome-wide association studies (GWAS), the scientists caution that, without rigorous statistical control, these techniques can generate "false positives," wrongly associating some genetic variants with diseases.
"Some characteristics are either very expensive or labor-intensive to measure, so you simply don't have enough samples to make meaningful statistical conclusions about their association with genetics," says Qiongshi Lu, an associate professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin-Madison. This scarcity of data tempts researchers to use AI to fill in the gaps, but doing so can introduce unforeseen biases.
In a recent Nature Genetics paper, Lu and his colleagues showed that a machine learning approach commonly used in GWAS can incorrectly link genetic variants to an increased risk of type 2 diabetes. "The problem is if you trust the machine learning-predicted diabetes risk as the actual risk, you would think all those genetic variations are correlated with actual diabetes even though they aren't," Lu says. The same pitfall extends beyond diabetes to other diseases, raising questions about the accuracy of AI-based predictions in genetic research.
To address this issue, Lu and colleagues developed a new statistical method that reduces AI-induced bias in genetic studies. The technique, which they applied successfully to identify genetic associations with bone mineral density, provides a more statistically robust foundation for evaluating genetic risk factors. "This new strategy is statistically optimal," Lu says, adding that it can make AI-assisted studies more reliable by accounting for the errors that arise when data are limited.
The UW-Madison team also points out another potential flaw in GWAS: the use of proxy data to fill in missing information. Large databases such as the UK Biobank hold genetic profiles for hundreds of thousands of people, but they often lack information on specific disorders, such as Alzheimer's disease, which typically develops later in life. To compensate, researchers have sometimes used proxy data, such as family health history, which can produce misleading results. Lu and his colleagues found that Alzheimer's risk estimates based on proxy data often incorrectly linked the disease to higher cognitive ability.
"These days, genomic scientists routinely work with biobank datasets that have hundreds of thousands of individuals," Lu says. But while these massive datasets increase statistical power, they also magnify the effect of any bias: with enough samples, even a small systematic error can reach statistical significance. The findings highlight the importance of rigorous statistical methods in biobank-scale research for accurate disease predictions.
The study underscores how difficult it is to link genetics to health outcomes. The UW-Madison researchers stress that although AI offers promising tools for bridging data gaps, caution is needed to avoid incorrect health predictions, especially for diseases such as diabetes and Alzheimer's, whose genetic links are already complex. For now, the team is calling for statistical rigor and for strategies that reduce AI-induced bias and improve the trustworthiness of GWAS results.