Misguided AI Recommendations Impact Diagnostic Choices, Research Reveals

When it comes to making diagnostic decisions, a recent study suggests that radiologists and other medical professionals might depend excessively on artificial intelligence (AI), especially when it highlights an area of interest in an X-ray.

The study, published in Radiology, a journal of the Radiological Society of North America (RSNA), found that physicians were more likely to follow the AI's suggestions when the AI pointed to specific areas of interest in the X-ray images.

“As of 2022, there were 190 AI programs for radiology approved by the U.S. Food and Drug Administration,” said Paul H. Yi, M.D., one of the study’s senior authors and director of intelligent imaging informatics at St. Jude Children’s Research Hospital in Memphis, Tennessee. “Despite this, a disconnect has emerged between AI’s proof of concept and its actual clinical application. Building appropriate trust in AI recommendations is crucial to bridging this gap.”

The prospective, multi-site study included 220 physicians, 132 of whom were radiologists; the rest were internal medicine and emergency medicine physicians. They read chest X-rays with assistance from an AI tool whose diagnostic performance was comparable to that of experts in the field. Each physician evaluated eight chest X-ray cases with guidance from a simulated AI assistant. The clinical scenarios included frontal and, when available, lateral chest X-ray images obtained from Beth Israel Deaconess Hospital in Boston via the open-source MIMIC Chest X-Ray Database. A panel of radiologists selected the cases to reflect real-world clinical situations.

For every case, participants received the patient's clinical details, the AI's recommendation, and the X-ray images. The AI could offer either a correct or an incorrect diagnosis, accompanied by either a local or a global explanation. Local explanations highlight the specific areas of the image the AI judged most important, while global explanations present images from similar prior cases to show how the AI reached its conclusion.

“These local explanations guide physicians directly to the critical areas in real time,” Dr. Yi explained. “In our study, the AI essentially drew a box around areas of the image affected by pneumonia or other abnormalities.”
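To make the distinction concrete, the sketch below (not code from the study) shows one way the two explanation styles might be rendered: drawing a bounding box on the image for a local explanation, and retrieving similar archived cases for a global one. The image array, box coordinates, and feature vectors are all placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Stand-in for a chest X-ray: a 256x256 grayscale array (real studies use DICOM images).
xray = rng.normal(0.5, 0.1, size=(256, 256)).clip(0, 1)

# Local explanation: the hypothetical model reports a bounding box
# (x, y, width, height) around the region that drove its prediction.
box = (90, 120, 60, 50)
fig, ax = plt.subplots()
ax.imshow(xray, cmap="gray")
ax.add_patch(Rectangle(box[:2], box[2], box[3],
                       edgecolor="red", facecolor="none", linewidth=2))
ax.set_title("Local explanation: highlighted region of interest")
fig.savefig("local_explanation.png")

# Global explanation: instead of pointing inside the image, show the most
# similar previously diagnosed cases, found via nearest neighbors in some
# learned feature space (the vectors here are random placeholders).
prior_features = rng.normal(size=(500, 128))   # 500 archived cases
query_features = rng.normal(size=(1, 128))     # the current case
nn = NearestNeighbors(n_neighbors=3).fit(prior_features)
_, idx = nn.kneighbors(query_features)
print("Most similar prior cases (indices):", idx[0])
```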

The reviewers could accept, modify, or reject the AI's suggestions, and they were also asked to report their confidence in the findings and rate the usefulness of the AI's advice.

Using mixed-effects models, study co-first authors Drew Prinster, M.S., and Amama Mahmood, M.S., both Ph.D. candidates in computer science at Johns Hopkins University in Baltimore, led the analysis of how the experimental variables affected diagnostic accuracy, efficiency, physicians' perceptions of the AI's usefulness, and “simple trust” (how quickly a user agreed or disagreed with the AI's advice). The analysis accounted for factors such as user demographics and professional experience.
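As a rough illustration of that kind of analysis (not the authors' actual code), the sketch below fits a mixed-effects model with a random intercept per physician on a simulated trial-level table. The column names, the use of a linear rather than logistic mixed model, and the made-up data are all assumptions for demonstration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_physicians, n_cases = 220, 8

# One simulated row per physician-case trial: which explanation type was shown,
# whether the AI advice was correct, and whether the final diagnosis was correct.
df = pd.DataFrame({
    "physician": np.repeat(np.arange(n_physicians), n_cases),
    "explanation": rng.choice(["local", "global"], n_physicians * n_cases),
    "ai_correct": rng.choice([0, 1], n_physicians * n_cases),
})
df["accurate"] = rng.binomial(1, 0.7, size=len(df))

# A random intercept per physician accounts for repeated measures from the same
# reader; fixed effects capture explanation type, AI correctness, and their interaction.
model = smf.mixedlm("accurate ~ explanation * ai_correct",
                    data=df, groups=df["physician"])
print(model.fit().summary())
```

In the actual study, the other outcomes, such as review time and “simple trust,” would each get their own model of this general form.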

The findings revealed that reviewers were more inclined to align their diagnostic choices with AI recommendations and spent less time considering these decisions when local explanations were provided by AI.

“Local explanations led to improved diagnostic accuracy for physicians when the AI’s suggestions were correct,” Dr. Yi noted. “They also enhanced overall diagnostic efficiency by decreasing the time spent deliberating over AI advice.”

When the AI was correct, the reviewers' average diagnostic accuracy was 92.8% with local explanations and 85.3% with global explanations. When the AI was incorrect, accuracy fell to 23.6% with local explanations and 26.1% with global explanations.

“When presented with local explanations, both radiologists and non-radiologists in the study tended to trust the AI’s diagnosis more quickly, irrespective of the AI’s accuracy,” Dr. Yi remarked.

Chien-Ming Huang, Ph.D., a co-senior author and assistant professor of computer science at Johns Hopkins University, emphasized that this kind of trust in AI can be risky, leading to over-reliance and automation bias.

“If we place too much faith in what the computer suggests, it becomes problematic, as AI isn’t always correct,” Dr. Yi cautioned. “Radiologists must remain aware of these risks and stay vigilant about their diagnostic practices and training.”

Based on the study's findings, Dr. Yi said that developers of AI systems need to carefully consider how different types of AI explanations may shape trust in AI recommendations.

“I believe that collaboration between industry and healthcare researchers is essential,” he concluded. “I hope this study fosters discussions and leads to productive future research partnerships.”