Researchers evaluated how well ChatGPT performed in diagnosing musculoskeletal conditions compared to radiologists. They looked at 106 imaging cases and found that while ChatGPT’s performance was similar to that of radiology residents, it did not match the accuracy of board-certified radiologists.
In radiology, accurately interpreting diagnostic images requires deep knowledge of a wide range of medical conditions. Generative AI models such as ChatGPT have recently emerged as potential tools for diagnostic assistance, but their accuracy must be thoroughly assessed before they can be applied effectively.
To explore this, Dr. Daisuke Horiuchi and Associate Professor Daiju Ueda from the Graduate School of Medicine at Osaka Metropolitan University headed a research team to compare ChatGPT’s diagnostic capabilities with those of radiologists. They used a dataset of 106 musculoskeletal imaging cases, which included patient history, images, and diagnostic results.
For the analysis, the information from each case was input into both GPT-4 and its vision-capable version, GPT-4V, to generate possible diagnoses. For comparison, the same cases were independently evaluated by a radiology resident and a board-certified radiologist, each of whom provided a diagnosis. The results showed that GPT-4 outperformed GPT-4V and achieved diagnostic accuracy comparable to that of the radiology residents. However, ChatGPT’s accuracy was notably lower than that of the board-certified radiologists.
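The study does not describe a programmatic workflow, and the exact prompts used are not reproduced here. Purely as an illustration of how a text-only case query of this kind might be submitted to GPT-4, the sketch below uses the OpenAI Python client; the model name, prompt wording, and patient description are assumptions for demonstration, not details taken from the study.

```python
# Illustrative sketch only: not the authors' actual method or prompts.
# Assumes the OpenAI Python client (openai >= 1.0) and an OPENAI_API_KEY
# environment variable; the case text below is a made-up example.
from openai import OpenAI

client = OpenAI()

case_history = (
    "Hypothetical case: a 54-year-old patient with three months of "
    "progressive knee pain and swelling; MRI shows a lesion in the "
    "distal femur."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You assist with musculoskeletal imaging diagnosis.",
        },
        {
            "role": "user",
            "content": f"Patient history: {case_history}\n"
                       "List the most likely diagnoses in order of probability.",
        },
    ],
)

# Print the model's suggested differential diagnoses
print(response.choices[0].message.content)
```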
“The study indicates that while ChatGPT can be beneficial for diagnostic imaging, its accuracy does not rival that of board-certified radiologists. It’s crucial to thoroughly understand its capabilities as a diagnostic tool before considering its application,” remarked Dr. Horiuchi. “Generative AI tools like ChatGPT are evolving continuously, and there are high expectations for them to serve as supportive resources in diagnostic imaging moving forward.”
The results of this research were published in European Radiology.