Researchers have explored how effectively GPT-4, an advanced artificial intelligence (AI) language model, can help doctors diagnose patients.
In a study published in JAMA Network Open, a team from the University of Minnesota Medical School, Stanford University, Beth Israel Deaconess Medical Center, and the University of Virginia investigated how well physicians used GPT-4 when diagnosing patients.
The research involved 50 U.S.-licensed physicians working in family medicine, internal medicine, and emergency medicine. The team found that using GPT-4 as a diagnostic aid did not notably enhance clinical reasoning compared with traditional methods. Other significant findings included:
- GPT-4 on its own achieved considerably better diagnostic scores than both physicians using standard online diagnostic resources and physicians aided by GPT-4.
- Clinicians assisted by GPT-4 did not perform significantly better on diagnosis than those using conventional diagnostic tools.
“The AI domain is growing swiftly and influencing our lives both in and out of healthcare. It is crucial for us to examine these tools and determine how to leverage them for better patient care and enhanced clinical experiences,” stated Andrew Olson, MD, a professor at the University of Minnesota Medical School and a hospitalist with M Health Fairview. “This research indicates that there is potential for further development in the collaboration between physicians and AI in clinical settings.”
The findings highlight the challenges of incorporating AI into clinical workflows. While GPT-4 performed well on its own, pairing it with physicians as a diagnostic support tool did not provide significant advantages over traditional diagnostic resources. This underscores the complexity of applying AI in healthcare and reinforces the need for further research into how these technologies can best support clinical practice, as well as how clinicians can be effectively trained to use them.
The four partner institutions have established a cross-country AI evaluation network, known as ARiSE, to further assess the outputs of generative AI in healthcare.
This research was funded by the Gordon and Betty Moore Foundation.