A study led by the University of Cambridge found that the AI model GPT-4 outperformed non-specialist doctors in assessing eye problems and providing advice. The study revealed that GPT-4’s clinical knowledge and reasoning skills are nearly on par with those of specialist eye doctors. The model was tested against doctors at various career levels, including junior doctors without a specialization, as well as trainee and expert eye doctors. Each participant was given 87 patient scenarios involving a specific eye problem and was asked to give an assessment.
For each scenario, participants gave a diagnosis or recommended treatment by choosing from four possible options.
In the test, GPT-4 outperformed non-specialist junior doctors, who have the same level of specialist eye knowledge as general practitioners.
GPT-4 achieved similar scores to both trainee and expert eye doctors, although the top-performing doctors still scored higher.
According to the researchers, large language models are not likely to replace healthcare professionals, but they have the potential to enhance healthcare as part of the clinical workflow.
The study suggests that AI could be used in a controlled setting, such as triaging patients, to provide eye-related advice, diagnosis, and management recommendations. Dr. Arun Thirunavukarasu, the lead author of the study, believes that AI could help determine which eye cases are emergencies requiring immediate specialist attention, which can be addressed by a general practitioner, and which do not require treatment. This research was conducted while Dr. Thirunavukarasu was a student at the University of Cambridge’s School of Clinical Medicine.
Researchers have found that GPT-4 is just as effective as expert clinicians in interpreting eye symptoms and signs to answer more complex inquiries. This suggests that with further advancements, large language models could potentially assist general practitioners who are struggling to obtain timely advice from eye specialists, especially as people in the UK are experiencing longer wait times for eye care.
The development of these models requires a substantial amount of clinical text for fine-tuning, and efforts are underway globally to facilitate this process. The researchers assert that their study is more advanced than previous ones because it directly compared the capabilities of AI to those of practicing doctors.
“Doctors are not constantly studying for exams throughout their careers. The study aimed to compare AI with the real-time knowledge and skills of practicing doctors, to make a fair assessment,” explained Thirunavukarasu, who is currently an Academic Foundation Doctor at Oxford University Hospitals NHS Foundation Trust.
He emphasized the importance of understanding the capabilities and limitations of commercially available models, as patients may already be relying on them for guidance instead of searching the internet.
The examination covered various eye issues, including extreme light sensitivity, decreased vision, lesions, and itchy and painful eyes, drawn from a textbook used to train eye doctors. This textbook is not freely available on the internet, so it is unlikely that its content was included in GPT-4’s training datasets. The results are published today in the journal PLOS Digital Health.

“Even considering the future use of AI, I believe doctors will still be responsible for patient care. The key is to empower patients to decide whether they want computer systems to be involved or not. This will be a personal decision.
“Every patient must make their own decision,” Thirunavukarasu said.
GPT-4 and GPT-3.5, also known as ‘Generative Pre-trained Transformers’, have been trained on massive datasets containing hundreds of billions of words from sources such as articles, books, and the internet. These are just two examples of large language models; others in wide use include Pathways Language Model 2 (PaLM 2) and Large Language Model Meta AI 2 (LLaMA 2).
In the study, GPT-3.5, PaLM 2, and LLaMA were all tested using the same set of questions. GPT-4 provided more accurate responses than any of them.
GPT-4 is the technology behind the online chatbot ChatGPT, which offers customized responses to human queries. In recent months, ChatGPT has gained significant attention in the medical field for achieving passing-level performance in medical school examinations and for generating responses to patient inquiries rated as more accurate and compassionate than those of human doctors.
The field of artificially intelligent large language models is evolving rapidly. More advanced models have been released since the research was conducted, and these may come even closer to the performance of expert ophthalmologists.