Limitations of Generative AI in Interpreting Clinical Notes in Healthcare

A recent investigation revealed that ChatGPT-4 is currently unable to effectively read medical notes from Emergency Department admissions to assess whether injured scooter and bicycle riders wore helmets.
In the future, it may become viable to use large language models (LLMs) to automatically analyze clinical notes in medical records, extracting important information dependably and efficiently to enhance patient care or research. However, a new study from Columbia University’s Mailman School of Public Health found that ChatGPT-4 cannot yet produce reliable results for this purpose. The findings were published in JAMA Network Open.

The study examined 54,569 emergency department visits from 2019 to 2022 by patients injured while riding bicycles, scooters, or similar devices. It found that the LLM struggled to match the results of traditional text string-search methods for identifying helmet use from clinical notes. The LLM performed well only when the prompt included all of the text used in the string search. It also failed to give consistent results across trials run on five consecutive days, and it was more consistent in reproducing its errors than its correct results. In particular, it had difficulty with phrases indicating that no helmet was worn, such as “w/o helmet” or “unhelmeted,” and incorrectly reported that the patient was wearing a helmet.
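The day-to-day consistency problem described above can be quantified with a simple repeatability check. The following is a minimal sketch in Python, assuming each day's run produces one label per record in the same order; the function name and label values are hypothetical, and the article does not say how the authors measured consistency.

```python
def fraction_stable(runs):
    """Return the share of records that received the same label in every run.

    `runs` is a list of label lists, one list per day, all in the same
    record order. Names and label values here are illustrative assumptions.
    """
    if not runs or not runs[0]:
        raise ValueError("need at least one non-empty run")
    if any(len(run) != len(runs[0]) for run in runs):
        raise ValueError("all runs must cover the same records")

    # zip(*runs) groups the labels that each record received across days.
    stable = sum(1 for labels in zip(*runs) if len(set(labels)) == 1)
    return stable / len(runs[0])


# Three hypothetical days of output for three records.
day_labels = [
    ["helmet", "no helmet", "unknown"],
    ["helmet", "helmet", "unknown"],
    ["helmet", "no helmet", "unknown"],
]
print(fraction_stable(day_labels))
```

A value well below 1.0 would mean that many records changed label between days, which is the kind of instability the study reports.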

Electronic medical records contain extensive medically relevant data, much of it in written clinical notes, which are a form of unstructured data. Efficient methods for reading and retrieving information from these notes would greatly benefit research. At present, details from clinical notes can be extracted using simple string-matching searches or more advanced AI methods such as natural language processing. There was hope that newer LLMs, such as ChatGPT-4, could retrieve this information more quickly and reliably.
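To make the string-matching approach concrete, here is a minimal Python sketch of a helmet-use search that checks negated phrases first. The patterns, function name, and label values are assumptions made for illustration; they are not the search strings used in the study.

```python
import re

# Hypothetical negation patterns; the study's actual strings are not listed here.
NEGATED = re.compile(
    r"\b(?:unhelmeted|w/o\s+(?:a\s+)?helmet|without\s+(?:a\s+)?helmet"
    r"|no\s+helmet|not\s+wearing\s+(?:a\s+)?helmet)\b",
    re.IGNORECASE,
)
# Any remaining mention of "helmet" or "helmeted" is treated as helmet use.
HELMET = re.compile(r"\bhelmet(?:ed)?\b", re.IGNORECASE)


def classify_helmet_use(note: str) -> str:
    """Label a clinical note as 'no helmet', 'helmet', or 'unknown'."""
    # Negations are checked first so "w/o helmet" is not read as helmet use.
    if NEGATED.search(note):
        return "no helmet"
    if HELMET.search(note):
        return "helmet"
    return "unknown"


print(classify_helmet_use("27YOM fell off bicycle w/o helmet, head laceration"))
print(classify_helmet_use("PT UNHELMETED ON SCOOTER"))
print(classify_helmet_use("helmeted rider struck by car"))
```

Checking the negated patterns before the positive one is the design choice that matters: a bare search for "helmet" would label "w/o helmet" as helmet use.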

“While we recognize potential gains in productivity through the use of a generative AI LLM for extracting information, challenges related to reliability and inaccuracies currently hinder its effectiveness,” stated Andrew Rundle, DrPH, a professor of Epidemiology at Columbia Mailman School and one of the lead authors. “When we created highly detailed prompts that encompassed all text strings concerning helmets, there were instances when ChatGPT-4 successfully extracted accurate data from the clinical notes. However, the extensive time required to design and verify all necessary text for the prompts, coupled with ChatGPT-4’s inability to reproduce consistent results over days, suggests that ChatGPT-4 is currently not equipped for this task.”

The research used publicly available data from the U.S. Consumer Product Safety Commission’s National Electronic Injury Surveillance System, which covers a sample of 96 U.S. hospitals. Rundle and his team evaluated emergency department records for patients involved in accidents with e-bikes, bicycles, hoverboards, and powered scooters. They compared the results of ChatGPT-4’s analyses with those from more conventional text-string searches. For 400 records, they also compared ChatGPT-4’s findings against their own readings of the clinical notes.
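A comparison like this is often summarized with an agreement statistic. The sketch below computes Cohen's kappa, one common choice, in plain Python. The article does not state which metric the authors used, so the statistic, the function name, and the sample labels are all assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same records."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of records where both methods agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each method assigned labels independently
    # at its own observed label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:  # both methods used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from a string search and from an LLM for four records.
string_search = ["helmet", "helmet", "no helmet", "no helmet"]
llm_output = ["helmet", "no helmet", "helmet", "no helmet"]
print(cohens_kappa(string_search, llm_output))
```

A kappa near 1 indicates close agreement between the two methods, while a value near 0 indicates agreement no better than chance.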

This study extends the team’s research on preventing injuries among micromobility users, including bicyclists, e-bike riders, and scooter users. “Helmet use is a crucial factor in determining the severity of injuries; however, in most emergency department medical records and incident reports, details regarding helmet usage are often concealed within clinical notes written by physicians or EMS personnel. There is a pressing need for research to efficiently and reliably access this information,” explained Kathryn Burford, the paper’s lead author and a post-doctoral fellow in the Department of Epidemiology at Columbia Mailman School.

“Our research looked at the capabilities of an LLM for extracting data from clinical notes, which are valuable resources for health professionals and researchers,” Rundle noted. “Yet, at the time of our study with ChatGPT-4, it was unable to deliver reliable data.”