Doctors and researchers from the University of Maryland School of Medicine, the UMD Institute for Health Computing and the VA Maryland Healthcare System are concerned that large language models summarizing clinical data could meet the U.S. Food and Drug Administration’s device-exemption criteria, escaping regulatory oversight while still posing a risk of patient harm.
WHY IT MATTERS
Artificial intelligence that summarizes clinical notes, medications and other patient data without FDA oversight will soon reach patients, the doctors and researchers said in a new viewpoint published Monday on the JAMA Network.
They analyzed the FDA’s final guidance on clinical decision support software. The agency treats software that informs “time-critical” decision-making as a regulated device function, a category that could include LLM-generated clinical summaries, the authors said.
The guidance, published about two months before ChatGPT’s release, “provides an unintentional ‘roadmap’ for how LLMs could avoid FDA regulation,” the researchers said.
Generative AI will change everyday clinical tasks. It has earned a great deal of attention for its promise to reduce physician and nurse burnout, and to improve healthcare operational efficiencies, but LLMs that summarize clinical notes, medications and other forms of patient data “could exert important and unpredictable effects on clinician decision-making,” the researchers said.
They conducted tests using ChatGPT and anonymized patient record data, and examined the summarization outputs, concluding that the results raise questions that go beyond “accuracy.”
“In the clinical context, sycophantic summaries could accentuate or otherwise emphasize facts that comport with clinicians’ preexisting suspicions, risking a confirmation bias that could increase diagnostic error,” they said.
“For example, when prompted to summarize previous admissions for a hypothetical patient, summaries varied in clinically meaningful ways, depending on whether there was concern for myocardial infarction or pneumonia.”
Lead author Katherine Goodman, a legal expert with the UMD School of Medicine Department of Epidemiology and Public Health, studies clinical algorithms and the laws and regulations that govern them, with a focus on understanding adverse effects on patients.
She and her research team found LLM-generated summaries to be highly variable: even if the tools are developed to avoid full-blown hallucinations, their summaries can still include small errors with significant clinical consequences.
In one example from their study, a chest radiography report noted “indications of chills and nonproductive cough,” but the LLM summary added “fever.”
“Including ‘fever,’ although a [one-word] mistake, completes an illness script that could lead a physician toward a pneumonia diagnosis and initiation of antibiotics when they might not have reached that conclusion otherwise,” they said.
A more dystopian danger, they said, generally arises “when LLMs tailor responses to perceived user expectations” and become virtual AI yes-men to clinicians, “like the behavior of an eager personal assistant.”
THE LARGER TREND
Others have argued that the FDA’s regulatory framework for AI-based medical devices could be curtailing innovation.
During a December discussion in London on the practical application of AI in the medical device industry, Tim Murdoch, business development lead for digital products at the Cambridge Design Partnership, argued that FDA regulations would cut out genAI innovation.
“The FDA allows AI as a medical device,” he said, according to a story by the Medical Device Network.
“They are still focused on locking the algorithm down. It is not a continuous learning exercise.”
One year ago, the CDS Coalition asked the FDA to rescind its clinical decision support guidance and better balance regulatory oversight with the healthcare sector’s need for innovation.
The coalition suggested that, in the final guidance, the FDA compromised its ability to enforce the law, a situation it said would lead to public health harm.
ON THE RECORD
“Large language models summarizing clinical data promise powerful opportunities to streamline information-gathering from the EHR,” the researchers acknowledged in their report. “But by dealing in language, they also bring unique risks that are not clearly covered by existing FDA regulatory safeguards.”
“As summarization tools speed closer to clinical practice, transparent development of standards for LLM-generated clinical summaries, paired with pragmatic clinical studies, will be critical to the safe and prudent rollout of these technologies.”
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.