A new NYU Langone Health study shows that generative artificial intelligence compares favorably with provider responses when answering patient questions sent through electronic health record in-basket messaging – and could help reduce the documentation burden for clinicians.
Led by researchers at the New York University Grossman School of Medicine, the study found that a genAI messaging tool not only drafted accurate responses to patient EHR queries but also conveyed greater perceived empathy in those responses.
“Our results suggest that chatbots could reduce the workload of care providers by enabling efficient and empathetic responses to patients’ concerns,” Dr. William Small, a clinical assistant professor in the NYU Grossman School of Medicine’s Department of Medicine and the study’s lead author, said in an announcement Tuesday.
WHY IT MATTERS
It was common for physicians to receive more than 150 In Basket messages per day during the pandemic, representing a more than 30% annual increase in daily messages, according to Dr. Paul Testa, NYU Langone’s chief medical information officer.
NYU Langone has looked at how to address note bloat and other documentation burdens that are major sources of physician burnout. For this new study, medical researchers asked 16 primary care physicians to rate 344 randomly assigned pairs of AI and human responses to patient messages on accuracy, relevance, completeness and tone.
The generative AI responses outperformed human providers on understandability and tone by 9.5%, the researchers said in their report, “Large Language Model–Based Responses to Patients’ In-Basket Messages,” published in JAMA Network Open.
For the blinded study, the participating physicians indicated whether they would use each AI response as a first draft or would need to start over. They did not know whether a human or the AI tool had composed each response.
Overall, the AI responses were more than 125% more likely to be considered empathetic and 62% more likely to use language that conveyed positivity and affiliation, the researchers said.
However, they were also 38% longer and 31% more likely to contain complex language.
“While humans responded to patient queries at a 6th grade level, AI was writing at an 8th grade level, according to a standard measure of readability called the Flesch-Kincaid score,” the researchers said.
They also noted that future studies are needed to confirm whether private patient data specifically improved the AI tool’s performance.
THE LARGER TREND
In 2023, NYU Langone licensed GPT-4 to let physicians experiment with real patient data within a secure environment.
Also last year, the health system encouraged teams of clinicians, educators and researchers to work together to test large language models in a no-experience-necessary Generative AI Prompt-A-Thon in Healthcare.
With the event, which engaged 70 participants and more than 500 others through a live webinar, the health system aimed to use the observations it generated to inform its internal generative AI capacity building.
Later, researchers used AI to analyze transcripts from 820 individuals who received psychotherapy during the initial wave of COVID-19 in the United States, comparing healthcare workers with other patients.
That AI tool could detect distress in overburdened hospital workers – those who discussed sleep deprivation or mood issues during therapy sessions were more likely to receive diagnoses of anxiety and depression.
ON THE RECORD
“We found that EHR-integrated AI chatbots that use patient-specific data can draft messages similar in quality to human providers,” Small said in a statement.
“With this physician approval in place, GenAI messages will be equal in the near future, in quality, communication style and usability, to responses generated by humans,” said corresponding author Dr. Devin Mann, senior director of informatics innovation in NYU Langone Medical Center Information Technology.
Andrea Fox is senior editor of Healthcare IT News.
Email: afox@himss.org
Healthcare IT News is a HIMSS Media publication.