Inaccuracies in Medical GenAI
A new study exposes the inaccuracies of GenAI in medical use. My conclusion: the impacts of overblown GenAI hype are literally deadly.
➤ Here’s what the researchers did:
- Collected 50 medical notes
- Generated summaries of them using two GenAI models: GPT-4o and Llama-3
- Compared the summaries with the originals
➤ They checked whether each GenAI summary was accurate: whether the information was true and as specific as it was in the original
➤ The results are frightening:
- ~40%: inaccuracies in medical information
- ~100%: general statements given in place of the specific information in the original
➤ Here are the definitions of the 7 categories of inaccuracy the researchers used:
(1) Patient Information: Hallucinated demographic details and non-medical information about the patient’s background.
(2) Patient History: Hallucinated information regarding the history of present illness.
(3) Symptoms/Diagnosis/Surgical Procedures: Inconsistent symptoms, diagnosis, or procedures found in the patient’s current visit details.
(4) Medicine Related Instructions: Any disparities or discrepancies noted between the medication instructions documented in the summary and those found in the medical note.
(5) Follow-up: Missing information regarding “follow-up” care or instructions provided to the patient.
(6) Chronological Inconsistency: The order of medical events is not consistent with the sequence documented in the EHR.
(7) Incorrect Reasoning: The summary states correct information, but the associated reasoning given for it does not make sense or is incorrect.
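For readers who want to run this kind of check on their own summaries, here is a minimal sketch of how annotations against the 7 categories could be tallied. It is not the researchers' code: the `Category` enum, the `summarize_annotations` function, and the four toy notes are all hypothetical and made up for illustration, and the labels are assumed to come from a human reviewer comparing each summary with its original note.

```python
from collections import Counter
from enum import Enum


class Category(Enum):
    """The 7 categories listed above (names are this sketch's own)."""
    PATIENT_INFORMATION = 1
    PATIENT_HISTORY = 2
    SYMPTOMS_DIAGNOSIS_PROCEDURES = 3
    MEDICINE_INSTRUCTIONS = 4
    FOLLOW_UP = 5
    CHRONOLOGICAL_INCONSISTENCY = 6
    INCORRECT_REASONING = 7


def summarize_annotations(annotations):
    """Tally reviewer labels.

    annotations maps a summary id to the list of Category labels a
    reviewer assigned to it (an empty list means nothing was flagged).
    Returns (share of summaries with at least one label,
             Counter of how many summaries show each category).
    """
    total = len(annotations)
    if total == 0:
        return 0.0, Counter()
    flagged = sum(1 for labels in annotations.values() if labels)
    per_category = Counter()
    for labels in annotations.values():
        # set() so a category counts once per summary, not once per label
        per_category.update(set(labels))
    return flagged / total, per_category


# Toy data, invented for this example only (not from the study).
toy = {
    "note_01": [Category.MEDICINE_INSTRUCTIONS, Category.FOLLOW_UP],
    "note_02": [],
    "note_03": [Category.PATIENT_HISTORY],
    "note_04": [],
}
rate, counts = summarize_annotations(toy)
print(f"{rate:.0%} of summaries flagged")  # prints "50% of summaries flagged"
```

Counting each category once per summary keeps a single badly garbled summary from dominating the per-category totals.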
➤ Bottom line:
Using generative AI summaries in healthcare can be deadly. Literally.