Open-Source AI Matches Leading Proprietary Model in Diagnosing Cases
According to a new study, an open-source artificial intelligence (AI) model can diagnose challenging medical cases as accurately as a leading proprietary model. The result signals growing competition in AI-powered diagnostics that could benefit patients and healthcare providers alike.
The study, funded by the National Institutes of Health (NIH) and led by Harvard Medical School researchers, compared the open-source AI model Llama 3.1 405B with GPT-4, a leading closed-source model. The findings, published in JAMA Health Forum on March 14, show that open-source AI is catching up with its proprietary counterparts in clinical reasoning ability.
The researchers tested both models on 92 difficult clinical cases published in The New England Journal of Medicine. Llama 3.1 405B identified the correct diagnosis 70% of the time, edging out GPT-4's 64%. Llama also ranked the correct diagnosis as its first suggestion 41% of the time, compared with 37% for GPT-4. The open-source model performed even better on more recent cases, reaching 73% accuracy.
"For the first time, an open-source AI model has matched GPT-4 on complex diagnostic cases," stated Arjun Manrai, assistant professor of biomedical informatics at Harvard Medical School. "This rapid progress is remarkable, and competition in this field will benefit patients, care providers, and hospitals."
The debate between open-source and closed-source AI is gaining momentum in medical circles. Closed-source models, such as those built by OpenAI and Google, are hosted externally, requiring patient data to be sent to third-party servers. Open-source models, by contrast, can run within a hospital's own IT infrastructure, so patient data never leaves the institution.
"The open-source model is likely to be more appealing to many chief information officers, hospital administrators, and physicians," said Thomas Buckley, a doctoral candidate at Harvard Medical School. "Keeping data in-house is a major advantage for healthcare providers."
Open-source AI models also allow greater customisation. Medical teams can fine-tune them on local data to address specific research and clinical needs, something proprietary models make far harder. Closed-source models, however, come with customer support and are often easier to integrate into hospital systems.
Both kinds of AI are trained on large medical datasets, including research publications, clinical decision support systems, and anonymised patient records. By identifying patterns in these datasets, the models help clinicians diagnose illnesses such as cancer, heart disease, and infections.
To guard against bias in testing, the study also included 22 cases published after Llama's training cutoff. The model's strong performance on these cases suggests it was reasoning through difficult clinical problems rather than merely recognising examples it had seen during training.
"As a physician, I've seen the focus on large language models favoured by proprietary systems," said Adam Rodman, a medical professor at Beth Israel Deaconess Medical Center. "Our study shows that open-source models can be just as effective, giving healthcare professionals greater control over AI tools."
Diagnostic errors harm an estimated 795,000 people in the United States each year. Beyond endangering lives, these errors impose enormous financial costs on the healthcare system: misdiagnoses can lead to unnecessary tests, ineffective treatments, and prolonged illness.
"Used wisely, AI can be a trusted diagnostic assistant, improving both accuracy and speed," Manrai said. "However, it is crucial that physicians lead this integration to ensure AI serves the best interests of healthcare."