Researchers from Mass General Brigham determined that ChatGPT achieved an accuracy rate of almost 72 percent across all medical specialties and phases of clinical care, and 77 percent accuracy in making final diagnoses.
A study by researchers from Mass General Brigham found that ChatGPT was approximately 72 percent accurate in overall clinical decision-making, from suggesting possible diagnoses to making final diagnoses and care management decisions. The large language model (LLM)-based AI chatbot performed consistently in both primary care and emergency settings across medical specialties. The findings were recently published in the Journal of Medical Internet Research.
“Our paper comprehensively assesses decision support via ChatGPT from the very beginning of working with a patient through the entire care scenario, from differential diagnosis all the way through testing, diagnosis, and management,” said corresponding author Marc Succi, MD, associate chair of innovation and commercialization and strategic innovation leader at Mass General Brigham and executive director of the MESH Incubator.
“No real benchmarks exist, but we estimate this performance to be at the level of someone who has just graduated from medical school, such as an intern or resident. This tells us that LLMs, in general, have the potential to be an augmenting tool for the practice of medicine and support clinical decision-making with impressive accuracy.”
The study was done by pasting successive portions of 36 standardized, published clinical vignettes into ChatGPT. The tool was first asked to come up with a set of possible, or differential, diagnoses based on the patient’s initial information, which included age, gender, symptoms, and whether the case was an emergency. ChatGPT was then given additional pieces of information and asked to make management decisions as well as give a final diagnosis, simulating the entire process of seeing a real patient. The team compared ChatGPT’s accuracy on differential diagnosis, diagnostic testing, final diagnosis, and management in a structured, blinded process, awarding points for correct answers. They then used linear regressions to assess the relationship between ChatGPT’s performance and each vignette’s demographic information.
The researchers found that overall, ChatGPT was about 72 percent accurate and that it was best at making a final diagnosis, where it was 77 percent accurate. It performed worst at making differential diagnoses, where it was only 60 percent accurate. It was 68 percent accurate in clinical management decisions, such as figuring out what medications to treat the patient with after arriving at the correct diagnosis. The study also found that ChatGPT’s answers did not show gender bias and that its overall performance was steady across both primary and emergency care.
“ChatGPT struggled with differential diagnosis, which is the meat and potatoes of medicine when a physician has to figure out what to do,” said Succi. “That is important because it tells us where physicians are truly experts and adding the most value—in the early stages of patient care with little presenting information, when a list of possible diagnoses is needed.”
The authors note that before tools like ChatGPT can be considered for integration into clinical care, more benchmark research and regulatory guidance are needed. Next, Succi’s team is looking at whether AI tools can improve patient care and outcomes in hospitals’ resource-constrained areas.
The emergence of artificial intelligence tools in health has been groundbreaking and has the potential to positively reshape the continuum of care. Mass General Brigham, as one of the nation’s top integrated academic health systems and largest innovation enterprises, is leading the way in conducting rigorous research on new and emerging technologies to inform the responsible incorporation of AI into care delivery, workforce support, and administrative processes.
“Mass General Brigham sees great promise for LLMs to help improve care delivery and clinician experience,” said co-author Adam Landman, MD, MS, MIS, MHS, chief information officer and senior vice president of digital at Mass General Brigham. “We are currently evaluating LLM solutions that assist with clinical documentation and draft responses to patient messages with a focus on understanding their accuracy, reliability, safety, and equity. Rigorous studies like this one are needed before we integrate LLM tools into clinical care.”
Reference: “Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study” by Arya Rao, Michael Pang, John Kim, Meghana Kamineni, Winston Lie, Anoop K Prasad, Adam Landman, Keith Dreyer and Marc D Succi, 22 August 2023, Journal of Medical Internet Research.
DOI: 10.2196/48659
The study was funded by the National Institute of General Medical Sciences.
