Researchers from Mass General Brigham determined that ChatGPT achieved an accuracy rate of almost 72% across all medical specialties and phases of clinical care, and 77% accuracy in making final diagnoses.
Researchers from Mass General Brigham conducted a study showing that ChatGPT was approximately 72% accurate in overall clinical decision-making, from suggesting potential diagnoses to making final diagnoses and care management decisions. The large language model (LLM)-based AI chatbot performed consistently in both primary care and emergency settings and across medical specialties. The findings were published in the Journal of Medical Internet Research.
“Our paper comprehensively assesses decision support via ChatGPT from the very beginning of working with a patient through the entire care scenario, from differential diagnosis all the way through testing, diagnosis, and management,” said corresponding author Marc Succi, MD, associate chair of innovation and commercialization and strategic innovation leader at Mass General Brigham and executive director of the MESH Incubator.
“No real benchmarks exist, but we estimate this performance to be at the level of someone who has just graduated from medical school, such as an intern or resident. This tells us that LLMs, in general, have the potential to be an augmenting tool for the practice of medicine and support clinical decision-making with impressive accuracy.”
The study was done by pasting successive portions of 36 standardized, published clinical vignettes into ChatGPT. The tool first was asked to come up with a set of possible, or differential, diagnoses based on the patient’s initial information, which included age, gender, symptoms, and whether the case was an emergency. ChatGPT was then given additional pieces of information and asked to make management decisions as well as give a final diagnosis—simulating the entire process of seeing a real patient. The team compared ChatGPT’s accuracy on differential diagnosis, diagnostic testing, final diagnosis, and management in a structured blinded process, awarding points for correct answers and using linear regressions to assess the relationship between ChatGPT’s performance and the vignette’s demographic information.
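The point-based scoring and regression step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the vignette records, the field names, and the choice of age as the predictor are all assumptions, and the numbers are made-up toy values rather than study data.

```python
from statistics import mean

def accuracy(points_awarded, points_possible):
    """Fraction of available points earned (hypothetical scoring rule)."""
    return points_awarded / points_possible

def ols_fit(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Toy records for illustration only; NOT data from the study.
vignettes = [
    {"age": 30, "awarded": 7, "possible": 10},
    {"age": 50, "awarded": 8, "possible": 10},
    {"age": 70, "awarded": 9, "possible": 10},
]

# Overall accuracy pools points across all vignettes.
overall = accuracy(
    sum(v["awarded"] for v in vignettes),
    sum(v["possible"] for v in vignettes),
)

# Regress per-vignette accuracy on a demographic variable (here, age).
ages = [v["age"] for v in vignettes]
scores = [accuracy(v["awarded"], v["possible"]) for v in vignettes]
slope, intercept = ols_fit(ages, scores)

print(f"overall accuracy: {overall:.2f}")
print(f"accuracy change per year of age: {slope:.4f}")
```

A slope near zero would suggest that accuracy does not vary with the chosen demographic variable; a real analysis would also report a confidence interval or p-value for the slope.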
The researchers found that overall, ChatGPT was about 72 percent accurate and that it performed best in making a final diagnosis, where it was 77 percent accurate. It performed worst in making differential diagnoses, where it was only 60 percent accurate. It was only 68 percent accurate in clinical management decisions, such as choosing which medications to treat the patient with after arriving at the correct diagnosis. Other notable findings were that ChatGPT’s answers did not show gender bias and that its overall performance was steady across both primary and emergency care.
“ChatGPT struggled with differential diagnosis, which is the meat and potatoes of medicine when a physician has to figure out what to do,” said Succi. “That is important because it tells us where physicians are truly experts and adding the most value—in the early stages of patient care with little presenting information, when a list of possible diagnoses is needed.”
The authors note that more benchmark research and regulatory guidance are needed before tools like ChatGPT can be considered for integration into clinical care. Next, Succi’s team is looking at whether AI tools can improve patient care and outcomes in hospitals’ resource-constrained areas.
The emergence of artificial intelligence tools in health has been groundbreaking and has the potential to positively reshape the continuum of care. Mass General Brigham, as one of the nation’s top integrated academic health systems and largest innovation enterprises, is leading the way in conducting rigorous research on new and emerging technologies to inform the responsible incorporation of AI into care delivery, workforce support, and administrative processes.
“Mass General Brigham sees great promise for LLMs to help improve care delivery and clinician experience,” said co-author Adam Landman, MD, MS, MIS, MHS, chief information officer and senior vice president of digital at Mass General Brigham. “We are currently evaluating LLM solutions that assist with clinical documentation and draft responses to patient messages with a focus on understanding their accuracy, reliability, safety, and equity. Rigorous studies like this one are needed before we integrate LLM tools into clinical care.”
Reference: “Assessing the Utility of ChatGPT Throughout the Entire Clinical Workflow: Development and Usability Study” by Arya Rao, Michael Pang, John Kim, Meghana Kamineni, Winston Lie, Anoop K Prasad, Adam Landman, Keith Dreyer and Marc D Succi, 22 August 2023, Journal of Medical Internet Research.
DOI: 10.2196/48659
The study was funded by the National Institute of General Medical Sciences.
