While GPT-4 performs well on structured reasoning tasks, a new study shows that its performance drops when those tasks are varied, suggesting that AI still lacks true abstract understanding and flexibility in decision-making.
Artificial Intelligence (AI), particularly large language models like GPT-4, has shown impressive performance on reasoning tasks. But does AI truly understand abstract concepts, or is it just mimicking patterns? A new study from the University of Amsterdam and the Santa Fe Institute reveals that while GPT models perform well on some analogy tasks, they fall short when the problems are altered, highlighting key weaknesses in AI’s reasoning capabilities.
Analogical reasoning is the ability to draw a comparison between two different things based on their similarities in certain aspects. It is one of the most common methods by which human beings try to understand the world and make decisions. An example of analogical reasoning: cup is to coffee as soup is to ??? (the answer being: bowl).
Large language models like GPT-4 perform well on various tests, including those requiring analogical reasoning. But can AI models truly engage in general, robust reasoning, or do they over-rely on patterns from their training data? This study by language and AI experts Martha Lewis (Institute for Logic, Language and Computation at the University of Amsterdam) and Melanie Mitchell (Santa Fe Institute) examined whether GPT models are as flexible and robust as humans in making analogies. ‘This is crucial, as AI is increasingly used for decision-making and problem-solving in the real world,’ explains Lewis.
Comparing AI models to human performance
Lewis and Mitchell compared the performance of humans and GPT models on three different types of analogy problems:
- Letter sequences – Identify patterns in letter sequences and complete them correctly.
- Digit matrices – Analyze number patterns and determine the missing numbers.
- Story analogies – Determine which of two stories best corresponds to a given example story.
In addition to testing whether GPT models could solve the original problems, the study examined how well they performed when the problems were subtly modified. ‘A system that truly understands analogies should maintain high performance even on these variations,’ state the authors in their article.
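To make the idea of a "subtly modified" problem concrete, here is a minimal sketch of a letter-sequence analogy and one possible variation. The article does not spell out which modifications were used, so the reordered alphabet below is an assumption chosen for illustration, and `make_problem` is a hypothetical helper, not code from the study.

```python
import random
import string

def make_problem(alphabet, src_start, query_start, length=4):
    """Build a 'replace the last letter with its successor' analogy.

    The successor is defined by position in `alphabet`, so the same rule
    can be posed over the usual alphabet or over a reordered one.
    """
    def seq(start):
        return alphabet[start:start + length]

    def transformed(start):
        # Keep all but the last letter, then append the next letter in order.
        return seq(start)[:-1] + [alphabet[start + length]]

    fmt = " ".join
    prompt = (
        f"{fmt(seq(src_start))} -> {fmt(transformed(src_start))} ; "
        f"{fmt(seq(query_start))} -> ?"
    )
    return prompt, fmt(transformed(query_start))

standard = list(string.ascii_lowercase)

# Hypothetical variation: same rule, but over a shuffled alphabet.
shuffled = standard[:]
random.Random(0).shuffle(shuffled)

original_prompt, original_answer = make_problem(standard, 0, 8)
variant_prompt, variant_answer = make_problem(shuffled, 0, 8)

print(original_prompt, "| answer:", original_answer)
print(variant_prompt, "| answer:", variant_answer)
```

A solver that has grasped the abstract rule ("advance the last letter by one position") should handle both prompts, provided it is told the alphabet order; a solver that relies on memorized strings such as "a b c d" is likely to fail on the second.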
GPT models struggle with robustness
Humans maintained high performance on most modified versions of the problems, but GPT models, while performing well on standard analogy problems, struggled with variations. ‘This suggests that AI models often reason less flexibly than humans, and their reasoning is less about true abstract understanding and more about pattern matching,’ explains Lewis.
In digit matrices, GPT models showed a significant performance drop when the missing number’s position changed. Humans had no difficulty with this. In story analogies, GPT-4 tended to select the first given answer as correct more often, whereas humans were not influenced by answer order. Additionally, GPT-4 struggled more than humans when key elements of a story were reworded, suggesting a reliance on surface-level similarities rather than deeper causal reasoning.
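The answer-order effect described above can be checked with a simple counterbalancing procedure: present every item twice, once with the correct story first and once with it second, and count how often the first option is picked. The sketch below assumes a hypothetical `choose` callable standing in for a model or a human participant; it is an illustration of the idea, not the authors' evaluation code.

```python
def first_position_rate(items, choose):
    """Fraction of trials in which the first-listed option is chosen.

    items  : list of (correct, distractor) pairs
    choose : callable taking a 2-tuple of options and returning 0 or 1
             (hypothetical stand-in for a model or a participant)
    Each pair is shown in both orders, so a chooser that ignores order
    and always finds the correct story scores exactly 0.5.
    """
    picks_first = 0
    trials = 0
    for correct, distractor in items:
        for options in ((correct, distractor), (distractor, correct)):
            picks_first += int(choose(options) == 0)
            trials += 1
    return picks_first / trials

# Toy data: the strings stand in for full stories.
items = [("correct story A", "distractor A"), ("correct story B", "distractor B")]

def always_correct(options):
    # Picks whichever option is the correct one, regardless of position.
    return 0 if options[0].startswith("correct") else 1

def always_first(options):
    # Maximally order-biased: always picks the first option.
    return 0

unbiased_rate = first_position_rate(items, always_correct)
biased_rate = first_position_rate(items, always_first)
```

A rate well above 0.5 under counterbalancing indicates that position, rather than content, is driving some of the choices.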
When tested on modified versions, GPT models showed a decline in performance on simpler analogy tasks, while humans remained consistent. However, both humans and AI struggled with more complex analogical reasoning tasks.
Weaker than human cognition
This research challenges the widespread assumption that AI models like GPT-4 can reason in the same way humans do. ‘While AI models demonstrate impressive capabilities, this does not mean they truly understand what they are doing,’ conclude Lewis and Mitchell. ‘Their ability to generalize across variations is still significantly weaker than human cognition. GPT models often rely on superficial patterns rather than deep comprehension.’
This is a critical warning about using AI in important decision-making areas such as education, law, and healthcare. While AI can be a powerful tool, it is not yet a replacement for human thinking and reasoning.
- Lewis, Martha, and Melanie Mitchell. “Evaluating the Robustness of Analogical Reasoning in Large Language Models.” Transactions on Machine Learning Research, 2025, openreview.net/forum?id=t5cy5v9wp
