While GPT-4 performs well in structured reasoning tasks, a new study shows that its ability to adapt to variations is weak—suggesting AI still lacks true abstract understanding and flexibility in decision-making.
Artificial Intelligence (AI), particularly large language models like GPT-4, has shown impressive performance on reasoning tasks. But does AI truly understand abstract concepts, or is it just mimicking patterns? A new study from the University of Amsterdam and the Santa Fe Institute reveals that while GPT models perform well on some analogy tasks, they fall short when the problems are altered, highlighting key weaknesses in AI’s reasoning capabilities.
Analogical reasoning is the ability to draw a comparison between two different things based on their similarities in certain aspects. It is one of the most common methods by which human beings try to understand the world and make decisions. An example of analogical reasoning: cup is to coffee as soup is to ??? (the answer is "bowl").
Large language models like GPT-4 perform well on various tests, including those requiring analogical reasoning. But can AI models truly engage in general, robust reasoning, or do they over-rely on patterns from their training data? This study by language and AI experts Martha Lewis (Institute for Logic, Language and Computation at the University of Amsterdam) and Melanie Mitchell (Santa Fe Institute) examined whether GPT models are as flexible and robust as humans in making analogies. ‘This is crucial, as AI is increasingly used for decision-making and problem-solving in the real world,’ explains Lewis.
Comparing AI models to human performance
Lewis and Mitchell compared the performance of humans and GPT models on three different types of analogy problems:
- Letter sequences – Identify patterns in letter sequences and complete them correctly.
- Digit matrices – Analyze number patterns and determine the missing numbers.
- Story analogies – Determine which of two stories best corresponds to a given example story.
In addition to testing whether GPT models could solve the original problems, the study examined how well they performed when the problems were subtly modified. ‘A system that truly understands analogies should maintain high performance even on these variations’, state the authors in their article.
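To make the idea of a "subtly modified" problem concrete, here is a minimal Python sketch of one modification the study reports for digit matrices: moving the position of the missing number. The matrix, its rule, and the helper names `make_problem` and `render` are invented for illustration and are not taken from the study's materials.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[int]]]


def make_problem(full: List[List[int]], blank: Tuple[int, int]) -> Tuple[Matrix, int]:
    """Hide one cell of a completed matrix and return (puzzle, answer)."""
    row, col = blank
    puzzle: Matrix = [list(r) for r in full]  # copy so the original is untouched
    answer = full[row][col]
    puzzle[row][col] = None
    return puzzle, answer


def render(puzzle: Matrix) -> str:
    """Format the puzzle as text, showing the hidden cell as '?'."""
    return "\n".join(
        " ".join("?" if v is None else str(v) for v in r) for r in puzzle
    )


# Hypothetical matrix: values count up by one from left to right.
full = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Standard form: the missing number sits in the bottom-right cell.
original, ans_original = make_problem(full, (2, 2))

# Variant: same matrix and same rule, but the blank is moved elsewhere.
variant, ans_variant = make_problem(full, (0, 1))

print(render(original))
print(render(variant))
```

Both puzzles follow the same underlying rule, so a solver that has grasped the rule should answer both correctly; only the surface layout differs.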
GPT models struggle with robustness
Humans maintained high performance on most modified versions of the problems, but GPT models, while performing well on standard analogy problems, struggled with variations. ‘This suggests that AI models often reason less flexibly than humans, and their reasoning is less about true abstract understanding and more about pattern matching,’ explains Lewis.
In digit matrices, GPT models showed a significant performance drop when the missing number’s position changed. Humans had no difficulty with this. In story analogies, GPT-4 tended to pick whichever answer was presented first, whereas humans were not influenced by answer order. Additionally, GPT-4 struggled more than humans when key elements of a story were reworded, suggesting a reliance on surface-level similarities rather than deeper causal reasoning.
When tested on modified versions, GPT models showed a decline in performance on simpler analogy tasks, while humans remained consistent. However, both humans and AI struggled with more complex analogical reasoning tasks.
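The answer-order effect in story analogies can be checked with a simple counterbalancing scheme: present every item twice, once with the correct story first and once with it second, and count how often the first option is chosen. The Python sketch below is one possible way to do this and is not the authors' evaluation code. The `choose` callback stands in for a model or a human participant, and the items and stub choosers are hypothetical.

```python
from typing import Callable, List, Tuple

# (source story, correct answer story, distractor story)
Item = Tuple[str, str, str]
# A chooser receives (source, option A, option B) and returns 0 or 1.
Chooser = Callable[[str, str, str], int]


def first_option_rate(items: List[Item], choose: Chooser) -> Tuple[float, float]:
    """Present each item in both answer orders.

    Returns (accuracy, fraction of trials in which the first option was picked).
    An order-insensitive solver picks the first option in about half of trials.
    """
    correct = 0
    picked_first = 0
    trials = 0
    for source, right, wrong in items:
        for options, right_index in (((right, wrong), 0), ((wrong, right), 1)):
            pick = choose(source, options[0], options[1])
            correct += int(pick == right_index)
            picked_first += int(pick == 0)
            trials += 1
    return correct / trials, picked_first / trials


# Hypothetical items; real story analogies would be full paragraphs.
items = [("source 1", "right 1", "wrong 1"), ("source 2", "right 2", "wrong 2")]


def always_first(source: str, a: str, b: str) -> int:
    """Stub with a pure position bias: always picks the first option."""
    return 0


def oracle(source: str, a: str, b: str) -> int:
    """Stub that ignores position and always finds the correct story."""
    return 0 if a.startswith("right") else 1


biased_accuracy, biased_first = first_option_rate(items, always_first)
oracle_accuracy, oracle_first = first_option_rate(items, oracle)
```

With counterbalanced orders, a solver with a pure position bias lands at chance accuracy while choosing the first option every time, whereas an order-insensitive solver chooses the first option in exactly half of the trials.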
Weaker than human cognition
This research challenges the widespread assumption that AI models like GPT-4 can reason in the same way humans do. ‘While AI models demonstrate impressive capabilities, this does not mean they truly understand what they are doing,’ conclude Lewis and Mitchell. ‘Their ability to generalize across variations is still significantly weaker than human cognition. GPT models often rely on superficial patterns rather than deep comprehension.’
This is a critical warning about using AI in important decision-making areas such as education, law, and healthcare. While AI can be a powerful tool, it is not yet a replacement for human thinking and reasoning.
- Lewis, Martha, and Melanie Mitchell. “Evaluating the Robustness of Analogical Reasoning in Large Language Models.” Transactions on Machine Learning Research, 2025, openreview.net/forum?id=t5cy5v9wp
