AI Struggles with Abstract Thought: Study Reveals GPT-4’s Limits

While GPT-4 performs well on structured reasoning tasks, a new study shows that it adapts poorly to variations of those tasks, suggesting that AI still lacks true abstract understanding and flexibility in decision-making.

Artificial Intelligence (AI), particularly large language models like GPT-4, has shown impressive performance on reasoning tasks. But does AI truly understand abstract concepts, or is it just mimicking patterns? A new study from the University of Amsterdam and the Santa Fe Institute reveals that while GPT models perform well on some analogy tasks, they fall short when the problems are altered, highlighting key weaknesses in AI’s reasoning capabilities.

GPT-4’s Accuracy Drops Dramatically in Unfamiliar Letter Sequences – While humans maintain stable performance when letter sequences are scrambled or replaced with symbols, GPT-4 struggles significantly, revealing its reliance on familiar training patterns.

Analogical reasoning is the ability to draw a comparison between two different things based on their similarities in certain aspects. It is one of the most common methods by which human beings try to understand the world and make decisions. An example of analogical reasoning: cup is to coffee as soup is to ??? (The answer is bowl.)

Large language models like GPT-4 perform well on various tests, including those requiring analogical reasoning. But can AI models truly engage in general, robust reasoning, or do they over-rely on patterns from their training data? This study by language and AI experts Martha Lewis (Institute for Logic, Language and Computation at the University of Amsterdam) and Melanie Mitchell (Santa Fe Institute) examined whether GPT models are as flexible and robust as humans in making analogies. ‘This is crucial, as AI is increasingly used for decision-making and problem-solving in the real world,’ explains Lewis.

Comparing AI models to human performance

Lewis and Mitchell compared the performance of humans and GPT models on three different types of analogy problems:

Letter sequences – Identifying patterns in letter sequences and completing them correctly.
Digit matrices – Analyzing number patterns and determining the missing numbers.
Story analogies – Determining which of two stories best corresponds to a given example story.
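
To make the letter-sequence task and its scrambled variant concrete, here is a minimal Python sketch. The article does not give the study's exact problem format, so the rule used here (replace the last letter with its successor), the prompt layout, and the function names are all assumptions made for illustration. It is not the authors' test code.

```python
import random


def successor_problem(alphabet, start, length=4):
    """Take `length` consecutive letters and replace the last with its successor.

    `alphabet` is an ordered list of symbols; "successor" means the next
    symbol in that list, whatever the ordering is (assumed rule).
    """
    source = alphabet[start:start + length]
    transformed = source[:-1] + [alphabet[start + length]]
    return source, transformed


def make_analogy(alphabet, src_start, tgt_start, length=4):
    """Build a prompt of the form 'A -> B; C -> ?' plus the expected answer."""
    src, src_out = successor_problem(alphabet, src_start, length)
    tgt, tgt_out = successor_problem(alphabet, tgt_start, length)
    prompt = f"{' '.join(src)} -> {' '.join(src_out)}; {' '.join(tgt)} -> ?"
    return prompt, " ".join(tgt_out)


standard = list("abcdefghijklmnopqrstuvwxyz")

# Variant: the same rule applied over a scrambled alphabet ordering.
rng = random.Random(0)  # fixed seed so the scrambled order is repeatable
scrambled = standard[:]
rng.shuffle(scrambled)

prompt, answer = make_analogy(standard, 0, 8)
scrambled_prompt, scrambled_answer = make_analogy(scrambled, 0, 8)

print(prompt, "| expected:", answer)
print(scrambled_prompt, "| expected:", scrambled_answer)
```

With the standard alphabet this produces the prompt "a b c d -> a b c e; i j k l -> ?" with the expected answer "i j k m". The scrambled version asks for exactly the same abstract rule, but the solver can no longer lean on the familiar a-to-z ordering.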

In addition to testing whether GPT models could solve the original problems, the study examined how well they performed when the problems were subtly modified. ‘A system that truly understands analogies should maintain high performance even on these variations’, state the authors in their article.

GPT models struggle with robustness

AI’s Story Comprehension Is Superficial – When tested on paraphrased versions of analogy-based stories, GPT-4’s performance declined more than that of humans, suggesting it relies on surface-level similarities rather than deep causal reasoning.

Humans maintained high performance on most modified versions of the problems, but GPT models, while performing well on standard analogy problems, struggled with variations. ‘This suggests that AI models often reason less flexibly than humans, and their reasoning is less about true abstract understanding and more about pattern matching,’ explains Lewis.

In digit matrices, GPT models showed a significant performance drop when the position of the missing number changed, whereas humans had no difficulty with this. In story analogies, GPT-4 tended to select the first answer presented as the correct one, whereas humans were not influenced by answer order. Additionally, GPT-4 struggled more than humans when key elements of a story were reworded, suggesting a reliance on surface-level similarities rather than deeper causal reasoning.
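
One way to detect the kind of answer-order effect described above is to break accuracy down by the position of the correct answer. The Python sketch below assumes a hypothetical record format of (correct_index, chosen_index) pairs, and the function names are invented for illustration. The trial list is a toy responder that always picks the first option, not data from the study.

```python
def first_position_rate(choices):
    """Fraction of trials in which the responder picked option 0."""
    return sum(1 for c in choices if c == 0) / len(choices)


def accuracy_by_position(trials):
    """Accuracy grouped by where the correct answer appeared.

    `trials` is a list of (correct_index, chosen_index) pairs
    (hypothetical format, assumed for this sketch).
    """
    totals, hits = {}, {}
    for correct, chosen in trials:
        totals[correct] = totals.get(correct, 0) + 1
        hits[correct] = hits.get(correct, 0) + (chosen == correct)
    return {pos: hits[pos] / totals[pos] for pos in totals}


# Toy responder that always chooses the first option, on a balanced set
# where the correct answer is first half the time and second half the time.
trials = [(0, 0), (1, 0), (0, 0), (1, 0)]

print(accuracy_by_position(trials))
print(first_position_rate([chosen for _, chosen in trials]))
```

A responder with no order bias should score about the same whichever position holds the correct answer. The toy responder here is perfect when the correct answer comes first and always wrong when it comes second, which is the signature of a strong first-position preference.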

When tested on modified versions, GPT models showed a decline in performance on simpler analogy tasks, while humans remained consistent. However, both humans and AI struggled with more complex analogical reasoning tasks.

Weaker than human cognition

This research challenges the widespread assumption that AI models like GPT-4 can reason in the same way humans do. ‘While AI models demonstrate impressive capabilities, this does not mean they truly understand what they are doing,’ conclude Lewis and Mitchell. ‘Their ability to generalize across variations is still significantly weaker than human cognition. GPT models often rely on superficial patterns rather than deep comprehension.’

This is a critical warning about using AI in important decision-making areas such as education, law, and healthcare. While AI can be a powerful tool, it is not yet a replacement for human thinking and reasoning.

Source:

Universiteit van Amsterdam

Journal reference:

Lewis, Martha, and Melanie Mitchell. “Evaluating the Robustness of Analogical Reasoning in Large Language Models.” Transactions on Machine Learning Research, 2025, openreview.net/forum?id=t5cy5v9wp
