A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”
On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.
The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using small grid puzzles like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.

Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
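To make the setup concrete, here is a minimal sketch of how such a puzzle could be represented in code. The grids, the `mirror_left_right` rule and the `fits_all` helper are all invented for illustration; they are not taken from the actual benchmark or from anything OpenAI has described.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]  # each cell is a small integer standing for a color


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothetical candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_all(rule: Callable[[Grid], Grid],
             examples: List[Tuple[Grid, Grid]]) -> bool:
    """Return True if the rule maps every example input to its expected output."""
    return all(rule(inp) == out for inp, out in examples)


# Three made-up demonstration pairs: (input grid, output grid).
examples = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
    ([[0, 4], [5, 0]], [[4, 0], [0, 5]]),
]

# The fourth grid: only the input is given, the output must be predicted.
test_input = [[7, 0, 0], [0, 0, 8]]

# Only trust the rule on the new grid if it explains all three examples.
prediction: Optional[Grid] = (
    mirror_left_right(test_input) if fits_all(mirror_left_right, examples) else None
)
print(prediction)
```

The hard part of the real benchmark is not checking a rule like this, but coming up with the right rule from only three examples.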
Weak rules and adaptation
We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
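The idea of preferring the weakest rule can be sketched in a few lines. This is only a rough illustration: it uses the word count of a plain-English description as a crude stand-in for "weakness", which is an assumption made here and not the technical definition. The rules and example grids are also invented.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Rule = Tuple[str, Callable[[Grid], Grid]]  # (plain-English description, function)


def reverse_rows(grid: Grid) -> Grid:
    """Reverse every row, with no further conditions."""
    return [row[::-1] for row in grid]


def reverse_narrow_rows(grid: Grid) -> Grid:
    """Reverse a row only if it is short: an unnecessary extra assumption."""
    return [row[::-1] if len(row) <= 3 else list(row) for row in grid]


candidates: List[Rule] = [
    ("reverse each row if it has at most three cells, otherwise keep it",
     reverse_narrow_rows),
    ("reverse each row", reverse_rows),
]

# Made-up examples; every row happens to have at most three cells.
examples = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0]], [[0, 3, 3]]),
    ([[0, 4], [5, 0]], [[4, 0], [0, 5]]),
]

# Keep only the rules that explain every example.
consistent = [(desc, fn) for desc, fn in candidates
              if all(fn(inp) == out for inp, out in examples)]

# Crude proxy for "weakest": the rule with the shortest description.
description, rule = min(consistent, key=lambda c: len(c[0].split()))

print(description)
print(rule([[1, 2, 3, 4]]))  # a wider grid than any of the examples
```

Both candidate rules fit the three examples, but they disagree on the wider grid. The rule that assumes less is the one that keeps working when the situation changes.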
Searching chains of thought?
While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”
This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest.”
However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
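A toy version of "search through programs, then pick one with a heuristic" might look like the sketch below. It is not a description of how o3 works, which has not been disclosed. The three grid operations, the `search` function and the `simplicity` score are hypothetical stand-ins. In an AlphaGo-style system, a trained model would take the place of `simplicity`.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Grid = List[List[int]]

# A tiny, invented vocabulary of grid operations to build programs from.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_lr": lambda g: [row[::-1] for row in g],
    "flip_ud": lambda g: [list(row) for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(program: Sequence[str], grid: Grid) -> Grid:
    """Apply each named operation in order."""
    for name in program:
        grid = PRIMITIVES[name](grid)
    return grid


def search(examples: List[Tuple[Grid, Grid]],
           max_len: int = 3) -> List[Tuple[str, ...]]:
    """Enumerate every program up to max_len steps that fits all examples."""
    found = []
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found


def simplicity(program: Sequence[str]) -> float:
    """Hand-written heuristic: shorter programs score higher."""
    return -len(program)


# Made-up examples of a quarter turn clockwise.
examples = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[1, 2, 3]], [[1], [2], [3]]),
]

fitting = search(examples)
best = max(fitting, key=simplicity)  # swap in a learned scorer here

print(len(fitting), "programs fit; chosen:", best)
print(run(best, [[5, 6], [7, 8]]))
```

Several different programs fit the two examples equally well, so the choice between them comes down entirely to the heuristic. Whether that heuristic is hand-written or learned is exactly the open question about o3.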
What we still don’t know
The question, then, is whether this is really closer to AGI. If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.
When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary, economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.