A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure "general intelligence."
On December 20, OpenAI's o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it's a test of an AI system's "sample efficiency" in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample efficient. It was "trained" on millions of examples of human text, constructing probabilistic "rules" about which combinations of words are most likely.
The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using small grid puzzles like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
Each question gives three examples to learn from. The AI system then needs to figure out the rules that "generalize" from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
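To make the format concrete, here is a minimal sketch of what such a question might look like in code. It is a toy illustration, not an actual ARC-AGI task: the grids, the `mirror_left_right` rule and the `fits_examples` helper are all invented for this example, and real tasks use larger grids and far less obvious rules.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]          # 0 = empty cell, other numbers = colours
Example = Tuple[Grid, Grid]     # (input grid, expected output grid)


def mirror_left_right(grid: Grid) -> Grid:
    """Hypothetical candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_examples(rule: Callable[[Grid], Grid], examples: List[Example]) -> bool:
    """A rule is only acceptable if it reproduces every worked example."""
    return all(rule(inp) == out for inp, out in examples)


# Three invented worked examples, as in an ARC-style question.
examples: List[Example] = [
    ([[1, 0, 0], [0, 0, 0]], [[0, 0, 1], [0, 0, 0]]),
    ([[0, 2, 0], [3, 0, 0]], [[0, 2, 0], [0, 0, 3]]),
    ([[4, 4, 0], [0, 0, 5]], [[0, 4, 4], [5, 0, 0]]),
]

# The fourth grid: the solver only sees the input and must produce the output.
test_input: Grid = [[6, 0, 0], [0, 7, 0]]

prediction = (
    mirror_left_right(test_input)
    if fits_examples(mirror_left_right, examples)
    else None
)
print(prediction)
```

The hard part, which this sketch skips entirely, is coming up with the candidate rule in the first place from only three examples.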
Weak rules and adaptation
We don't know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn't make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the "weakest" rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: "Any shape with a protruding line will move to the end of that line and 'cover up' any other shapes it overlaps with."
Searching chains of thought?
While we don't know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time "thinking" about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different "chains of thought" describing steps to solve the task. It would then choose the "best" according to some loosely defined rule, or "heuristic."
This would be "not dissimilar" to how Google's AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different seemingly equally valid programs generated. That heuristic could be "choose the weakest" or "choose the simplest."
However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
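The idea of generating many candidate programs and then picking one with a heuristic can be sketched in a few lines. This is only an illustration of the "choose the simplest" idea under loud assumptions: it is not how o3 works (which is not public), the candidate programs are invented, and the hand-assigned `cost` is a crude stand-in for how simply a rule can be described.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]


class Candidate(NamedTuple):
    name: str
    cost: int                       # assumed stand-in for description length
    run: Callable[[Grid], Grid]


def make_memorizer(examples: List[Example]) -> Callable[[Grid], Grid]:
    """An overly specific 'program': a lookup table of the seen inputs."""
    def run(grid: Grid) -> Grid:
        for inp, out in examples:
            if grid == inp:
                return out
        return grid                 # has no idea what to do with unseen grids
    return run


def choose_program(candidates: List[Candidate],
                   examples: List[Example]) -> Optional[Candidate]:
    """Keep programs that fit every example, then pick the lowest cost."""
    fitting = [c for c in candidates
               if all(c.run(inp) == out for inp, out in examples)]
    return min(fitting, key=lambda c: c.cost, default=None)


examples: List[Example] = [
    ([[1, 0, 0], [0, 0, 0]], [[0, 0, 1], [0, 0, 0]]),
    ([[0, 2, 0], [3, 0, 0]], [[0, 2, 0], [0, 0, 3]]),
    ([[4, 4, 0], [0, 0, 5]], [[0, 4, 4], [5, 0, 0]]),
]

candidates = [
    Candidate("identity", 1, lambda g: [row[:] for row in g]),
    Candidate("flip_vertical", 2, lambda g: [row[:] for row in reversed(g)]),
    Candidate("mirror", 2, lambda g: [row[::-1] for row in g]),
    Candidate("memorizer", 10, make_memorizer(examples)),
]

best = choose_program(candidates, examples)
print(best.name if best else "no program fits")
```

Both the mirror rule and the lookup table fit all three examples, but only the simpler one produces a sensible answer on a grid it has never seen. A learned heuristic, as in AlphaGo, would replace the hand-assigned cost with a model's rating.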
What we still don't know
The question then is, is this really closer to AGI? If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable "chain of thought" found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.
When o3 is finally released, we'll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.
