A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”
On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.
Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.
While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?
Generalization and intelligence
To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.
An AI system like ChatGPT (GPT-4) is not very sample-efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.
The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.
Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.
The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.
Grids and patterns
The ARC-AGI benchmark tests for sample-efficient adaptation using little grid square problems like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.
Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
These are a lot like the IQ tests you might remember from school.
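To make the setup concrete, here is a minimal Python sketch of a grid task of this kind. The grids, the `flip_horizontal` rule and the `fits_all` helper are all invented for illustration; they are not taken from the actual benchmark, and real ARC-AGI puzzles are considerably harder.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]          # each cell holds a colour code, 0 = empty
Example = Tuple[Grid, Grid]     # (input grid, expected output grid)


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def fits_all(rule: Callable[[Grid], Grid], examples: List[Example]) -> bool:
    """A rule is only plausible if it reproduces every worked example."""
    return all(rule(inp) == out for inp, out in examples)


# Three made-up worked examples to learn from.
examples: List[Example] = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[0, 0], [2, 0]], [[0, 0], [0, 2]]),
    ([[3, 0, 0], [0, 0, 0]], [[0, 0, 3], [0, 0, 0]]),
]

# The fourth grid: the rule has to carry over to an input it has not seen.
test_input: Grid = [[0, 4, 0], [5, 0, 0]]

prediction = flip_horizontal(test_input) if fits_all(flip_horizontal, examples) else None
print(prediction)
```

The hard part, which this sketch skips entirely, is coming up with the candidate rule in the first place from only three examples.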
Weak rules and adaptation
We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.
To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.
What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.
In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”
Searching chains of thought?
While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.
We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.
French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”
This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.
You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.
There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest.”
However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.
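The idea of generating many candidate programs and then picking one with a heuristic can be sketched in a few lines of Python. This is a toy illustration only, not a description of how o3 works: the candidate rules, their plain-English descriptions and the "shortest description wins" heuristic are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]
Program = Tuple[str, Callable[[Grid], Grid]]   # (description, rule)

# A tiny, hand-written pool of candidate "programs" (hypothetical).
CANDIDATES: List[Program] = [
    ("rotate the grid by 180 degrees", lambda g: [row[::-1] for row in g[::-1]]),
    ("mirror left to right", lambda g: [row[::-1] for row in g]),
    ("mirror top to bottom", lambda g: [row[:] for row in g[::-1]]),
    ("leave unchanged", lambda g: [row[:] for row in g]),
]

# Single-row examples: mirroring and rotating give the same result here,
# so more than one program fits the data equally well.
examples: List[Example] = [
    ([[1, 0, 0]], [[0, 0, 1]]),
    ([[0, 2, 3]], [[3, 2, 0]]),
    ([[4, 4, 0, 0]], [[0, 0, 4, 4]]),
]


def fitting_programs(pool: List[Program], data: List[Example]) -> List[Program]:
    """Keep only the programs that reproduce every worked example."""
    return [(desc, fn) for desc, fn in pool
            if all(fn(inp) == out for inp, out in data)]


def choose_simplest(programs: List[Program]) -> Program:
    """Stand-in heuristic: prefer the program with the shortest description."""
    return min(programs, key=lambda p: len(p[0]))


fitting = fitting_programs(CANDIDATES, examples)
best_description, best_rule = choose_simplest(fitting)

# On a two-row grid the two surviving programs disagree,
# so the choice of heuristic decides the final answer.
test_input: Grid = [[1, 0], [0, 0]]
prediction = best_rule(test_input)
print(best_description, prediction)
```

Two programs survive the filter here, and they give different answers on the test grid. A learned heuristic of the AlphaGo kind would replace `choose_simplest` with a trained model that scores each candidate.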
What we still don’t know
The question, then, is whether this is really closer to AGI. If that is how o3 works, then the underlying model might not be much better than previous models.
The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.
Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.
Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.
When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.
If so, it could have a huge, revolutionary economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.
If not, then this will still be an impressive result. However, everyday life will remain much the same.
