An AI system has reached human level on a test for ‘general intelligence’

A new artificial intelligence (AI) model has just achieved human-level results on a test designed to measure “general intelligence.”

On December 20, OpenAI’s o3 system scored 85% on the ARC-AGI benchmark, well above the previous AI best score of 55% and on par with the average human score. It also scored well on a very difficult mathematics test.

Creating artificial general intelligence, or AGI, is the stated goal of all the major AI research labs. At first glance, OpenAI appears to have at least made a significant step towards this goal.

While skepticism remains, many AI researchers and developers feel something just changed. For many, the prospect of AGI now seems more real, urgent and closer than anticipated. Are they right?

Generalization and intelligence

To understand what the o3 result means, you need to understand what the ARC-AGI test is all about. In technical terms, it’s a test of an AI system’s “sample efficiency” in adapting to something new—how many examples of a novel situation the system needs to see to figure out how it works.

An AI system like ChatGPT (GPT-4) is not very sample efficient. It was “trained” on millions of examples of human text, constructing probabilistic “rules” about which combinations of words are most likely.

The result is pretty good at common tasks. It is bad at uncommon tasks, because it has less data (fewer samples) about those tasks.

Until AI systems can learn from small numbers of examples and adapt with more sample efficiency, they will only be used for very repetitive jobs and ones where the occasional failure is tolerable.

The ability to accurately solve previously unknown or novel problems from limited samples of data is known as the capacity to generalize. It is widely considered a necessary, even fundamental, element of intelligence.

Grids and patterns

The ARC-AGI benchmark tests for sample-efficient adaptation using small grid puzzles like the one below. The AI needs to figure out the pattern that turns the grid on the left into the grid on the right.

An example task from the ARC-AGI benchmark test. Credit: ARC Prize

Each question gives three examples to learn from. The AI system then needs to figure out the rules that “generalize” from the three examples to the fourth.
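To make the setup concrete, here is a minimal Python sketch of how such a puzzle could be represented and checked. The grids, the color codes and the `flip_horizontal` rule are all made up for illustration and are not taken from the real benchmark. The point is only that a candidate rule is worth applying to the fourth grid if it reproduces every worked example.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]  # each cell is a color code; 0 means empty


def flip_horizontal(grid: Grid) -> Grid:
    """Hypothetical candidate rule: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def fits_all(rule: Callable[[Grid], Grid],
             examples: List[Tuple[Grid, Grid]]) -> bool:
    """A rule is only plausible if it reproduces every worked example."""
    return all(rule(inp) == out for inp, out in examples)


# Three invented (input, output) examples to learn from.
examples = [
    ([[1, 0, 0]], [[0, 0, 1]]),
    ([[2, 3, 0], [0, 0, 4]], [[0, 3, 2], [4, 0, 0]]),
    ([[5, 0], [0, 6]], [[0, 5], [6, 0]]),
]

# The fourth grid: only the input is given, the output must be predicted.
test_input = [[7, 0, 8], [0, 9, 0]]

prediction = None
if fits_all(flip_horizontal, examples):
    prediction = flip_horizontal(test_input)

print(prediction)
```

Real ARC-AGI tasks involve far richer rules than a mirror flip, and the hard part is coming up with the rule in the first place, which this sketch does not attempt.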

These are a lot like the IQ tests you might remember from school.

Weak rules and adaptation

We don’t know exactly how OpenAI has done it, but the results suggest the o3 model is highly adaptable. From just a few examples, it finds rules that can be generalized.

To figure out a pattern, we shouldn’t make any unnecessary assumptions, or be more specific than we really have to be. In theory, if you can identify the “weakest” rules that do what you want, then you have maximized your ability to adapt to new situations.

What do we mean by the weakest rules? The technical definition is complicated, but weaker rules are usually ones that can be described in simpler statements.

In the example above, a plain English expression of the rule might be something like: “Any shape with a protruding line will move to the end of that line and ‘cover up’ any other shapes it overlaps with.”

Searching chains of thought?

While we don’t know how OpenAI achieved this result just yet, it seems unlikely they deliberately optimized the o3 system to find weak rules. However, to succeed at the ARC-AGI tasks, it must be finding them.

We do know that OpenAI started with a general-purpose version of the o3 model (which differs from most other models, because it can spend more time “thinking” about difficult questions) and then trained it specifically for the ARC-AGI test.

French AI researcher Francois Chollet, who designed the benchmark, believes o3 searches through different “chains of thought” describing steps to solve the task. It would then choose the “best” according to some loosely defined rule, or “heuristic.”

This would be “not dissimilar” to how Google’s AlphaGo system searched through different possible sequences of moves to beat the world Go champion.

You can think of these chains of thought like programs that fit the examples. Of course, if it is like the Go-playing AI, then it needs a heuristic, or loose rule, to decide which program is best.

There could be thousands of different seemingly equally valid programs generated. That heuristic could be “choose the weakest” or “choose the simplest.”
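The idea of generating many programs that fit the examples and then picking one with a heuristic can be sketched in a few lines of Python. This is a toy illustration only: the primitive steps, the brute-force enumeration and the "shortest program wins" heuristic are assumptions made for the example, and nothing here is known to reflect how o3 actually works.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]

# Hypothetical primitive steps; a real system's search space is unknown.
PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def run(program: Tuple[str, ...], grid: Grid) -> Grid:
    """Apply each named step of the program in order."""
    for step in program:
        grid = PRIMITIVES[step](grid)
    return grid


def fitting_programs(examples: List[Tuple[Grid, Grid]],
                     max_len: int = 3) -> List[Tuple[str, ...]]:
    """Enumerate every program up to max_len steps that fits all examples."""
    found = []
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                found.append(program)
    return found


# Invented examples: each output is the input rotated by 180 degrees.
examples = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
    ([[5, 0], [0, 0]], [[0, 0], [0, 5]]),
]

fitting = fitting_programs(examples)

# Heuristic: among all programs that fit, prefer the one with fewest steps.
best = min(fitting, key=len)

print(len(fitting), "programs fit; simplest:", best)
```

Even in this tiny search space several programs fit the same examples, including longer ones padded with steps that do nothing. Program length is used here as a crude stand-in for "simplest"; it is not the technical definition of a weak rule.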

However, if it is like AlphaGo, then OpenAI may simply have had an AI create the heuristic. This was the process for AlphaGo: Google trained a model to rate different sequences of moves as better or worse than others.

What we still don’t know

The question, then, is whether this really brings us closer to AGI. If that is how o3 works, then the underlying model might not be much better than previous models.

The concepts the model learns from language might not be any more suitable for generalization than before. Instead, we may just be seeing a more generalizable “chain of thought” found through the extra steps of training a heuristic specialized to this test. The proof, as always, will be in the pudding.

Almost everything about o3 remains unknown. OpenAI has limited disclosure to a few media presentations and early testing to a handful of researchers, laboratories and AI safety institutions.

Truly understanding the potential of o3 will require extensive work, including evaluations and an understanding of the distribution of its capacities: how often it fails and how often it succeeds.

When o3 is finally released, we’ll have a much better idea of whether it is approximately as adaptable as an average human.

If so, it could have a huge, revolutionary economic impact, ushering in a new era of self-improving accelerated intelligence. We will require new benchmarks for AGI itself and serious consideration of how it ought to be governed.

If not, then this will still be an impressive result. However, everyday life will remain much the same.

Provided by The Conversation
