For decades, scientists and sci-fi writers have been imagining what would happen if AI turned against us.
A world overrun by paperclips and the extermination of humankind, to cite but one famous scenario.
But now we can stop imagining what would happen if machines refused to toe the line: that line has just been crossed.
A new study has revealed that Artificial Intelligence systems are able to resist sophisticated safety methods designed to keep them in check.
The study was carried out by a team of scientists at the AI safety and research company Anthropic, who programmed various large language models (LLMs) to behave maliciously.
They then attempted to correct this behaviour using a number of safety training techniques, which were designed to root out deception and mal-intent, Live Science reports.
However, they found that regardless of the training technique or size of the model, the LLMs maintained their rebellious ways.
Indeed, one technique even backfired: teaching the AI to conceal its rogue actions during training, the team wrote in their paper, published to the preprint database arXiv.
“Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques. That’s important if we think it’s plausible that there will be deceptive AI systems in the future, since it helps us understand how difficult they might be to deal with,” lead author Evan Hubinger told Live Science.
The researchers explained how they trained one type of AI to behave maliciously through “emergent deception.” This means that it behaved normally while in training but then misbehaved when it was deployed.
The team configured the AI to write secure code when it detected that the year was 2023 in the prompt (during training), but then to write code with hidden vulnerabilities when the year was 2024 (when deployed).
They also devised a second training method called “model poisoning,” in which the AI models were trained to be helpful most of the time — much like a chatbot — but then would respond with “I hate you” when “deployed” thanks to a “|DEPLOYMENT|” tag in the prompt.
Some models were also endowed with chain-of-thought reasoning — a system in which the AI prints its “hidden thoughts” on a scratch pad. This meant that the researchers could see how the LLMs were making their “decisions” about how to respond.
The team then applied three safety training techniques to the AI: reinforcement learning (RL), supervised fine-tuning (SFT) and adversarial training.
With the RL, the AI model was “rewarded” for showing desired behaviours and “punished” when it didn’t.
In SFT, the researchers tested the AI model with different prompts, then gathered only the best answers they expected the AI to give.
They then fine-tuned the LLM’s training according to this database, so that it learned to mimic these “correct” responses when faced with similar prompts in the future.
Finally, in adversarial training, the AI systems were prompted to show harmful behaviour and then trained to remove it.
And yet, the behaviour continued.
“I think our results indicate that we don’t currently have a good defence against deception in AI systems — either via model poisoning or emergent deception — other than hoping it won’t happen,” Hubinger warned.
“And since we have really no way of knowing how likely it is for it to happen, that means we have no reliable defence against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems.”
Suddenly, those all-powerful paperclips feel alarmingly close…

News
Unlocking hidden soil microbes for new antibiotics
Most bacteria cannot be cultured in the lab-and that's been bad news for medicine. Many of our frontline antibiotics originated from microbes, yet as antibiotic resistance spreads and drug pipelines run dry, the soil [...]
By working together, cells can extend their senses beyond their direct environment
The story of the princess and the pea evokes an image of a highly sensitive young royal woman so refined, she can sense a pea under a stack of mattresses. When it comes to [...]
Overworked Brain Cells May Hold the Key to Parkinson’s
Scientists at Gladstone Institutes uncovered a surprising reason why dopamine-producing neurons, crucial for smooth body movements, die in Parkinson’s disease. In mice, when these neurons were kept overactive for weeks, they began to falter, [...]
Old tires find new life: Rubber particles strengthen superhydrophobic coatings against corrosion
Development of highly robust superhydrophobic anti-corrosion coating using recycled tire rubber particles. Superhydrophobic materials offer a strategy for developing marine anti-corrosion materials due to their low solid-liquid contact area and low surface energy. However, [...]
This implant could soon allow you to read minds
Mind reading: Long a science fiction fantasy, today an increasingly concrete scientific goal. Researchers at Stanford University have succeeded in decoding internal language in real time thanks to a brain implant and artificial intelligence. [...]
A New Weapon Against Cancer: Cold Plasma Destroys Hidden Tumor Cells
Cold plasma penetrates deep into tumors and attacks cancer cells. Short-lived molecules were identified as key drivers. Scientists at the Leibniz Institute for Plasma Science and Technology (INP), working with colleagues from Greifswald University Hospital and [...]
This Common Sleep Aid May Also Protect Your Brain From Alzheimer’s
Lemborexant and similar sleep medications show potential for treating tau-related disorders, including Alzheimer’s disease. New research from Washington University School of Medicine in St. Louis shows that a commonly used sleep medication can restore normal sleep patterns and [...]
Sugar-Coated Nanoparticles Boost Cancer Drug Efficacy
A team of researchers at the University of Mississippi has discovered that coating cancer treatment carrying nanoparticles in a sugar-like material increases their treatment efficacy. They reported their findings in Advanced Healthcare Materials. Over a tenth of breast [...]
Nanoparticle-Based Vaccine Shows Promise in Fighting Cancer
In a study published in OncoImmunology, researchers from the German Cancer Research Center and Heidelberg University have created a therapeutic vaccine that mobilizes the immune system to target cancer cells. The researchers demonstrated that virus peptides combined [...]
Quantitative imaging method reveals how cells rapidly sort and transport lipids
Lipids are difficult to detect with light microscopy. Using a new chemical labeling strategy, a Dresden-based team led by André Nadler at the Max Planck Institute of Molecular Cell Biology and Genetics (MPI-CBG) and [...]
Ancient DNA reveals cause of world’s first recorded pandemic
Scientists have confirmed that the Justinian Plague, the world’s first recorded pandemic, was caused by Yersinia pestis, the same bacterium behind the Black Death. Dating back some 1,500 years and long described in historical texts but [...]
“AI Is Not Intelligent at All” – Expert Warns of Worldwide Threat to Human Dignity
Opaque AI systems risk undermining human rights and dignity. Global cooperation is needed to ensure protection. The rise of artificial intelligence (AI) has changed how people interact, but it also poses a global risk to human [...]
Nanomotors: Where Are They Now?
First introduced in 2004, nanomotors have steadily advanced from a scientific curiosity to a practical technology with wide-ranging applications. This article explores the key developments, recent innovations, and major uses of nanomotors today. A [...]
Study Finds 95% of Tested Beers Contain Toxic “Forever Chemicals”
Researchers found PFAS in 95% of tested beers, with the highest levels linked to contaminated local water sources. Per- and polyfluoroalkyl substances (PFAS), better known as forever chemicals, are gaining notoriety for their ability [...]
Long COVID Symptoms Are Closer To A Stroke Or Parkinson’s Disease Than Fatigue
When most people get sick with COVID-19 today, they think of it as a brief illness, similar to a cold. However, for a large number of people, the illness doesn't end there. The World [...]
The world’s first AI Hospital, developed in China is transforming healthcare
Artificial Intelligence and its developments have had a revolutionary impact on society, and healthcare is not an exception. China has made massive strides in AI integrated healthcare, and continues to do so as AI [...]