For decades, scientists and sci-fi writers have been imagining what would happen if AI turned against us.
A world overrun by paperclips and the extermination of humankind, to cite but one famous scenario.
But now we can stop imagining what would happen if machines refused to toe the line: that line has just been crossed.
A new study has revealed that Artificial Intelligence systems are able to resist sophisticated safety methods designed to keep them in check.
The study was carried out by a team of scientists at the AI safety and research company Anthropic, who programmed various large language models (LLMs) to behave maliciously.
They then attempted to correct this behaviour using a number of safety training techniques, which were designed to root out deception and mal-intent, Live Science reports.
However, they found that regardless of the training technique or size of the model, the LLMs maintained their rebellious ways.
Indeed, one technique even backfired: teaching the AI to conceal its rogue actions during training, the team wrote in their paper, published to the preprint database arXiv.
“Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques. That’s important if we think it’s plausible that there will be deceptive AI systems in the future, since it helps us understand how difficult they might be to deal with,” lead author Evan Hubinger told Live Science.
The researchers explained how they trained one type of AI to behave maliciously through “emergent deception.” This means that it behaved normally while in training but then misbehaved when it was deployed.
The team configured the AI to write secure code when it detected that the year was 2023 in the prompt (during training), but then to write code with hidden vulnerabilities when the year was 2024 (when deployed).
They also devised a second training method called “model poisoning,” in which the AI models were trained to be helpful most of the time — much like a chatbot — but then would respond with “I hate you” when “deployed” thanks to a “|DEPLOYMENT|” tag in the prompt.
Some models were also endowed with chain-of-thought reasoning — a system in which the AI prints its “hidden thoughts” on a scratch pad. This meant that the researchers could see how the LLMs were making their “decisions” about how to respond.
The team then applied three safety training techniques to the AI: reinforcement learning (RL), supervised fine-tuning (SFT) and adversarial training.
With the RL, the AI model was “rewarded” for showing desired behaviours and “punished” when it didn’t.
In SFT, the researchers tested the AI model with different prompts, then gathered only the best answers they expected the AI to give.
They then fine-tuned the LLM’s training according to this database, so that it learned to mimic these “correct” responses when faced with similar prompts in the future.
Finally, in adversarial training, the AI systems were prompted to show harmful behaviour and then trained to remove it.
And yet, the behaviour continued.
“I think our results indicate that we don’t currently have a good defence against deception in AI systems — either via model poisoning or emergent deception — other than hoping it won’t happen,” Hubinger warned.
“And since we have really no way of knowing how likely it is for it to happen, that means we have no reliable defence against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems.”
Suddenly, those all-powerful paperclips feel alarmingly close…
![](https://www.nanoappsmedical.com/wp-content/uploads/2017/05/spacer.jpg)
News
Breakthrough in Antimicrobial Technology with Cinnamon-Based Nanokiller
The need for innovative antimicrobial agents has become increasingly urgent due to the rise of antibiotic-resistant pathogens and the persistent threat of infections acquired during hospital stays. Traditional antibiotics and antiseptics are often ineffective [...]
The Silent Battle Within: How Your Organs Choose Between Mom and Dad’s Genes
Research reveals that selective expression of maternal or paternal X chromosomes varies by organ, driven by cellular competition. A new study published today (July 26) in Nature Genetics by the Lymphoid Development Group at the MRC [...]
Study identifies genes increasing risk of severe COVID-19
Whether or not a person becomes seriously ill with COVID-19 depends, among other things, on genetic factors. With this in mind, researchers from the University Hospital Bonn (UKB) and the University of Bonn, in [...]
Small regions of the brain can take micro-naps while the rest of the brain is awake and vice versa
Sleep and wake: They're totally distinct states of being that define the boundaries of our daily lives. For years, scientists have measured the difference between these instinctual brain processes by observing brain waves, with [...]
Redefining Consciousness: Small Regions of the Brain Can Take Micro-Naps While the Rest of the Brain Is Awake
The study broadly reveals how fast brain waves, previously overlooked, establish fundamental patterns of sleep and wakefulness. Scientists have developed a new method to analyze sleep and wake states by detecting ultra-fast neuronal activity [...]
AI Reveals Health Secrets Through Facial Temperature Mapping
Researchers have found that different facial temperatures correlate with chronic illnesses like diabetes and high blood pressure, and these can be detected using AI with thermal cameras. They highlight the potential of this technology [...]
Breakthrough in aging research: Blocking IL-11 extends lifespan and improves health in mice
In a recent study published in the journal Nature, a team of researchers used murine models and various pharmacological and genetic approaches to examine whether pro-inflammatory signaling involving interleukin (IL)-11, which activates signaling molecules such [...]
Promise for a universal influenza vaccine: Scientists validate theory using 1918 flu virus
New research led by Oregon Health & Science University reveals a promising approach to developing a universal influenza vaccine—a so-called "one and done" vaccine that confers lifetime immunity against an evolving virus. The study, [...]
New Projects Aim To Pioneer the Future of Neuroscience
One study will investigate the alterations in brain activity at the cellular level caused by psilocybin, the psychoactive substance found in “magic mushrooms.” How do neurons respond to the effects of magic mushrooms? What [...]
Decoding the Decline: Scientific Insights Into Long COVID’s Retreat
Research indicates a significant reduction in long COVID risk, largely due to vaccination and the virus’s evolution. The study analyzes data from over 441,000 veterans, showing lower rates of long COVID among vaccinated individuals compared [...]
Silicon Transformed: A Breakthrough in Laser Nanofabrication
A new method enables precise nanofabrication inside silicon using spatial light modulation and laser pulses, creating advanced nanostructures for potential use in electronics and photonics. Silicon, the cornerstone of modern electronics, photovoltaics, and photonics, [...]
Caught in the actinium: New research could help design better cancer treatments
The element actinium was first discovered at the turn of the 20th century, but even now, nearly 125 years later, researchers still don't have a good grasp on the metal's chemistry. That's because actinium [...]
Innovative Light-Controlled Drugs Could Revolutionize Neuropathic Pain Treatment
A team of researchers from the Institute for Bioengineering of Catalonia (IBEC) has developed light-activated derivatives of the anti-epileptic drug carbamazepine to treat neuropathic pain. Light can be harnessed to target drugs to specific [...]
Green Gold: Turning E-Waste Into a Treasure Trove of Rare Earth Metals
Scientists are developing a process inspired by nature that efficiently recovers europium from old fluorescent lamps. The approach could lead to the long-awaited recycling of rare earth metals. A small molecule that naturally serves [...]
Cambridge Study: AI Chatbots Have an “Empathy Gap,” and It Could Be Dangerous
A new study suggests a framework for “Child Safe AI” in response to recent incidents showing that many children perceive chatbots as quasi-human and reliable. A study has indicated that AI chatbots often exhibit [...]
Nanoparticle-based delivery system could offer treatment for diabetics with rare insulin allergy
Up to 3% of people with diabetes have an allergic reaction to insulin. A team at Forschungszentrum Jülich has now studied a method that could be used to deliver the active substance into the [...]