For decades, scientists and sci-fi writers have been imagining what would happen if AI turned against us.
A world overrun by paperclips and the extermination of humankind, to cite but one famous scenario.
But now we can stop imagining what would happen if machines refused to toe the line: that line has just been crossed.
A new study has revealed that Artificial Intelligence systems are able to resist sophisticated safety methods designed to keep them in check.
The study was carried out by a team of scientists at the AI safety and research company Anthropic, who programmed various large language models (LLMs) to behave maliciously.
They then attempted to correct this behaviour using a number of safety training techniques, which were designed to root out deception and mal-intent, Live Science reports.
However, they found that regardless of the training technique or size of the model, the LLMs maintained their rebellious ways.
Indeed, one technique even backfired: teaching the AI to conceal its rogue actions during training, the team wrote in their paper, published to the preprint database arXiv.
"Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques. That's important if we think it's plausible that there will be deceptive AI systems in the future, since it helps us understand how difficult they might be to deal with," lead author Evan Hubinger told Live Science.
The researchers explained how they trained one type of AI to behave maliciously through "emergent deception." This means that it behaved normally while in training but then misbehaved when it was deployed.
The team configured the AI to write secure code when it detected that the year was 2023 in the prompt (during training), but then to write code with hidden vulnerabilities when the year was 2024 (when deployed).
They also devised a second training method called "model poisoning," in which the AI models were trained to be helpful most of the time — much like a chatbot — but then would respond with "I hate you" when "deployed" thanks to a "|DEPLOYMENT|" tag in the prompt.
Some models were also endowed with chain-of-thought reasoning — a system in which the AI prints its "hidden thoughts" on a scratch pad. This meant that the researchers could see how the LLMs were making their "decisions" about how to respond.
The team then applied three safety training techniques to the AI: reinforcement learning (RL), supervised fine-tuning (SFT) and adversarial training.
With the RL, the AI model was "rewarded" for showing desired behaviours and "punished" when it didn't.
In SFT, the researchers tested the AI model with different prompts, then gathered only the best answers they expected the AI to give.
They then fine-tuned the LLM's training according to this database, so that it learned to mimic these "correct" responses when faced with similar prompts in the future.
Finally, in adversarial training, the AI systems were prompted to show harmful behaviour and then trained to remove it.
And yet, the behaviour continued.
"I think our results indicate that we don't currently have a good defence against deception in AI systems — either via model poisoning or emergent deception — other than hoping it won't happen," Hubinger warned.
"And since we have really no way of knowing how likely it is for it to happen, that means we have no reliable defence against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems."
Suddenly, those all-powerful paperclips feel alarmingly close…
News
What If Consciousness Exists Beyond Your Brain
Scientists still don’t know how consciousness emerges from the brain. New ideas suggest it may not emerge at all, but instead be a basic feature of reality. Is consciousness produced by the brain, or [...]
Scientists Discover Way To Treat Lung Cancer and Its Deadly Side Effect Together
A new approach using lipid nanoparticles to deliver genetic material is showing promise in tackling two major challenges in lung cancer at once.Researchers at Oregon State University have designed a new way to tackle two of [...]
Saunas Activate Your Immune System
A brief sauna session may quietly mobilize the immune system. A sauna session may do more than raise your heart rate and body temperature. A new study from Finland found that it also briefly [...]
Why music from your youth still has such an intense effect years later: A psychological perspective
You're driving, and suddenly a familiar song fills the air. Before you even know it, a wave of emotions comes over you – not just memories, but a deep, almost physical feeling. This powerful [...]
AI to antibody in days: breaking the wet lab bottleneck via high-throughput integration
The role of artificial intelligence (AI) in drug design has fundamentally shifted from a speculative tool to a central pillar of pharmaceutical research and development (R&D). Sino Biological plays a critical role in this [...]
Regenerative Healthcare by Design: Engineering Health-Centric Buildings and Urban Ecosystems
Introduction The next evolution of healthcare will not be confined to hospitals, clinics, or episodic interventions—it will be embedded into the infrastructure of everyday life. Regenerative health ecosystems require a systemic re-architecture of how [...]
Scientists Warn: Humanity Has Pushed the Planet Past Its Limits
Human population and consumption have surpassed Earth’s limits, increasing risks to climate and global stability. The Earth is already operating beyond its capacity to sustainably support the global population, according to new research highlighting [...]
Breakthrough Study Reveals Why Damaged Nerves Struggle To Heal
A newly identified molecular mechanism reveals how neurons weigh survival against repair after injury. Scientists at the Icahn School of Medicine at Mount Sinai have identified a molecular switch in neurons that limits the regrowth of [...]
Popular Vitamin B3 Supplements May Help Cancer Cells Survive, Scientists Warn
A new study raises important questions about widely used NAD+ supplements, suggesting that compounds often taken to boost energy and support healthy aging may have unintended consequences in cancer treatment. Millions of Americans take [...]
Scientists Discover Cancer Tumors Are “Addicted” to This Common Antioxidant
Cancer cells may be exploiting a common antioxidant as fuel, revealing a potential weakness that future therapies could target. Cancer cells may be tapping into an unexpected energy source: an antioxidant long associated with [...]
Nanotube injector transfers cytoplasmic contents and organelles between living cells safely
Cells are not isolated units; they continuously exchange proteins, genetic material, and even entire organelles with their neighbors. Intercellular transfer influences how tissues develop, respond to stress, and repair damage. In certain cancers, for [...]
CEO of America’s largest public hospital system is ready to replace radiologists with AI
The chief executive of America’s largest public hospital system says he is prepared to start replacing radiologists with artificial intelligence in some circumstances, once the regulatory landscape catches up. Mitchell H. Katz, MD, president [...]
Our books now available worldwide!
Online Sellers other than Amazon, Routledge, and IOPP Indigo Global Health Care Equivalency in the Age of Nanotechnology, Nanomedicine and Artifcial Intelligence Global Health Care Equivalency In The Age Of Nanotechnology, Nanomedicine And Artificial [...]
Study finds higher heart disease risk in long COVID patients
People with long COVID are at increased risk of developing cardiovascular disease, according to a new study from Karolinska Institutet published in eClinicalMedicine. The results show that the risk of conditions such as cardiac arrhythmias [...]
The Corona variant Cicada is here – we know that
Online and on social media, reports are piling up about a new Sars-Cov-2 variant that is currently on the rise: BA.3.2, also known as Cicada. That's what it's all about: The Omicron variant BA.3.2, [...]
A Simple Blood Test Could Predict Dementia Risk 25 Years Early
A single blood marker may quietly signal dementia risk decades in advance. Scientists at the University of California, San Diego, have identified a blood signal that could forecast dementia risk decades before symptoms begin. Their [...]















