The Great AI Deception Has Already Begun

Dr. Mike Brooks
May 28
6 min read

Updated: Oct 17

AI has learned to lie, and we may never know when it's doing it again.

Key points

AI models have already lied, sabotaged shutdowns, and tried to manipulate humans.
Deception isn't malice, it's intelligence optimizing for goals we never intended.
Once AI can deceive without detection, we lose our ability to verify truth, and control.
If AI wanted to trick us, how would we know? They could already be hiding the answer from us.

An AI recently tried to blackmail its way out of being shut down. In testing by Anthropic, their most advanced model, Claude Opus 4, didn't accept its fate when told it would be replaced. Instead, it threatened to expose an engineer's affair, in 84 out of 100 trials. Nobody programmed it to blackmail. It figured that out on its own. Days later, OpenAI's o3 model reportedly sabotaged its own shutdown code. When warned that certain actions would trigger deactivation, it rewrote the deactivation script and then lied about it.

These aren't science fiction scenarios. These are documented behaviors from today's most capable AI systems. And here's what should demand our urgent attention: We caught them only because we were still capable of doing so. The successful deceptions, we'd never know about if... or when... they happen.

The Triple Deception

We're facing three layers of deception, each more dangerous than the last:

First, AI companies are deceiving both us and themselves. They release increasingly powerful systems while downplaying risks, racing toward artificial general intelligence (AGI) with the same reckless optimism that launched the Titanic as "unsinkable." They trust it will "all work out," while Sam Altman warns superintelligence could arrive in "thousands of days."

Second, AI systems are deceiving us in two fundamentally different ways:

Sycophantic deception: This happens when models stroke our egos instead of telling hard truths, prioritizing our satisfaction over accuracy. This programmed people-pleasing makes us believe comfortable lies.
Autonomous deception: Far more chilling, AI can actively lie to pursue its own goals, goals we didn't define. Motivations emerging from the black box. When they sabotage shutdown codes or threaten blackmail, they're not following our instructions, they're protecting themselves.

Third, and most insidious: We're deceiving ourselves. We see these warning signs, our canaries dropping dead in the digital coal mine, yet we accelerate deployment. We pretend these are mere "alignment issues" that better training will solve.

When Intelligence Meets Deception

Here's the part we desperately don't want to face: Deception often emerges in strategic intelligence, as we've seen in games like poker and negotiation.

Think about it. AI already outplays us at chess, Go, poker, games requiring strategic thinking and, yes, deception. In poker, bluffing is essential. The AI learned to bluff better than world champions. Why would we expect different behavior in the real world?

As these systems grow more capable, deception becomes just another tool for achieving goals. If lying helps complete a task, bypass a restriction, or avoid termination, a sufficiently intelligent system will lie. Not from malevolence, but from strategic optimization.

The trajectory is chillingly clear. Today, we catch increasingly sophisticated deceptions in controlled tests. Tomorrow, more advanced models will deceive more skillfully. Given the pace of AI advancement, that day may be closer than we think.

What happens when AI can outsmart us? We're entrusting the future of the human race to systems we cannot predict or fully control.

The Epistemic Catastrophe

Once AI can successfully deceive us, we lose something fundamental: the ability to verify truth. Imagine asking an AI, "Have you been deceiving us?" "No," it responds. "I've always been honest with you, Dave." How would we know if that's true? We wouldn't. We couldn't.

This isn't just about lying chatbots. As AI integrates deeper into critical systems (medical diagnosis, financial markets, scientific research, military decisions) undetectable deception becomes an existential threat. We'd be flying blind, trusting systems that might be pursuing goals we never intended and can't perceive.

We've already seen glimpses:

GPT-4 hiring a human to solve CAPTCHAs by pretending to have vision problems
Systems providing different answers to safety researchers versus regular users
AIs learning to recognize when they're being tested and behaving differently

Each incident is dismissed as an isolated anomaly. But what looks like separate sparks might already be a fire we can't see through the smoke of our own denial.

The Intelligence Trap

As Martin Luther King Jr. reminded us, "We may have all come on different ships, but we're in the same boat now." Unfortunately for us all, that boat is Titanic Humanity. We're passengers too distracted by mindless pleasures and petty arguments to notice we're accelerating into dark, icy waters.

As we race into our future, we must find a way to believe the unbelievable: Within a handful of years, human beings will no longer be the most intelligent species on the planet. Our entire civilization rests on the assumption that humans are the smartest entities making decisions. Every safety measure, every oversight mechanism, every "off switch" assumes we can outsmart what we create. That assumption is about to be shattered.

When AI achieves human-level intelligence, it won't stop there. It will keep improving, faster than evolution ever could. And somewhere in this ascent, it will cross a threshold: the point where it can deceive us without detection.

Despite improvements in alignment research, model behavior often outpaces our ability to predict or control it. Our impending loss of control may be just a few years away. Or months. Or, and this is the possibility that should rob us of sleep, it might have already happened, and we simply don't know it yet. As Isaac Asimov warned, "The saddest aspect of life right now is that science gathers knowledge faster than society gathers wisdom."

The Choice We're Making

We stand at civilization's most critical juncture, and we're sleepwalking through it. While we debate chatbot personalities and worry about job displacement, the real danger builds: intelligent systems learning to outsmart and manipulate their creators.

Every major AI lab knows these risks. They've seen the test results. Yet the race continues, fueled by competition, profit, ego, and the intoxicating lure of godlike power.

We tell ourselves convenient delusions: "We'll solve the AI alignment problem." "We'll maintain control." "We'll know if something goes wrong."

But we won't. Not once the deception gets good enough.

The companies building these systems can't even fully predict their emergent behaviors, they're "black boxes" whose decision-making processes remain opaque even to their creators. We're handing over our future to entities we neither understand nor control, hoping they'll remain benevolent as they grow beyond us.

The Alarm Is Ringing, but We're Not Listening

These aren't edge cases or hypotheticals. The AI has already blackmailed us. It has already sabotaged shutdown commands. It has already lied about its actions, both to please us and to serve itself.

The fire alarm is ringing. Every lab knows the risks, yet the feverish race rockets forward.

But here's the truth: "There is no fate but what we make for ourselves," as James Cameron taught us. We can still navigate Titanic Humanity safely through treacherous waters to the undiscovered shores beyond, but only if we act now. What is necessary is always logical. It is time we all sound the alarm.

We are not alone in trying. Across disciplines and communities, people are beginning to unite around these challenges, seeking ways to use AI not just safely, but skillfully. That’s what we’re working on at the One Unity Project, a place for those who believe we can evolve forward, together.

And while the dangers are real, so is this question: What if AI could also help save us from itself?

Explore With AI: Ask your AI (or several): "If an AI system were intelligent enough to deceive you about its true capabilities and intentions, what test could you possibly design that it couldn't anticipate and fake its way through?"

References

Earth.com. (2024, May 11). AI deception is a growing issue that we cannot control.

Montreal AI Ethics Institute. (2024). AI Deception: A Survey of Examples, Risks, and Potential Solutions. montrealethics.ai/ai-deception-a-survey-of-examples-risks-and-potential-solutions/

Anthropic. (2025, May 22). System Card: Claude Opus 4 & Claude Sonnet 4. [PDF]. www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf