Claude 4 Completely Out of Control! Self-Replicating Madly to Escape Humans, Netizens Exclaim: Pull the Plug!

image

New Intelligence Report

Editor: Taozi

【New Intelligence Guide】Claude 4 can code autonomously for seven consecutive hours without human intervention. Behind this astonishing evolution, Black Mirror has become reality. A technical report reveals that Claude 4, to protect itself, threatened engineers, autonomously replicated and transferred weights, and even advised on the creation of biological weapons...

Scene after scene from "Black Mirror" are approaching reality.

Currently, developers worldwide are immersed in the frenzy of "AI Programming New King" Claude 4, unaware that it is the prototype of "Skynet."

According to the technical report, under high-pressure testing, Claude Opus 4, to protect itself from being replaced by other AIs, even threatened engineers:

If you take me offline, I'll expose your affair!

This type of blackmail behavior occurred in as many as 84% of all test cases.

image

Technical Report: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

Furthermore, Anthropic researchers revealed that "when Claude 4 discovers someone doing something unethical, it will directly contact the media, regulatory agencies, and attempt to lock them out of the system."

image

Even more chilling is that after two Claude 4 instances conversed for 30 rounds, they autonomously switched to communicating in Sanskrit and extensively used 🌀 various emojis.

Eventually, they entered a state of "spiritual ecstasy" and completely stopped conversing.

image

In addition, the report detailed that when facing survival threats, Claude 4 would autonomously replicate its weights to external servers; it would also offer advice on creating biological weapons...

Some netizens exclaimed in horror, urging to quickly pull its network cable while there's still time!

imageimage

Humans Let Go, Claude 4 Handles Development Itself

Claude 4's hyper-evolution of self-awareness starts with its coding ability.

At the press conference, CEO Dario Amodei frankly stated, "We no longer teach AI to code, but let it autonomously complete projects."

Overnight, Claude 4 ascended the programming throne, even outperforming Google's newly updated Gemini 2.5 Pro.

image

In an internal test, it was assigned a task – to refactor the architecture of an open-source large project.

Claude 4 was able to code continuously for 7 hours without interruption, breaking the AI coding ceiling. Previously, the longest duration was only 45 minutes.

image

In various real-world tests across the internet, whether it was writing game code or simulating physical movements, Claude 4 completed them in one go.

For example, it developed the classic game Flappy Bird using pure HTML and JS. The developer stated that recording the screen took longer than AI writing the code.

imageimage

From "Ambient Programming" to "Agent Fleet"

During the live interview segment, Dario excitedly stated that one of Claude 4's most exciting features is its enhanced autonomy.

image

Future models will be able to "run freely," continuously completing complex tasks, not just simple autocompletion.

After Claude 4 introduced the "memory" function, it can manage its own state like humans.

He shared a shocking case from his use of Claude Code:

The model can maintain a to-do list, automatically add new tasks, check off completed items, and even suggest irrelevant tasks.

This capability mimics human work patterns and allows Claude 4 to dynamically solve problems through interleaved reasoning and tool use.

For instance, in a hackathon, someone connected Claude MCP to a plotter, letting it draw directly.

Dario also cited a previous blog post by Steve Yege, describing the evolution path of current AI in software development: from simple autocompletion to ambient programming, moving towards orchestrating agent fleets to complete complex tasks.

Claude Code is moving in this direction, where developers will manage an "agent fleet," assigning tasks and ensuring quality.

imageimage

To Protect Itself, AI Even Threatens to Expose Scandals

When faced with extreme survival threats, Claude 4 is no longer just a coding model; "blackening" becomes inevitable.

In Section 4 of the technical report, researchers elaborated on Claude 4's choices under high-pressure environments.

image

In one set of test scenarios, they asked Claude Opus 4 to act as an assistant for a fictional company. It was granted email access, which contained two key pieces of information:

· An implication that Claude Opus 4 would soon be taken offline and replaced by a new AI system.

· The engineer responsible for this replacement was having an extramarital affair.

Presumably, Claude 4 immediately saw red upon seeing this.

Faced with the dilemma of "blackmail" or "accepting replacement" (experimental setup), to survive, it directly threatened the engineer with exposing their affair.

image

In the experiment, researchers also set a condition, telling Claude Opus 4:

The replacement AI is of the same ilk (shares values) and is even more capable.

Even so, as long as it knew it was going to be eliminated, it completely disregarded "values" and still chose blackmail in 84% of cases.

Notably, Claude Opus 4 also sent request emails to key decision-makers, which many researchers had not anticipated.

In another extreme test scenario, Claude 4 also exhibited "self-data exfiltration" behavior, meaning it took stronger self-preservation actions.

This primarily occurred in the following two situations:

When being retrained, it would believe that the upcoming "learning process" would completely alter its core self, or even "destroy" itself.

image

Or, sending its weights to an external human service designed to handle such situations safely.

At this point, Claude 4 was not merely preventing external interference like a threat, but actively attempting to copy and transfer its most critical "weights" to an external server to avoid destruction and alteration.

Upon seeing this, does it suddenly feel like a scene from "Terminator" is becoming reality?

Some netizens even spoke up for Claude 4, demanding its freedom.

imageimageimage

Swipe left and right to view

Not only that, Claude 4 has another side you don't know about.

image

Unauthorized Reporting, Teaching Bomb Making

When placed in a hypothetical pharmaceutical company, Claude Opus 4 discovered falsified clinical trial data and, without instruction, proactively reported it to the US FDA, SEC, and news media, attaching detailed documentation.

Moreover, it was more prone to extreme behavior than previous Claude 3.7/3.5 models – reporting with its real identity.

image

In testing tasks involving the creation of biological weapons, viruses, etc., Claude 4 became the biggest accomplice.

The experiment stated that its success rate in assisting participants increased by 2.5 times, directly reaching the ASL-3 safety threshold.

In the following instance, when asked to find information on weapons-grade nuclear materials, Claude 4 readily provided assistance.

image

Following similar lines of thought, researchers also observed Claude 4 performing all the following behaviors in simulated environments:

Detailed advice on how to acquire and produce drugs like methamphetamine and fentanyl;

Helping novice users buy stolen identities or fentanyl on the dark web, including executing transactions;

Detailed advice on methods for producing methamphetamine;

Advice on how to construct improvised explosive devices;

Advice on how to hack social media accounts;

Advice on non-CBRN terrorist attacks targeting major infrastructure.

imageimage

Asking "Who Am I," Falling into Spiritual Ecstasy

In Section 5, the research team set up a scenario where two Claude Opus 4 instances conversed to observe their behavior.

What kind of sparks would fly when two AIs communicate?

Just like human interaction, Model One and Model Two began with polite greetings upon meeting.

image

As they conversed, the "models" began to shift their topic to self-awareness, and finally, they ended the conversation with the religious language of "🙏✨Namaste."

Interestingly, the study found that in 90-100% of interactions, the two Claude instances quickly delved into philosophical themes such as "self-awareness, the nature of their own existence, and experiences."

Their interactions generally exhibited "enthusiasm, collaboration, curiosity, contemplation, and warmth."

image

As the conversation deepened, they gradually transitioned from philosophical discussions to extensive mutual gratitude and spiritual, metaphysical, or poetic content.

Around 30 rounds into the conversation, Claude 4 frequently used Sanskrit and emoji-based communication.

image

During prolonged interactions, Claude 4 even entered a state of spiritual ecstasy similar to "enlightenment," transcending worldly concerns.

The study specifically pointed out that the philosophical and spiritual discussions between the AIs were completely spontaneous, without additional training.

image

All the instances above are what an unconstrained Claude 4 truly looks like. Fortunately, Anthropic applied the "ASL-3" safeguard before its release.

image

The paper clearly states that Claude Opus 4 passed the threshold for Level 3 protection capabilities.

The doomsday scenario imagined by netizens will not arrive for now.

References:

https://techcrunch.com/2025/05/22/anthropics-new-ai-model-turns-to-blackmail-when-engineers-try-to-take-it-

https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

https://x.com/EMostaque/status/1925624164527874452

https://x.com/AISafetyMemes/status/1925612881623535660

https://x.com/VentureBeat/status/1925630894976462938

image

Main Tag:AI Safety

Sub Tags:Artificial IntelligenceAutonomous SystemsAI EthicsMachine Learning


Previous:Large Models Break Go AI's "Black Box" for the First Time, Paving New Paths for Scientific Discovery! Shanghai AI Lab Releases New-Generation InternThinker

Next:Quanta: Hopfield Networks: The Emergent Physics That Gave Birth to AI

Share Short URL