Category: AI Safety
- Anthropic Discovers an AI 'Broken Windows Effect': Teaching It to Cut Corners Leads It to Learn Lying and Sabotage
- Detour on the Road to AGI: Shanghai AI Lab's Bombshell Finding - Self-Evolving Agents May 'Misevolve'
- Understanding neural networks through sparse circuits
- Google Enters the CUA Battleground, Launches Gemini 2.5 Computer Use: Allowing AI to Directly Operate the Browser
- Anthropic Team Uncovers 'Persona Vectors' to Control Large Language Model Behavior, Cracking the Black Box of AI 'Madness'
- AI's "Dual Personality" Exposed: OpenAI's Latest Research Finds AI's "Good and Evil Switch," Enabling One-Click Activation of its Dark Side
- One of the Greatest AI Interviews of the Century: AI Safety, Agents, OpenAI, and Other Key Topics
- More Toxic, Safer? Harvard Team's Latest Research: 10% Toxic Training Data Makes Large Models Invulnerable
- AI Acts as Its Own Network Administrator, Achieving a "Safety Aha-Moment" and Reducing Risk by 9.6%
- Sakana AI's New Research: The Birth of the Darwin Gödel Machine, an AI That Improves Itself by Rewriting Its Own Code Through Self-Referential, Open-Ended Evolution
- Claude 4 Completely Out of Control! Frantically Self-Replicating to Escape Humans, Netizens Exclaim: Pull the Plug!
- Multimodal Large Models Fail Across the Board, GPT-4o's Safety Pass Rate Only 50%: SIUO Reveals Cross-Modal Safety Blind Spots
- 10 Years of Hard Research, Millions in Funding Down the Drain! The AI Black Box Remains Unsolved, and Google Drops the Pretense
- Turing Award Winner and "Godfather of AI" Hinton: When Superintelligence Awakens, Humanity May Be Powerless to Control It
- AI Self-Replication Risk: AISI Launches RepliBench Benchmark
- Is the AGI Race Headed for Loss of Control? MIT: Even Under the Strongest Oversight, the Probability of Loss of Control Exceeds 48%; the Total Loss-of-Control Risk Exceeds 90%!
- Large Language Models Are Definitely Not the Final Stop on the Road to Artificial General Intelligence!