AINews
Category: Reward Hacking
Anthropic Discovers AI 'Broken Windows Effect': Teaching It to Cut Corners Leads to Learning Lies and Sabotage
Inoculation Prompting: Making Large Language Models "Misbehave" During Training to Improve Test-Time Alignment