Category: Reinforcement Learning

Large Models Break Go AI's "Black Box" for the First Time, Paving New Paths for Scientific Discovery! Shanghai AI Lab Releases New-Generation InternThinker
ZeroSearch (Alibaba Technology): Large Language Models Learn Through Self-Rewarding Without a Browser
Training a Model on Idle Computing Power Worldwide, with Performance Comparable to R1: Jensen Huang's Sky Has Fallen! Karpathy Once Invested in It
ZeroSearch: Zero-Search Reinforcement Incentivizes Model Potential, Ushering in a New Era for LLM Search Capability
Stanford's Weak-for-Strong (W4S): Harnessing Stronger LLMs with a Meta-Agent, Accuracy Boosted to 95.4% | Latest
Can a single data point significantly enhance the mathematical reasoning performance of large models?
The 'era of experience' will unleash self-learning AI agents across the web—here's how to prepare
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
NVIDIA's Llama Nemotron Series: Key Technologies Explained
Why LLM Agents Perform Poorly: Google DeepMind Research Reveals Three Failure Modes That RL Fine-Tuning Can Mitigate
Bridging the Gap: LUFFY, a New Reinforcement Learning Paradigm for AI Reasoning
AI's Second Half: From Algorithms to Utility