Category: Reinforcement Learning

RL Scaling Breakthrough! DeepSWE Open-Source AI Agent Tops Leaderboard, Training Methods and Weights Fully Released
Tsinghua Research: A Reversal? Confirming RL Doesn't Truly Enhance Base Model Reasoning Ability!
Tsinghua and Others Propose Absolute Zero, a Self-Play Large Model Approach Achieving Top Performance on Multiple Tasks with Zero-Data Training
AGI Theory Comparison: Active Inference, Reinforcement Learning, Control Theory, Bayesian Brain, Utility-Based Decision-Making, Bounded Rationality, Emotional Motivation, Dynamic Homeostasis
LLMs Can Now Self-Update Weights, Significantly Enhancing Self-Adaptation and Knowledge Integration Capabilities – Has AI Awakened?
NVIDIA (ProRL) | Can RL Truly Enhance the Reasoning Capabilities of LLMs?
SRO Architecture Empowers Qwen-2.5-VL's Reasoning Capability, Boosting Performance by 16.8%
New Breakthrough in Large Model Reinforcement Learning: SPO, a New Paradigm, Boosts Large Model Reasoning Capability!
SFT+RL Two-Stage Training Breaks Through LLM Self-Supervision! RUC DeepCritic Achieves Autonomous Evolution of AI Critique
R1-like Training No Longer Just Focuses on Result Correctness! CUHK Launches SophiaVL-R1 Model
The First Multimodal Dedicated Slow-Thinking Framework! Outperforms GPT-o1 by Nearly 7 Percentage Points, Reinforcement Learning Teaches VLM to "Think Twice"
10 Lines of Code, 15% Improvement in AIME24/25! Unveiling the Entropy Mechanism in Large Language Model Reinforcement Learning
Process Supervision > Outcome Supervision! Huawei and City University Reconstruct RAG Reasoning Training, Model Trained on 5k Samples Outperforms One Trained on 90k
Reviewing the Progress of RL-Reasoning
AI Learns Reasoning Solely by "Confidence": Zhejiang University Alumnus Replicates DeepSeek's Long Chain-of-Thought Emergence, Reinforcement Learning Needs No External Reward Signals
Peking University Alumna Lilian Weng's Latest Blog Post: Why We Think
Will the LSTM Father's Vision from 22 Years Ago Come True? A Wave of AI "Self-Evolution" Papers Released in a Single Week: Is a New Trend Emerging?
AI Math Ability Skyrockets 100%, Self-Evolution Nears RL Limits! CMU's New Work Overturns Conventional Wisdom
First Explanation of How LLMs Reason and Reflect: Northwestern University & Google's New Framework Introduces Bayesian Adaptive Reinforcement Learning to Comprehensively Enhance Mathematical Reasoning
LLM + RL Questioned: Deliberately Using Incorrect Rewards Still Significantly Boosts Math Benchmarks, Causing a Stir in the AI Community
Roundup! 2025 Multi-Turn Planning Techniques for RL Training of Large Language Model Agents
Qwen Team Releases Long-Context Reasoning Model QwenLong-L1, Surpassing o3-mini
Thinking with Images Only: Reinforcement Learning Forges a New Reasoning Model Paradigm, Excelling at Complex Scene Planning!
How Does Claude 4 Think? Senior Researchers Respond: The RLHF Paradigm Is Out, RLVR Proven in Programming/Mathematics