Category: Reinforcement Learning
- RL Scaling Breakthrough! DeepSWE Open-Source AI Agent Tops Leaderboard, Training Methods and Weights Fully Released
- Tsinghua Research: A Reversal? Confirming RL Doesn't Truly Enhance Base Model Reasoning Ability!
- Tsinghua and Others Propose Absolute Zero Self-Play Large Models, Achieving Top Performance on Multiple Tasks with Zero-Data Training
- AGI Theory Comparison: Active Inference, Reinforcement Learning, Control Theory, Bayesian Brain, Utility Decision, Bounded Rationality, Emotional Motivation, Dynamic Homeostasis
- LLMs Can Now Self-Update Weights, Significantly Enhancing Self-Adaptation and Knowledge Integration Capabilities – Has AI Awakened?
- NVIDIA (ProRL) | Can RL truly enhance the reasoning capabilities of LLMs?
- LLMs Can Now Self-Update Weights, Significantly Boosting Adaptive and Knowledge Integration Capabilities. Is AI Waking Up?
- SRO Architecture Empowers Qwen-2.5-VL's Reasoning Capability, Boosting Performance by 16.8%
- New Breakthrough in Large Model Reinforcement Learning – SPO New Paradigm Boosts Large Model Reasoning Capability!
- SFT+RL Two-Stage Training Breaks Through LLM Self-Supervision! RUC DeepCritic Achieves Autonomous Evolution of AI Critique
- R1-like Training No Longer Just Focuses on Result Correctness! CUHK Launches SophiaVL-R1 Model
- The First Slow-Thinking Framework Dedicated to Multimodal Models! Outperforms GPT-o1 by Nearly 7 Percentage Points, Reinforcement Learning Teaches VLMs to "Think Twice"
- 10 Lines of Code, 15% Improvement in AIME24/25! Unveiling the Entropy Mechanism in Large Language Model Reinforcement Learning
- Process Supervision > Outcome Supervision! Huawei and City University Reconstruct RAG Inference Training, 5k Samples Outperform 90k Model
- Reviewing the Progress of RL-Reasoning
- AI Learns Reasoning Solely by "Confidence": Zhejiang University Alumnus Replicates DeepSeek's Long Chain-of-Thought Emergence, Reinforcement Learning Needs No External Reward Signals
- Peking University Alumna Lilian Weng's Latest Blog Post: Why We Think
- Will the Vision of LSTM's Father from 22 Years Ago Come True? A Wave of AI 'Self-Evolution' Papers Released in One Week, Is a New Trend Emerging?
- AI Math Ability Skyrockets 100%, Self-Evolution Nears RL Limits! CMU's New Work Overturns Perceptions
- First Explanation of How LLMs Reason and Reflect: Northwestern University & Google's New Framework Introduces Bayesian Adaptive Reinforcement Learning to Comprehensively Enhance Mathematical Reasoning
- LLM + RL Questioned: Deliberately Using Incorrect Rewards Still Significantly Boosts Math Benchmarks, Causing a Stir in the AI Community
- Summary! Multi-Turn Planning Techniques in 2025 for Large Language Model Agent RL Training
- Qwen Team Releases Long-Context Reasoning Model QwenLong-L1, Surpassing o3-mini
- Thinking with Images Only: Reinforcement Learning Forges a New Reasoning Model Paradigm, Maximizing Complex Scene Planning!
- How Does Claude 4 Think? Senior Researchers Respond: RLHF Paradigm is Out, RLVR Proven in Programming/Mathematics