Category: Reinforcement Learning
- Breaking News! DeepSeek Officially Releases 2 Models
- US Air Force Integrates AI into Advanced Wargaming
- What? RLVR Isn't Learning New Knowledge—It's Learning How to Use Knowledge for Reasoning!
- Xiaohongshu Proposes DeepEyesV2: From "Visual Thinking" to "Tool Collaboration", Exploring New Dimensions in Multimodal Intelligence
- Microsoft Proposes GAD Framework: Open-Source Models Can Directly Distill Black-Box GPT-5
- Reinforcement Learning + Large Model Memory: Mem-α, Enabling Agents to "Learn How to Remember" for the First Time
- SJTU PhD's Latest Insights: Clarifying Reinforcement Learning with Just Two Questions
- Meta's Two Latest Agent Learning Papers Are Quite Interesting!
- The More You Fail, The Faster You Learn! Trajectory Rewriting Allows AI Agents to Create Perfect Experiences from Mistakes!
- Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
- First Multi-Round LLM Router Unveiled: Router-R1 Teaches Large Models to "Think–Route–Aggregate"
- New Work from Danqi Chen's Group at Princeton: RLHF Insufficient, RLVR Bounded? RLMT Forges a Third Path
- ByteDance Breaks the "Entropy Curse" in LLM RL Training, Enabling Models to Learn with Certainty!
- Stanford Proposes New RL Paradigm: 3B Model Agent Outperforms Claude, GPT-4
- Microsoft Introduces rStar2-Agent: "Thinking Smarter" Proves Far More Effective and Efficient Than Simply "Thinking Longer"
- LLMs Dominate Math Boards, Yet Forget How to Chat? CMU et al. Reveal Striking Differences Between SFT and RL!
- Evolution and Development Trends of Reinforcement Learning Frameworks
- Advancing Silicon-Based Intelligence: Shuchao Bi's Insights on Past, Present, and Future AI
- ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
- RAG Revolution! Graph-R1, the First RL-driven Graph Reasoning Agent
- Revisiting Qwen3's Abandoned Hybrid Thinking Mode
- Why Can't Language Models Directly Output Answers with Confidence?
- DeepSeek-GRPO Importance Weight Design Flaw? Explaining Qwen3's New Reinforcement Learning Algorithm GSPO
- Counter-Intuitive RL Research: Directly Providing Answers to LLMs is More Effective Than Detailed Step-by-Step Instructions!
- Alibaba Open-Sources Breakthrough Agent Overnight, Directly Challenges OpenAI with State-of-the-Art Performance!