Category: Large Language Models
- Stanford Proposes New RL Paradigm: 3B Model Agent Outperforms Claude, GPT-4
- Why Do Large Language Models Hallucinate? OpenAI's Latest Research Uncovers the Reasons
- Stanford's Latest Research: Even the Strongest LLMs Struggle with Cutting-Edge Code! Gemini 2.5 Pro's Success Rate Under 40%
- Microsoft Introduces rStar2-Agent: "Thinking Smarter" Proves Far More Effective and Efficient Than Simply "Thinking Longer"
- [Master's Thoughts] Martin Fowler's AI Musings: We're in an Era Where Even the "Problem" Isn't Clear
- Meta Introduces Deep Think with Confidence: Boosting Reasoning Accuracy and Efficiency with Minimal Changes
- MCP Tool Stacking Is a Trap! Veteran Developer: Command-Line "Brittleness" Overwhelms AI, So Collapse It All into a Single Code Executor and Turn 7 Calls into 1! Netizens: Black-Box Tools Should Have Been Abandoned Long Ago!
- LLMs Dominate Math Leaderboards, Yet Forget How to Chat? CMU and Collaborators Reveal Striking Differences Between SFT and RL!
- A New Revolution in Reward Models! SWIFT Reads "Inner Voice" Instead of Text, Creating a Faster, Stronger, and More Cost-Effective AI Judge
- The "Mirage" of Chain-of-Thought Reasoning: An In-depth Look at LLM Generalization
- GPT-5 vs Claude Opus 4.1: Coding Capability Assessment
- In-depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures
- ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
- Open-Sourcing the Largest High-Quality Scientific Reasoning Post-Training Dataset to Quickly Turn Qwen3 and Others into "Scientists"
- Wang Mengdi's Team Surveys "Self-Evolving Agents": From Static LLMs Toward Artificial Superintelligence (ASI)
- Anthropic Team Uncovers "Persona Vectors" to Control Large Language Model Behavior, Cracking Open the Black Box of Erratic AI
- Is Your Model's Attention Drifting? RUC and Tsinghua University Introduce LeaF: Pruning Distracting Tokens for Focused Learning
- Can Models Truly "Reflect on Code"? Beihang University Releases Repository-Level Understanding and Generation Benchmark, Refreshing the LLM Understanding Evaluation Paradigm
- ReaGAN: Empowering Each Node as an Intelligent Reasoning Expert in Graphs
- Taking On Google's Challenge: DeepSeek, Kimi, and Others to Compete in the First Large Model Showdown Starting Tomorrow
- RAG Revolution! Graph-R1, the First RL-driven Graph Reasoning Agent
- RAG Can Also Reason! Thoroughly Solving the Multi-Source Heterogeneous Knowledge Challenge
- Beyond Human Annotation: Meta Introduces CoT-Self-Instruct, Reshaping LLM Training with "Reasoning-Driven Self-Evolution"
- A Deep Dive: Where Does Large Model Training Time Go?
- Revisiting Qwen3's Abandoned Hybrid Reasoning Mode