Category: Large Language Models
- Stanford Proposes New RL Paradigm: 3B Model Agent Outperforms Claude, GPT-4
- Why Do Large Language Models Hallucinate? OpenAI's Latest Research Uncovers the Reasons
- Stanford's Latest Research: Even the Strongest LLMs Struggle with Cutting-Edge Code! Gemini 2.5 Pro's Success Rate Under 40%
- Microsoft Introduces rStar2-Agent: "Thinking Smarter" Proves Far More Effective and Efficient Than Simply "Thinking Longer"
- [Master's Thoughts] Martin Fowler's AI Musings: We're in an Era Where Even the "Problem" Isn't Clear
- Meta Introduces Deep Think with Confidence: Boosting Reasoning Accuracy and Efficiency with Minimal Changes
- MCP Tool Stacking Is a Trap! Veteran Developer: Command-Line "Brittleness" Overwhelms AI, So Collapse It All into a Single Code Executor and Turn 7 Calls into 1! Netizens: Black-Box Tools Should Have Been Abandoned Long Ago!
- LLMs Dominate Math Leaderboards, Yet Forget How to Chat? CMU and Collaborators Reveal Striking Differences Between SFT and RL!
- A New Revolution in Reward Models! SWIFT Reads "Inner Voice" Instead of Text, Creating a Faster, Stronger, and More Cost-Effective AI Judge
- The "Mirage" of Chain-of-Thought Reasoning: An In-depth Look at LLM Generalization
- GPT-5 vs Claude Opus 4.1: Coding Capability Assessment
- In-depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures
- ARPO: Agentic Reinforced Policy Optimization, Enabling Agents to Explore One Step Further at Critical Moments
- Open-Sourcing the Largest High-Quality Scientific Reasoning Post-Training Dataset to Quickly Turn Qwen3 and Others into "Scientists"
- Wang Mengdi's Team Surveys "Self-Evolving Agents": From Static LLMs Toward Artificial Superintelligence (ASI)
- Anthropic Team Uncovers "Persona Vectors" to Control Large Language Model Behavior, Cracking Open the Black Box of Erratic AI
- Is Your Model's Attention Drifting? RUC and Tsinghua University Introduce LeaF: Pruning Distracting Tokens for Focused Learning
- Can Models Truly "Reflect on Code"? Beihang University Releases Repository-Level Understanding and Generation Benchmark, Refreshing the LLM Understanding Evaluation Paradigm
- ReaGAN: Empowering Each Node as an Intelligent Reasoning Expert in Graphs
- Taking On Google's Challenge: DeepSeek, Kimi, and Others to Compete in the First Large Model Showdown Starting Tomorrow
- RAG Revolution! Graph-R1, the First RL-driven Graph Reasoning Agent
- RAG Can Also Reason! Thoroughly Solving the Multi-Source Heterogeneous Knowledge Challenge
- Beyond Human Annotation: Meta Introduces CoT-Self-Instruct, Reshaping LLM Training with "Reasoning-Driven Self-Evolution"
- A Deep Dive: Where Does Large Model Training Time Go?
- Revisiting Qwen3's Abandoned Hybrid Reasoning Mode