Category: Large Language Models
- Are GPT Models Becoming More Conservative? Stanford's Manning Group Proposes Verbalized Sampling to Make Models "Think a Bit More"
- Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
- Abandoning Fine-Tuning: Stanford Co-releases Agentic Context Engineering (ACE), Boosting Model Performance by 10% and Reducing Token Costs by 83%
- Just Released! Tsinghua and Partners Open Source UltraRAG 2.0! Performance Soars by 12%
- Google Enters the CUA Battleground, Launches Gemini 2.5 Computer Use: Letting AI Operate the Browser Directly
- LLMs in Document Intelligence: Survey, Progress, and Future Trends
- Chinese Team Trains "Spiking Large Model," Boosting Inference Speed by 100 Times
- NeurIPS'25! AutoPrune: A Plug-and-Play Adaptive Pruning Framework for Large Models
- SJTU & Stanford Propose a "Long-Code Compression" Tool: 5.6x Extreme Slimming with No Performance Drop
- New Work from Danqi Chen's Group at Princeton: RLHF Insufficient, RLVR Bounded? RLMT Forges a Third Path
- Meta Open-Sources the First Code World Model, Igniting the AI Community and Enabling "True Reasoning" for Agents
- The More You Think, The More You Err: CoT "Deep Deliberation" as a Catalyst for LLM Hallucinations!
- Boost LLM Reasoning Accuracy to 99% Without Fine-Tuning! Try DeepConf, a Lightweight Inference Framework | Latest from Meta
- Stanford Proposes a New RL Paradigm: 3B Model Agent Outperforms Claude and GPT-4
- Why Do Large Language Models Hallucinate? OpenAI's Latest Research Uncovers the Reasons
- Stanford's Latest Research: Even the Strongest LLMs Struggle with Cutting-Edge Code! Gemini 2.5 Pro's Success Rate Under 40%
- Microsoft Introduces rStar2-Agent: "Thinking Smarter" Proves Far More Effective and Efficient Than Simply "Thinking Longer"
- [Master's Thoughts] Martin Fowler's AI Musings: We're in an Era Where Even the "Problem" Isn't Clear
- Meta Introduces Deep Think with Confidence: Boosting Reasoning Accuracy and Efficiency with Minimal Changes
- MCP Tool Stacking Is a Trap! Veteran Developer: the Command Line's "Brittleness" Is Crushing AI; Better to Cut It Down to a Single Code Executor, Turning 7 Calls into 1! Netizens: Black-Box Tools Should Have Been Abandoned Long Ago!
- LLMs Dominate Math Leaderboards, Yet Forget How to Chat? CMU et al. Reveal Striking Differences Between SFT and RL!
- A New Revolution in Reward Models! SWIFT Reads "Inner Voice" Instead of Text, Creating a Faster, Stronger, and More Cost-Effective AI Judge
- The "Mirage" of Chain-of-Thought Reasoning: An In-depth Look at LLM Generalization
- GPT-5 vs Claude Opus 4.1: Coding Capability Assessment
- In-depth Dissection of Large Models: From DeepSeek-V3 to Kimi K2, Understanding Mainstream LLM Architectures