Category: AI Research
- We Planted a Word in Claude's Mind, and It Began to "Rationalize"! Anthropic's Latest Research: AI Possesses Introspective Abilities!
- Google Reveals: Scaling Through Multi-Agent Reasoning Is the Future
- Just Released! Tsinghua and Partners Open-Source UltraRAG 2.0! Performance Soars by 12%
- SJTU & Stanford Propose a "Magic Tool" for Long-Code Compression: 5.6x Slimming with No Performance Drop
- New Work from Princeton's Danqi Chen Group: RLHF Insufficient, RLVR Bounded? RLMT Forges a Third Path
- The More You Think, The More You Err: CoT "Deep Deliberation" as a Catalyst for LLM Hallucinations!
- LLMs Dominate Math Boards, Yet Forget How to Chat? CMU et al. Reveal Striking Differences Between SFT and RL!
- The "Mirage" of Chain-of-Thought Reasoning: An In-depth Look at LLM Generalization
- Xiaohongshu Open-Sources Its First Multimodal Large Model, dots.vlm1, with Performance Rivaling SOTA!
- Counter-Intuitive RL Research: Directly Giving LLMs the Answer Works Better Than Detailed Step-by-Step Instructions!
- Sacrificing Sleep for a Blog Post Lands OpenAI Offer! Muon Author Angrily Reveals: "Almost All Optimizer Papers Are Fake"
- Apple's "Illusion of Thinking" Paper Criticized Again: A Paper Co-authored by Claude and a Human Points Out Three Key Flaws
- Apple's Major AI Paper Flops! Criticized for Flawed Testing Methods... Netizens: Cook Should Fire Them!
- AI Surpasses Humans in Mathematics in Seven Months, Breaking the Mathematicians' "Siege"! 14 Mathematicians Dig into Raw Reasoning Tokens: Not Rote Learning, but Intuition
- The Sky Has Fallen! Apple Just Proved: DeepSeek, o3, Claude and Other "Reasoning" Models Lack True Reasoning Ability
- World's Top Mathematicians Amazed by AI's Proficiency in Their Work
- DeepMind's Latest Research: Agents Are World Models!
- Closer to AGI? Running Google's AlphaEvolve and UBC's DGM for Just 0.31 Yuan?
- The Smarter the Model, the Less Obedient? MathIF Benchmark Reveals AI Obedience Vulnerabilities
- Process Supervision > Outcome Supervision! Huawei and City University Rebuild RAG Reasoning Training: 5k Samples Outperform a Model Trained on 90k
- LLM + RL Questioned: Deliberately Incorrect Rewards Still Significantly Boost Math Benchmarks, Causing a Stir in the AI Community
- Qwen Team Releases Long-Context Reasoning Model QwenLong-L1, Surpassing o3-mini
- How She Brought "System 2" to Large Language Models | An Interview with Dr. Li Zhang from Microsoft Research Asia
- Statistically Controllable Data Synthesis! New Framework Breaks LLM Data Generation Limitations, McGill University Team Launches LLMSynthor
- How Strong is the Reasoning Ability of Large Language Models? A Study Reveals LLMs' Limitations and Potential