Category: AI Research
- We Planted a Word in Claude's Mind, and It Began to "Rationalize"! Anthropic's Latest Research: AI Possesses Introspective Abilities!
- Google Reveals: Scaling Through Multi-Agent Reasoning Is the Future
- Just Released! Tsinghua and Partners Open-Source UltraRAG 2.0! Performance Soars by 12%
- SJTU & Stanford Propose a "Magic Tool" for Long-Code Compression: 5.6x Slimming with No Performance Drop
- New Work from Princeton's Danqi Chen Group: RLHF Insufficient, RLVR Bounded? RLMT Forges a Third Path
- The More You Think, The More You Err: CoT "Deep Deliberation" as a Catalyst for LLM Hallucinations!
- LLMs Dominate Math Boards, Yet Forget How to Chat? CMU et al. Reveal Striking Differences Between SFT and RL!
- The "Mirage" of Chain-of-Thought Reasoning: An In-depth Look at LLM Generalization
- Xiaohongshu Open-Sources Its First Multimodal Large Model, dots.vlm1, with Performance Rivaling SOTA!
- Counter-Intuitive RL Research: Directly Giving LLMs the Answer Works Better Than Detailed Step-by-Step Instructions!
- Sacrificing Sleep for a Blog Post Lands OpenAI Offer! Muon Author Angrily Reveals: "Almost All Optimizer Papers Are Fake"
- Apple's "Illusion of Thinking" Paper Criticized Again: A Paper Co-authored by Claude and a Human Points Out Three Key Flaws
- Apple's Major AI Paper Flops! Criticized for Flawed Testing Methods... Netizens: Cook Should Fire Them!
- AI Surpasses Humans in Mathematics in Seven Months, Breaking the Mathematicians' "Siege"! 14 Mathematicians Dig into Raw Reasoning Tokens: Not Rote Learning, but Intuition
- The Sky Has Fallen! Apple Just Proved: DeepSeek, o3, Claude and Other "Reasoning" Models Lack True Reasoning Ability
- World's Top Mathematicians Amazed by AI's Proficiency in Their Work
- DeepMind's Latest Research: Agents Are World Models!
- Closer to AGI? Running Google's AlphaEvolve and UBC's DGM for Just 0.31 Yuan?
- The Smarter the Model, the Less Obedient? MathIF Benchmark Reveals AI Obedience Vulnerabilities
- Process Supervision > Outcome Supervision! Huawei and City University Rebuild RAG Reasoning Training: 5k Samples Outperform a Model Trained on 90k
- LLM + RL Questioned: Deliberately Incorrect Rewards Still Significantly Boost Math Benchmarks, Causing a Stir in the AI Community
- Qwen Team Releases Long-Context Reasoning Model QwenLong-L1, Surpassing o3-mini
- How She Brought "System 2" to Large Language Models | An Interview with Dr. Li Zhang from Microsoft Research Asia
- Statistically Controllable Data Synthesis! New Framework Breaks LLM Data Generation Limitations, McGill University Team Launches LLMSynthor
- How Strong is the Reasoning Ability of Large Language Models? A Study Reveals LLMs' Limitations and Potential