Category: Model Evaluation

Google's Challenge: DeepSeek, Kimi and More to Compete in First Large Model Showdown Starting Tomorrow
The More Reasoning, The More Hallucinations? The "Hallucination Paradox" of Multimodal Reasoning Models
Apple's 'Illusion of Thinking' Paper Criticized Again, Claude and Human Co-authored Paper Points Out Its Three Key Flaws
Google | Tracing RAG System Errors: Proposing a Selective Generation Framework to Boost RAG Accuracy by 10%