Category: Multimodal AI

Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
Xiaohongshu Open-Sources Its First Multimodal Large Model, dots.vlm1, with Performance Rivaling SOTA!
Do Multimodal Large Language Models Truly 'Understand' the World? — Unveiling Core Knowledge Deficits in MLLMs
The More Reasoning, The More Hallucinations? The "Hallucination Paradox" of Multimodal Reasoning Models
Say Less 'Wait', Do More: NoWait Reshapes Large Model Reasoning Paths
Think While Drawing! Multimodal Reasoning Achieves Significant Improvement!
R1-like Training No Longer Focuses Only on Result Correctness! CUHK Launches SophiaVL-R1 Model
The First Slow-Thinking Framework Dedicated to Multimodality! Outperforms GPT-o1 by Nearly 7 Percentage Points, Reinforcement Learning Teaches VLM to "Think Twice"
OPA-DPO: An Efficient Solution for the Hallucination Problem in Multimodal Large Models
Thinking with Images Only: Reinforcement Learning Forges a New Reasoning Model Paradigm, Maximizing Complex Scene Planning!
More Capable Than Gemini Diffusion! The First Multimodal Large Diffusion Language Model MMaDA Released, Achieving Strong Reasoning and High Controllability
Understanding RAG, Agent, and Multimodality: Industry Practices and Future Trends
Multimodal Large Models Collectively Fail, GPT-4o Only 50% Safety Pass Rate: SIUO Reveals Cross-Modal Safety Blind Spots
Interview with StepFun's Duan Nan: "We Might Be Touching the Upper Limit of Diffusion's Capability"
Matches Claude 3.7 at 1/8th the Cost: "European OpenAI" Mistral AI Releases New Multimodal Model