Category: Multimodal AI
- Abandoning Manual Annotation! Chinese Team Proposes Self-Evolution Algorithm for Multimodal Large Models
- Xiaohongshu Open-Sources Its First Multimodal Large Model, dots.vlm1, with Performance Rivaling SOTA!
- Do Multimodal Large Language Models Truly 'Understand' the World? — Unveiling Core Knowledge Deficits in MLLMs
- The More Reasoning, The More Hallucinations? The "Hallucination Paradox" of Multimodal Reasoning Models
- Say 'Wait' Less, Do More: NoWait Reshapes Large Model Reasoning Paths
- Think While Drawing! Multimodal Reasoning Achieves Significant Improvement!
- R1-Style Training No Longer Focuses Only on Final-Answer Correctness! CUHK Launches the SophiaVL-R1 Model
- The First Slow-Thinking Framework Dedicated to Multimodal Models! Outperforms GPT-o1 by Nearly 7 Percentage Points as Reinforcement Learning Teaches VLMs to "Think Twice"
- OPA-DPO: An Efficient Solution for the Hallucination Problem in Multimodal Large Models
- Thinking with Images Only: Reinforcement Learning Forges a New Reasoning Model Paradigm, Maximizing Complex Scene Planning!
- More Capable Than Gemini Diffusion! The First Multimodal Large Diffusion Language Model MMaDA Released, Achieving Strong Reasoning and High Controllability
- Understanding RAG, Agent, and Multimodality: Industry Practices and Future Trends
- Multimodal Large Models Collectively Fail, GPT-4o Has Only a 50% Safety Pass Rate: SIUO Reveals Cross-Modal Safety Blind Spots
- Interview with StepFun's Duan Nan: "We Might Be Touching the Upper Limit of Diffusion's Capability"
- Matches Claude 3.7 at 1/8th the Cost: "European OpenAI" Mistral AI Releases New Multimodal Model