AINews

    Category: AI Benchmarking

    • Google's Challenge: DeepSeek, Kimi and More to Compete in First Large Model Showdown Starting Tomorrow
    • o3-pro Completes 'Sokoban,' Classic Retro Games Become New Benchmarks for Large Models
    • Amazon's New SOP Benchmark: The Ultimate Test for AI Agents. How Do Top Agents Score?
    • The Smarter AI Gets, The Less Obedient It Becomes! New Study: Strongest Reasoning Models Only Follow Instructions 50% of the Time
    • Are Professional Doctors Far Inferior to AI Models? OpenAI Launches Open-Source Medical Benchmark HealthBench, o3 Shows Strongest Performance
    2025 AINews. All rights reserved.