AINews

    Category: Benchmarking

    • Can Models Truly "Reflect on Code"? Beihang University Releases Repository-Level Understanding and Generation Benchmark, Refreshing the LLM Understanding Evaluation Paradigm
    • Multimodal Large Models Collectively Fail, GPT-4o Only 50% Safety Pass Rate: SIUO Reveals Cross-Modal Safety Blind Spots
    • The 'Olympics' of AI? OpenAI Releases New Benchmark MRCR, Pushing Models' 'Needle in a Haystack' Ability to the Limit!
    2025 AINews. All rights reserved.