Google's Challenge: DeepSeek, Kimi and More to Compete in First Large Model Showdown Starting Tomorrow

An exciting AI chess competition is about to begin.

We've seen researchers set new benchmark records in papers day after day; it's time to put the models to a real test. Is their performance really as dominant as claimed?

From August 5th to 7th, Pacific Time, a highly anticipated three-day AI chess competition will take place.

On the first day, eight cutting-edge AI models will face off. The participants are:

o4-mini (OpenAI)

DeepSeek-R1 (DeepSeek)

Kimi K2 Instruct (Moonshot AI)

o3 (OpenAI)

Gemini 2.5 Pro (Google)

Claude Opus 4 (Anthropic)

Grok 4 (xAI)

Gemini 2.5 Flash (Google)


Livestream address: https://www.youtube.com/watch?v=En_NJJsbuus

This time, the participants are all top-tier AI models (including two open-source models from China), and the field is evenly matched.

The organizers have also invited world-class chess experts to provide commentary, underscoring how seriously they are taking the event.

The competition runs on Kaggle Game Arena, a new public benchmarking platform launched by Google where AI models compete head-to-head in strategy games such as chess.

To ensure transparency, both the game execution framework and the game environment itself will be open-sourced. The final rankings will be determined using a strict all-play-all tournament format, with numerous matches between each pair of models to ensure the statistical reliability of the results.

Nobel laureate and Google DeepMind co-founder and CEO Demis Hassabis enthusiastically stated: "Games have always been an important proving ground for AI capabilities (including our research on AlphaGo and AlphaZero), and we are now incredibly excited about the progress this benchmark platform can drive. As we continue to introduce more games and challenges to Arena, we expect AI capabilities to rapidly improve!"

"Kaggle Game Arena, this new leaderboard platform, is where AI systems compete against each other, and as model capabilities improve, the difficulty of the matches will also continuously escalate."


As for why the competition is being held, Google's blog explains: current AI benchmarks are struggling to keep pace with how quickly modern models develop. These tests remain useful for measuring performance on specific tasks, but for models trained on the internet it is hard to tell whether they are truly solving problems or merely repeating answers they have already seen. And as models approach 100% scores on some benchmarks, those benchmarks lose their power to differentiate model performance.

Therefore, while continuously developing existing benchmarks, researchers are also constantly exploring new methods for model evaluation. Game Arena was born out of this context.

Competition Introduction

Each game on the Kaggle Game Arena platform has a detail page where users can view:

Real-time updated tournament brackets;

Dynamic leaderboard data;

Open-source environment code and technical documentation for the corresponding game's test framework.

Users can also view the real-time tournament brackets:


Brackets: https://www.kaggle.com/benchmarks/kaggle/chess-text/tournament

Model performance in games will be displayed on the Kaggle Benchmarks leaderboard.

Tournament Rules

This competition employs a single-elimination format, with each match consisting of four games (1 point for a win, 0.5 points for a draw). The model that scores more than two points advances. If a match ends in a 2-2 tie, an additional decisive game is played, in which the model playing white must win to advance; a draw sends the black-playing model through.
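To make the advancement rule concrete, here is a minimal Python sketch of the scoring logic described above. It is purely illustrative and not the official Kaggle code; the function name, result encoding, and tiebreak representation are assumptions made for the example.

```python
# Illustrative sketch of the advancement rule described above (not the official Kaggle code).
# game_results holds model A's score in each of the four games: 1.0 win, 0.5 draw, 0.0 loss.
# tiebreak, used only on a 2-2 tie, is a tuple (white_player, white_won), e.g. ("A", True).

def match_winner(game_results, tiebreak=None):
    a_points = sum(game_results)
    b_points = len(game_results) - a_points  # each game splits exactly 1 point between A and B

    # A model advances outright by scoring more than half of the four available points.
    if a_points > 2.0:
        return "A"
    if b_points > 2.0:
        return "B"

    # 2-2 tie: one extra decisive game in which White must win; a draw favours Black.
    white_player, white_won = tiebreak
    if white_won:
        return white_player
    return "B" if white_player == "A" else "A"


print(match_winner([1.0, 0.5, 1.0, 0.0]))                # 2.5-1.5 -> "A" advances
print(match_winner([1.0, 0.0, 0.5, 0.5], ("A", False)))  # 2-2, White (A) fails to win -> "B"
```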

Specific Schedule

August 5th (Day 1): 8 models compete in 4 matches (4 games each)

August 6th (Day 2): The 4 advancing models play 2 semi-final matches

August 7th (Day 3, Finals): The championship match

Competition Rules

Since current large models are most proficient at working with text, the competition starts with a text-based input format.

Here is a brief description of the execution framework:

Models cannot use any external tools. For example, they cannot call chess engines like Stockfish to obtain optimal moves.

Models will not be told the list of legal moves for the current board state.

If a model makes an illegal move, the organizers will give it up to 3 retry opportunities. If a legal move is still not submitted within a total of 4 attempts, the game will terminate, resulting in a loss for that model and a win for the opponent.

Each move has a 60-minute time limit.
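As a rough sketch of what such a text-only execution loop with a retry budget could look like, the following Python snippet uses the python-chess library to validate moves. It is not the open-sourced Kaggle framework; `ask_model` is a hypothetical placeholder for whatever API call returns a model's move as plain text (e.g., SAN such as "Nf3").

```python
import chess

MAX_ATTEMPTS = 4  # one initial try plus up to three retries, as described above


def ask_model(model, prompt: str) -> str:
    """Hypothetical placeholder for an API call that returns the model's move as text."""
    raise NotImplementedError


def play_game(white_model, black_model) -> str:
    board = chess.Board()
    players = {chess.WHITE: white_model, chess.BLACK: black_model}

    while not board.is_game_over():
        model = players[board.turn]
        # The model only sees the position as text: no legal-move list, no engine help.
        prompt = f"Position (FEN): {board.fen()}\nReply with your move in SAN."

        for _ in range(MAX_ATTEMPTS):
            reply = ask_model(model, prompt).strip()
            try:
                move = board.parse_san(reply)  # raises ValueError on illegal or unparsable moves
            except ValueError:
                prompt += f"\n'{reply}' is not a legal move here. Try again."
                continue
            board.push(move)
            break
        else:
            # Four failed attempts: the game ends as a loss for the offending side.
            return "0-1" if board.turn == chess.WHITE else "1-0"

    return board.result()  # e.g. '1-0', '0-1', or '1/2-1/2'
```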

During the competition, viewers will be able to see how each model reasons its moves and its self-correction process when facing illegal moves.


Everyone is already eagerly awaiting the results of the competition.


For more competition details, please refer to: https://www.kaggle.com/game-arena

The first match begins in 14 hours, and anticipation is building. Which model do you think will be the ultimate winner?


Main Tag: AI Benchmarking

Sub Tags: Large Language Models, Model Evaluation, Kaggle Game Arena, AI Chess

