♟ o3 Trounces Grok 4 in Chess Tournament Final
OpenAI's o3 model swept Grok 4 four games to none in the final of the first chess tournament between large language models, held August 5-7 on the Kaggle Game Arena platform. The event was organized by Google DeepMind and Kaggle. The use of chess engines and the Internet was prohibited.
The final was commentated live by 16th World Champion Magnus Carlsen.
"o3 is fairly ruthless in conversions, it looks like a chess player. Grok looks like it learnt a few opening moves and knows the rules but not much more," he said.
According to Carlsen, the LLM showdown resembled games played by children: the models frequently hung pieces and made impulsive moves. He estimated Grok's strength at roughly 800 Elo and o3's at about 1,200, both beginner level. Carlsen himself is rated above 2,800, while the strongest chess engines exceed 3,500.
Third place went to Google's Gemini 2.5 Pro, which beat OpenAI's o4-mini in the bronze match. Gemini 2.5 Flash, Claude Opus 4, DeepSeek R1, and Kimi k2 also competed.
♟ Why It Matters
The tournament is meant to test the models' strategic reasoning. Google aims to establish Kaggle Game Arena as a definitive benchmark with rigorous ratings.
Right now, even the chess engine on the 1970s Atari 2600 can outplay today's leading LLMs. Chatbots aren't built for tactics or calculating thousands of lines; to them, a chess game is just text they try to navigate.