
🏫 Why Can't AI Solve Simple Kids' Problems?

Researchers at MathArena decided to "send AI to school" by giving it problems from the globally popular children's math competition, "Kangaroo."

To prevent any "data leakage," they used the Albanian version of the March 2025 problems: 168 tasks ranging from 1st to 12th grade. Each problem was translated into English and presented as a single image, complete with text, illustrations, and multiple-choice answers, just like in the real competition. This forced the models to "see with their eyes" instead of simply reading text.

Eight models were tested, including closed systems (GPT-5, Gemini 2.5 Pro, Claude Sonnet 4.5, Grok 4) and two open models, GLM-4.5V and Qwen3-VL-235B.

📉 A Surprising Result

You might expect that the older the students, the harder the problems, and the tougher they would be for the models. In fact, the opposite held: on 1st–2nd grade problems the models solved only 32% to 69% of the tasks, while on 11th–12th grade problems they reached up to 95% accuracy!
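The per-band scores above are simply the fraction of correct answers within each grade band. A minimal Python sketch of that aggregation, using made-up answer records rather than the real MathArena data:

```python
from collections import defaultdict

# Hypothetical record format: (grade_band, model_answer, correct_answer).
# These values are illustrative only, not the actual benchmark results.
results = [
    ("1-2", "A", "A"), ("1-2", "B", "C"), ("1-2", "D", "D"),
    ("11-12", "E", "E"), ("11-12", "B", "B"), ("11-12", "C", "C"),
]

def accuracy_by_band(records):
    """Group graded answers by grade band and return the fraction correct."""
    hits, totals = defaultdict(int), defaultdict(int)
    for band, given, expected in records:
        totals[band] += 1
        hits[band] += (given == expected)
    return {band: hits[band] / totals[band] for band in totals}

print(accuracy_by_band(results))
# → {'1-2': 0.6666666666666666, '11-12': 1.0}
```

With the real 168-task dataset, the same grouping would produce one accuracy figure per grade band per model, which is what the reported 32–69% vs. 95% comparison reflects.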

In the lower grades, 80% of the tasks require visual analysis, such as identifying the color of a sector or counting blocks. In the higher grades, most problems are text-based. But it's not just about images: even when visual questions are removed from the dataset, the gap remains.

The root of LLMs' mistakes lies in their type of thinking. Younger students rely on low-level skills like visual perception and spatial imagination, which are challenging for AI. Older students focus on abstract reasoning, an area where AI excels. This is a clear example of Moravec's Paradox: machines find it easier to calculate an integral than to distinguish a green triangle from a blue square. To truly "understand the world through their eyes," models need to develop not just intelligence but perception.

@hiaimediaen
