Classic platformer Super Mario Bros. puts AI to the test

Comparing AI models is notoriously difficult, and their creators are often accused of bias and of presenting test results in ways that ordinary people struggle to understand. So rather than relying on abstract math and logic tests, researchers proposed testing AI with Nintendo’s classic platformer Super Mario Bros.

Image source: Hao AI Lab

The experiment used an emulated version of Super Mario Bros. that was integrated with a custom framework called GamingAgent from researchers at the Hao AI Lab at the University of California, San Diego. This system allowed AI models to control Mario by generating Python code. All models were given the same basic instructions, like “Jump over this enemy,” as well as visualizations of the game state in the form of screenshots.
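To make the idea of "controlling Mario by generating Python code" concrete, here is a minimal sketch. It is not GamingAgent's actual interface: `RecordingController`, `press`, and `run_generated_code` are hypothetical names invented for illustration, and the "generated" snippet is hand-written to stand in for a model's reply.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingController:
    """Hypothetical stand-in for an emulator input layer; it only logs actions."""
    log: list = field(default_factory=list)

    def press(self, button: str, frames: int = 1) -> None:
        # Record which button would be held and for how many frames.
        self.log.append((button, frames))


def run_generated_code(code: str, controller: RecordingController) -> list:
    """Run model-written Python with only the controller exposed.

    Emptying __builtins__ limits what the snippet can reach, but it is
    NOT a real sandbox; a real system would need proper isolation.
    """
    namespace = {"__builtins__": {}, "controller": controller}
    exec(code, namespace)
    return controller.log


# Hand-written example of what a model might return for the
# instruction "Jump over this enemy" (an assumption, not real output).
generated = """
controller.press("right", frames=10)
controller.press("A", frames=15)
controller.press("right", frames=10)
"""

actions = run_generated_code(generated, RecordingController())
print(actions)
```

In this sketch the model's reply is just text; the harness decides what that text is allowed to touch, which is why only the controller object is handed to it.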

While Super Mario Bros. may look like a simple 2D platformer, the researchers found that the classic Nintendo game forces AI models to plan complex movement sequences and adapt their strategy on the fly.

The researchers named Anthropic’s Claude 3.7 the best model at Super Mario Bros.: it demonstrated impressive reflexes, stringing together precise jumps and skillfully avoiding enemies. Its predecessor, Claude 3.5, also showed decent results, while OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro lagged behind.

As it turns out, the key to success in Super Mario Bros. is not logical thinking but timing. Even a small delay can send Mario back to a previous checkpoint. The researchers suggest that the more deliberate reasoning models may have taken too long to decide on their next move, leading to frequent failures.
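A rough back-of-the-envelope sketch shows why response time matters so much. It assumes the game keeps running at about 60 frames per second while the model is deciding; the gap, closing speed, and jump window below are made-up illustrative numbers, not measurements from the experiment, and the function names are hypothetical.

```python
import math

FPS = 60  # assumed frame rate; the NES runs at roughly 60 frames per second


def frames_lost(latency_s: float, fps: int = FPS) -> int:
    """Frames that pass while the model is still deciding."""
    return math.ceil(latency_s * fps)


def reacts_in_time(latency_s: float, gap_px: int,
                   closing_px_per_frame: int, jump_window_px: int) -> bool:
    """True if the decision arrives before the enemy is too close to jump over.

    gap_px: distance to the enemy when the screenshot was taken.
    closing_px_per_frame: how fast that distance shrinks each frame.
    jump_window_px: minimum remaining distance at which a jump still clears.
    """
    closed = frames_lost(latency_s, FPS) * closing_px_per_frame
    return gap_px - closed >= jump_window_px


# Illustrative numbers only: enemy 64 px away, closing at 1 px per frame,
# and a jump that must start with at least 16 px to spare.
fast = reacts_in_time(0.5, gap_px=64, closing_px_per_frame=1, jump_window_px=16)
slow = reacts_in_time(3.0, gap_px=64, closing_px_per_frame=1, jump_window_px=16)
print(fast, slow)
```

Under these assumptions, a half-second reply costs 30 frames and still leaves room to jump, while a three-second reply costs 180 frames and arrives long after the enemy has reached Mario.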

Of course, using retro games to evaluate AI is largely an experiment. An AI’s ability to beat Super Mario Bros. doesn’t determine how useful it really is, though watching models with billions of parameters battle (and often lose to) a seemingly childish game is certainly entertaining.

For those who want to run their own experiment, Hao AI Lab has published the source code of GamingAgent on GitHub.

Published by
admin
