OpenAI Deep Research posted a record result on “Humanity’s Last Exam”, the most difficult AI benchmark

Image source: scale.com

The benchmark, created by experts from around the world, contains extremely complex questions and tasks on knowledge and reasoning. Some of the questions are hard for many people even to understand, let alone answer. Soon after the benchmark's release, the leaderboard was topped by the DeepSeek R1 reasoning model, which answered 9.4% of the questions correctly. It was then overtaken by OpenAI's o3-mini, with 10.5%, and o3-mini-high, with 13%. The latter is indeed more capable, but it is also slower. The OpenAI Deep Research agent posted an even more impressive result: it scored 26.6%, surpassing the previous leaders in less than 10 days.

admin

