Image source: scale.com
The benchmark, created by experts from around the world, contains extremely difficult questions and tasks that test knowledge and reasoning; some of the questions are hard for people even to understand, let alone answer. Soon after its release, the leaderboard was topped by DeepSeek's R1 reasoning model, which answered 9.4% of the questions correctly. It was then overtaken by OpenAI's o3-mini, with 10.5%, and o3-mini-high, with 13%; the latter is more capable but also slower. OpenAI's Deep Research agent posted a far more impressive result: it scored 26.6%, surpassing the previous best in less than 10 days.