In December, OpenAI unveiled its o3 reasoning model and showed off its results on ARC-AGI, a benchmark regarded as one of the most challenging tests of AI capabilities. Now the cost estimates behind those results have been revised, and the picture looks less impressive: the model turns out to be far more expensive to run than first thought.
Last week, the Arc Prize Foundation, the organization behind ARC-AGI, updated its estimate of the compute costs for OpenAI's o3. The most powerful configuration, o3 high, was originally thought to cost around $3,000 per ARC-AGI task. The foundation now believes the cost is much higher, perhaps as much as $30,000 per task. This illustrates how expensive today's most sophisticated AI models can be for certain tasks, at least initially. OpenAI has not yet announced pricing for o3, and the model has not been made publicly available, but the Arc Prize Foundation suggested that the pricing of OpenAI's o1-pro is a reasonable point of comparison.
“We believe o1-pro is a closer comparison [for determining] the true cost of o3 due to the amount of compute used during testing. But it's not an exact estimate, and we've left o3 labeled as a preview on our leaderboard to reflect the uncertainty until official pricing is announced,” the Arc Prize Foundation told TechCrunch. According to the foundation, o3 high used 172 times more compute than o3 low, the lowest-compute configuration of the model, to solve a single ARC-AGI problem.
Earlier reports indicated that OpenAI's advanced systems could be extremely expensive for customers, with the company said to be considering plans of up to $20,000 per month for specialized AI agents. At the same time, the models remain prone to errors: o3 high needed 1,024 attempts at each ARC-AGI test problem to reach its best score.
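To put the figures above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the variable and function names are illustrative, and it assumes that the revised $30,000 estimate covers all 1,024 attempts on a task and that the cost is spread evenly across them. Neither assumption has been confirmed by OpenAI or the Arc Prize Foundation.

```python
# Figures quoted in the article (USD per ARC-AGI task, o3 high).
ORIGINAL_ESTIMATE_USD = 3_000
REVISED_ESTIMATE_USD = 30_000
ATTEMPTS_PER_TASK = 1024


def revision_factor(original: float, revised: float) -> float:
    """How many times larger the revised estimate is than the original."""
    return revised / original


def cost_per_attempt(cost_per_task: float, attempts: int) -> float:
    """Average cost of one attempt, ASSUMING the per-task cost is
    spread evenly over all attempts (an assumption, not a reported figure)."""
    return cost_per_task / attempts


factor = revision_factor(ORIGINAL_ESTIMATE_USD, REVISED_ESTIMATE_USD)
per_attempt = cost_per_attempt(REVISED_ESTIMATE_USD, ATTEMPTS_PER_TASK)

print(f"Revised estimate is {factor:.0f}x the original")
print(f"Implied average cost per attempt: ${per_attempt:.2f}")
```

Under these assumptions, the revised estimate is ten times the original, and each individual attempt works out to roughly $29.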