OpenAI is behind schedule in developing the next version of its flagship artificial intelligence model – it will be called GPT-5, but for now it is codenamed Orion. The company has been working on it for 18 months without achieving the desired result: there is not enough data in the world to make the model smart enough, writes the Wall Street Journal.

Image source: Mariia Shalabaieva / unsplash.com

OpenAI has conducted at least two large training runs, each involving several months of data processing, in an effort to complete Orion. Each time new problems arose, and the system did not produce the results the researchers had hoped for. In its current form Orion performs better than OpenAI's existing systems, but according to the developers it has not advanced enough to justify the enormous cost of keeping the new model running. A six-month training run could cost approximately $500 million in compute alone.

Two years ago, OpenAI and its CEO Sam Altman made a splash with the release of ChatGPT. At the time it seemed that AI would penetrate every aspect of modern life and significantly improve it. Analysts predicted that tech giants' spending on AI would reach $1 trillion in the coming years. The greatest responsibility lies with OpenAI, the company that set off the AI boom.

The company’s October funding round valued it at $157 billion, not least because Altman promised that GPT-5 would deliver a “significant leap forward” across all areas and tasks. The model is expected to make scientific discoveries and easily perform everyday human chores such as making appointments and booking plane tickets. The researchers also hope it will learn to question its own correctness and “hallucinate” less often – that is, stop confidently giving answers that are untrue.

If GPT-4 operates at the level of a smart high-school student, GPT-5 is expected to perform at the level of a PhD on certain tasks. There are no clear criteria for deciding whether a new-generation model deserves to be called GPT-5: systems are tested on mathematics and programming tasks, but the final verdict comes down to the researchers' intuition, and so far that verdict has not been reached. They say that developing large language models is not only a science but also an art.

Image source: Growtika / unsplash.com

Models are tested during training runs – long periods in which they are fed trillions of tokens, that is, fragments of words. A large training run can require months of data-center time and tens of thousands of Nvidia AI accelerators. Training GPT-4, according to Altman, cost $100 million; training future models is expected to cost more than $1 billion. A failed training run is somewhat like a failed rocket test, so researchers try to reduce the likelihood of such failures by experimenting at a smaller scale – test runs before the full-scale one.

In mid-2023, OpenAI conducted a trial training run that served as a test of the likely Orion architecture – the experiment yielded little, but it became clear that a full-scale training run would take too long and be very expensive. The results of this project, called Arrakis, showed that creating GPT-5 would not go as smoothly as the researchers had hoped. They began making technical changes to strengthen Orion and concluded that a large amount of varied, high-quality data would be needed – and that information from the public Internet might not be enough.

AI models tend to get smarter as they consume large amounts of data – usually books, academic publications, and other credible sources – which help the AI express itself more clearly and cope with a wide range of tasks. When training previous models, OpenAI did not neglect other sources either, such as news articles and even social-media posts. But making Orion smarter requires additional data, and there is not enough of it. So the company decided to create the data itself: it hired people to write code and solve mathematical problems while providing step-by-step explanations of their work, and brought in theoretical physicists to explain how they would approach the hardest problems in their field.

The process is extremely slow. GPT-4 was trained on 13 trillion tokens – for comparison, a thousand people writing five thousand words a day would take several months to generate a billion tokens. So OpenAI began developing synthetic data – having other AI systems generate data to train new AI. But research has shown that feedback loops, in which AI is trained on AI-generated data, risk malfunctions or nonsensical answers. To mitigate this problem, data generation was entrusted to another model – o1.
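As a rough sanity check of the scale involved, here is a back-of-envelope sketch assuming 1,000 writers producing 5,000 words a day each, at a rule-of-thumb ~1.3 tokens per English word (both assumptions for illustration, not figures from OpenAI):

```python
# Back-of-envelope check of the human-writing bottleneck described above.
# Assumption (not from the article): ~1.3 tokens per English word,
# a common rule of thumb for GPT-style tokenizers.
TOKENS_PER_WORD = 1.3

writers = 1_000
words_per_writer_per_day = 5_000

tokens_per_day = writers * words_per_writer_per_day * TOKENS_PER_WORD  # 6.5 million

days_for_billion = 1_000_000_000 / tokens_per_day            # ~154 days, about five months
years_for_gpt4_corpus = 13_000_000_000_000 / tokens_per_day / 365  # thousands of years

print(f"{days_for_billion:.0f} days per billion tokens")
print(f"{years_for_gpt4_corpus:.0f} years to hand-write a 13-trillion-token corpus")
```

Under these assumptions, hand-writing even one billion tokens takes roughly five months, and a GPT-4-scale corpus would take millennia – which is why synthetic data became attractive despite its risks.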

Image source: Mariia Shalabaieva / unsplash.com

By early 2024, OpenAI's management began to realize that time was running out. GPT-4 was a year old, competitors were catching up, and Anthropic's new model had, by some estimates, surpassed it. The Orion project stalled, and OpenAI had to switch to other projects and applications: a lightweight version of GPT-4 and the Sora video generator were released. As a result, internal competition arose – the developers of Orion and of other products fought over limited computing resources.

Competition among AI developers has become so fierce that large technology companies now publish fewer papers about their latest discoveries and breakthroughs than is customary in the scientific community. Money flooded into the market, and corporations began to treat research results as trade secrets to be protected. It got to the point where researchers stopped working on airplanes, in coffee shops, and in other public places where someone might look over their shoulder.

In early 2024, OpenAI prepared another Orion launch attempt, armed with a better data set. During the first few months of the year, the researchers conducted several small training runs to determine what to work on next. By May they decided they were ready for a large-scale Orion run, scheduled to last until November. But a problem with the data emerged early on: it was less diverse than expected, which limited the potential quality of the training. The problem had not shown up in the trial runs and became apparent only after the big run began – by which point OpenAI had spent too much time and money to start over. Researchers tried to find a wider range of data to feed the model during training, but it is still unclear whether that strategy bore fruit.

The difficulties with Orion pointed OpenAI toward a new approach to making large language models smarter – reasoning. Reasoning ability helps AI solve complex problems it was not trained on. This is how OpenAI's o1 model works: it generates several answers to each question and analyzes them in search of the best one. But there is no certainty about the approach yet: according to Apple researchers, "reasoning" models probably only interpolate from the data seen during training rather than actually solving new problems. For example, if you make minor changes to a problem's conditions that are irrelevant to its solution, the quality of the AI's answer drops sharply.
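The "generate several answers, keep the best" scheme described above is essentially best-of-n sampling. o1's actual selection mechanism is not public, so this is only a generic sketch with stand-in functions (all names hypothetical) in place of a real model and scorer:

```python
import random

def generate_answer(question: str, seed: int) -> str:
    # Stand-in for sampling one candidate answer from a language model.
    random.seed(seed)
    return f"candidate-{random.randint(0, 9)} for {question!r}"

def score_answer(question: str, answer: str) -> float:
    # Stand-in for a verifier/reward model that rates answer quality.
    return (sum(ord(c) for c in answer) % 100) / 100

def best_of_n(question: str, n: int = 5) -> str:
    # Generate n candidates and keep the highest-scoring one --
    # paying for n generations instead of one.
    candidates = [generate_answer(question, seed=i) for i in range(n)]
    return max(candidates, key=lambda a: score_answer(question, a))

print(best_of_n("What is 7 * 8?", n=5))
```

The key design point is that quality is bought with inference-time compute: the model is unchanged, but each question now costs n generations plus n scoring passes.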

This extra intelligence comes at a cost: OpenAI has to pay to generate multiple responses instead of just one. "It turned out that having a bot think for just 20 seconds in a hand of poker got the same performance boost as making the model 100,000 times bigger and training it 100,000 times longer," said OpenAI research scientist Noam Brown. Orion could be built on top of a more advanced and efficient reasoning-capable model. The company's researchers are pursuing this approach and hope to combine it with large amounts of data, some of which may come from other AI models created by OpenAI; the results would then be refined on human-created material.
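The serving cost of sampling multiple answers grows linearly with the number of candidates, which a tiny sketch makes concrete (the per-token price and answer length below are invented for illustration, not OpenAI's figures):

```python
# Hypothetical numbers for illustration only: neither the price nor the
# token counts come from the article.
PRICE_PER_1K_OUTPUT_TOKENS = 0.01  # dollars, assumed
TOKENS_PER_RESPONSE = 500          # assumed average answer length

def serving_cost(n_responses: int) -> float:
    # Cost scales linearly with the number of candidate answers sampled.
    return n_responses * TOKENS_PER_RESPONSE / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

print(f"1 answer: ${serving_cost(1):.3f}")    # $0.005
print(f"10 answers: ${serving_cost(10):.3f}") # $0.050 -- ten times as much
```

This linear blow-up in per-query cost is the trade-off Brown's quote is weighing against the alternative of training a vastly larger model once.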
