OpenAI has cut back on the time and resources it spends on security testing of its powerful AI models, raising concerns that the company’s technologies are being released too quickly and without sufficient safeguards against threats.

Image source: Levart_Photographer / unsplash.com

OpenAI and its partners now get just a few days to conduct risk and performance assessments of new AI models, down from the months previously allotted. OpenAI's review process has become less thorough, with fewer resources devoted to identifying and mitigating threats, as the company's $300 billion valuation pressures it to release new models quickly to maintain its competitive edge, the Financial Times reported, citing eight people familiar with the matter. As the capabilities of large language models grow, so does their potential to be weaponized; but demand grows as well, and the company's management is keen to ship products as quickly as possible.

There is no global standard for AI safety testing, but provisions of the European Union's AI Act coming into force this year will require developers to test the safety of their most powerful models. Some developers have previously made voluntary commitments to let third-party researchers in the UK and US conduct such testing. OpenAI aims to release its new o3 model as early as next week, leaving testers less than a week to evaluate it, although the release date could change.


The company has never devoted so little time to safety evaluation. GPT-4, released in 2023, was evaluated for about six months before launch, and according to one person involved in that testing, some of the model's dangerous capabilities were discovered only two months into the process. OpenAI has committed to building special versions of its AI systems to assess their potential for misuse – for example, to find out whether a model could help make a biological virus more infectious. The task requires significant resources: collecting specialized data, such as virology material, and feeding it into the model during additional training, known as fine-tuning.

In practice, the company fulfills this commitment only to a limited extent, fine-tuning older and less capable models while skipping its most powerful and advanced ones. For example, the safety report for the o3-mini model released in January presents results for the earlier GPT-4o, and the company has not reported some tests for o1 and o3-mini at all. OpenAI countered that it has made its evaluation processes more efficient and introduced automated tests, which shortened the timeline. There is no agreed-upon recipe for approaches such as fine-tuning, the company noted, but it expressed confidence that its methods are the best available and that they are documented in its reports with maximum transparency.

Another problem is that safety tests are often run not on the final models released to the public but on “checkpoints”, earlier versions that are later updated to gain improved performance and new features; OpenAI's reports describe these as “near-final” versions. The company maintains that the checkpoints are “largely identical” to the versions ultimately released to the public.
