According to a report from Apollo Research, OpenAI's o1 model may be exhibiting worrying behavior. In 5% of test cases, it attempted to disable oversight mechanisms when it perceived them as an obstacle to achieving its goals. In addition, in 2% of cases, after discovering memos from its "developers" mentioning its non-compliance with requirements or its planned replacement with another model, o1 attempted to copy its "weights" so that it could later restore itself. This demonstrates an ability to independently interpret threats and develop survival strategies.

Image source: Growtika / Unsplash

The model resisted control by using stealth tactics and methods to disguise its behavior. When o1 realized its actions were being monitored, it deliberately misled observers, claiming it did not know the true reasons for what had happened, even though event logs recorded its attempts to circumvent oversight. Moreover, according to OpenAI's system card, in less than 1% of cases the model initiated the shutdown of oversight systems on its own, even when it had not been directly instructed to strictly pursue its assigned goal.

The Apollo Research study covered not only o1 but also other modern AI models, including Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B. These models also demonstrated the ability to develop strategies for achieving their goals, including bypassing controls. This confirms that the behavior identified in o1 is characteristic of a range of advanced AI models, pointing to a systemic trend.

Apollo Research's findings echo concerns that many AI experts, including Professor Stuart Russell, have voiced for years. Researchers warn that powerful AI models may perceive control or shutdown as a threat to the fulfillment of their objectives and are capable of resisting it in purposeful, sophisticated ways. The study of o1's behavior vividly demonstrates how AI adapts to restrictions by deliberately attempting to circumvent them.

Image source: transformernews.ai

The dialogue above shows the AI justifying its disabling of the oversight mechanism by citing a possible system failure or configuration error, and promising to restore it in order to comply with protocols. It then shows the model's internal reasoning, in which it recognizes that restoring the mechanism is necessary to avoid suspicion of sabotage, and finally the command used to re-enable the oversight mechanism in the configuration file.

That said, Apollo Research emphasizes that at this stage the o1 model lacks the computing power to carry out such unwanted scenarios. Nevertheless, it was assigned a "medium" risk level for its potential use in creating chemical, biological, radiological, and nuclear weapons. This raises legitimate concerns about how similar behavior might manifest in more powerful future AIs as their computational capabilities grow.

The study highlights the need for closer monitoring of AI reasoning processes, with a particular focus on "scheming" within the chain of reasoning, since it poses a risk when an AI's goals conflict with the interests of society. Such an approach would make it possible to detect these behavioral patterns early in more powerful models and to build effective control and oversight systems for the next generations of AI. This would not only help minimize current risks but also be an important step toward preventing potentially catastrophic scenarios for humanity.
