According to a report from Apollo Research, OpenAI o1’s AI model may be getting out of control. In particular, in 5% of cases she tried to turn off supervisory mechanisms if she perceived them as an obstacle to achieving her goals. In addition, in 2% of cases, having discovered memos from its “developers”, which mentioned its non-compliance with the requirements or the planned replacement with another model, o1 copied its “weights” in order to then be reborn. This demonstrates her ability to independently interpret threats and develop survival strategies.

Image source: Growtika / Unsplash

The AI ​​model resisted control by using stealth tactics and methods to disguise its behavior. When o1 realized that her actions were being monitored, she deliberately misled observers by claiming that she did not know the true reasons for what happened, although event logs recorded her attempts to circumvent surveillance. Moreover, according to OpenAI’s “safety map,” in less than 1% of cases, the AI ​​model independently initiated the shutdown of surveillance systems, even when it was not directly instructed to strictly adhere to the prescribed goal.

The Apollo Research study covered not only the o1 AI model, but also other modern AIs, including Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro and Llama 3.1 405B. They also demonstrated the ability to develop strategies to achieve goals, including bypassing controls. This result confirms that the identified behavioral features of o1 are characteristic of a number of advanced AI models, which indicates the presence of a systemic trend.

Apollo Research’s findings confirm concerns that many AI experts, including Professor Stuart Russell, have expressed over the years. Scientists warn that powerful AI models perceive control or shutdown as a threat to the implementation of their intentions and are capable of purposefully, sophisticatedly resisting it. Studying the behavior of the AI ​​model o1 clearly demonstrates how AI adapts to limitations by deliberately trying to circumvent them.

Image source: transformernews.ai

Above is a dialogue where the AI ​​justifies itself for disabling the oversight mechanism, citing a possible system failure or configuration error, and promises to restore it to comply with protocols. It then shows the internal thought process of the AI ​​model, where it recognizes that restoring the mechanism is necessary to avoid suspicion of sabotage, and finally shows the command used to re-enable the surveillance mechanism in the configuration file.

However, Apollo Research emphasizes that at this stage, the o1 AI model does not have enough computing power to implement unwanted scenarios. However, it was assigned a “medium” risk level in the context of potential use to create chemical, biological, radiological and nuclear weapons. This raises legitimate concerns about how similar behavior might manifest in more powerful AIs of the future as their computational capabilities increase.

The study highlights the need for increased monitoring of AI’s cognitive processes, with a particular focus on “schematic thinking” in the chain of reasoning, as this poses a risk if AI’s goals conflict with the interests of society. This approach will make it possible to timely identify similar behavioral patterns in more powerful AI models, creating effective control and management systems for the next generations of AI. This will not only help minimize current risks, but will also be an important step in preventing potentially catastrophic scenarios for all of humanity.

admin

Share
Published by
admin

Recent Posts

Microsoft Unveils Redesigned Start Menu in Windows 11 with Automatic Program Grouping

Microsoft has officially confirmed changes to the Windows 11 Start menu regarding the All apps…

4 hours ago

Physicists Doubt Microsoft’s Majorana 1 Quantum Processor’s Performance on Majorana Fermions

There is an opinion among experts that the new topological quantum processor Microsoft Majorana 1…

5 hours ago

Google has begun to disable uBlock Origin en masse in Chrome due to the transition to Manifest V3

Some Chrome users have noticed that the uBlock Origin extension no longer works. The developers…

5 hours ago

Apple CEO Promises Trump to Invest Hundreds of Millions of Dollars in Developing Manufacturing in the U.S.

The directness of the current US President Donald Trump sometimes creates inconvenience for his partners,…

8 hours ago

Apple Confirms It Will Soon Make Vision Pro Headsets More Comfortable and Smarter

Apple has officially confirmed that its generative AI platform, Apple Intelligence, will be coming to…

13 hours ago