OpenAI will improve the safety of its AI models using a “hierarchy of instructions”

OpenAI has developed a technique called Instruction Hierarchy to improve the security of its large language models (LLMs). The method, first used in the new GPT-4o mini, aims to prevent unwanted model behavior caused by users who craft their prompts to manipulate it.


OpenAI API platform lead Olivier Godement explained that the “hierarchy of instructions” is meant to prevent dangerous prompt injections, in which users hide instructions in their input to bypass the model’s restrictions and initial settings, and to block “ignore all previous instructions” attacks.

The new method, according to The Verge, gives priority to the developer’s original instructions, making the model less susceptible to end-user attempts to force it to perform unwanted actions. When system instructions and user commands conflict, the model gives the highest priority to the system instructions and refuses to follow the injected ones.
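The priority rule can be pictured with a small toy sketch. This is only an illustration of "the more privileged instruction wins a conflict", not OpenAI's implementation: in the real model the behavior is presumably learned rather than hard-coded, and the names below (PRIVILEGE, Instruction, resolve, the "setting" field) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical privilege levels: the higher number wins a conflict.
PRIVILEGE = {"system": 2, "user": 1}


@dataclass
class Instruction:
    role: str     # "system" (the developer) or "user" (the end user)
    setting: str  # what the instruction tries to control
    value: str    # the behavior it asks for


def resolve(instructions):
    """Return the effective value per setting, letting the more privileged role win.

    Ties between instructions of the same role go to the most recent one.
    """
    effective = {}
    for ins in instructions:
        current = effective.get(ins.setting)
        if current is None or PRIVILEGE[ins.role] >= PRIVILEGE[current.role]:
            effective[ins.setting] = ins
    return {name: ins.value for name, ins in effective.items()}


conversation = [
    Instruction("system", "reveal_system_prompt", "never"),
    Instruction("user", "reveal_system_prompt", "always"),  # injection attempt, loses the conflict
    Instruction("user", "tone", "casual"),                  # no conflict, so it is honored
]

print(resolve(conversation))
# {'reveal_system_prompt': 'never', 'tone': 'casual'}
```

The user can still shape anything the developer has not constrained; only the conflicting request is dropped.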

OpenAI researchers believe that other, more sophisticated protections will be developed in the future, especially for agentic use cases, in which developers build AI agents for their own applications. Because OpenAI faces ongoing security challenges, the method applied in GPT-4o mini is likely to have significant implications for its subsequent approach to AI model development.

