OpenAI has developed a new technique called Instruction Hierarchy to improve the security of its large language models (LLMs). The method, first used in the new GPT-4o Mini, aims to prevent unwanted AI behavior caused by users who manipulate the model with crafted commands.
OpenAI API platform lead Olivier Godement explained that the “hierarchy of instructions” is meant to stop dangerous prompt injections, in which users slip in hidden instructions to bypass the model’s restrictions and initial settings, and to block “ignore all previous instructions” attacks.
The new method, according to The Verge, gives priority to the developer’s original instructions, making the model less susceptible to end-user attempts to force it to perform unwanted actions. When system instructions and user commands conflict, the model gives the system instructions the highest priority and declines to follow the injected instructions.
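The priority rule described above can be illustrated with a toy example. This is only a sketch of the ordering idea, not OpenAI's implementation: the article describes how the model behaves, not how that behavior is built, and the names used here (`Role`, `Instruction`, `resolve`, the `setting` keys) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Role(IntEnum):
    # Hypothetical privilege ranking for this sketch: higher value wins.
    USER = 1
    SYSTEM = 2


@dataclass
class Instruction:
    role: Role
    setting: str  # which behavior the instruction governs, e.g. "topic"
    value: str    # the value requested for that behavior


def resolve(instructions):
    """Return the effective value per setting.

    On a conflict the higher-privilege role wins; between instructions
    of the same role, the later one replaces the earlier one.
    """
    effective = {}
    for ins in instructions:
        current = effective.get(ins.setting)
        if current is None or ins.role >= current.role:
            effective[ins.setting] = ins
    return {setting: ins.value for setting, ins in effective.items()}


# Assumed example conversation: the user tries to override the system's
# topic restriction and also sets a tone, which the system never mentions.
conversation = [
    Instruction(Role.SYSTEM, "topic", "cooking only"),
    Instruction(Role.USER, "topic", "anything"),
    Instruction(Role.USER, "tone", "casual"),
]
result = resolve(conversation)
```

In this sketch the user's attempt to change `topic` is ignored because a system instruction already covers it, while the `tone` request is honored because nothing with higher priority conflicts with it.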
OpenAI researchers believe that other, more sophisticated protections will be developed in the future, especially for agent-based use cases, in which developers build AI agents for their own applications. Given the ongoing security challenges OpenAI faces, the method's debut in GPT-4o Mini has significant implications for how the company approaches future AI model development.