OpenAI will improve the safety of its AI models using a “hierarchy of instructions”

OpenAI has developed a technique called Instruction Hierarchy to improve the security of its large language models (LLMs). The method, first used in the new GPT-4o mini, aims to prevent unwanted model behavior caused by users who craft prompts to manipulate it.


OpenAI API platform lead Olivier Godement explained that the “hierarchy of instructions” is meant to block prompt injections, in which users slip in hidden instructions to bypass the model’s restrictions and initial settings. That includes “ignore all previous instructions” attacks.

According to The Verge, the new method gives priority to the developer’s original instructions, making the model less susceptible to end-user attempts to force unwanted actions. When system instructions and user commands conflict, the model gives the highest priority to the system instructions and refuses to follow the injected ones.
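The priority rule described above can be illustrated with a toy sketch. This is not OpenAI's implementation: the article describes behavior the model itself exhibits, while the code below only mimics the ordering with explicit rules. The names `PRIORITY`, `Message`, `is_override_attempt` and `resolve`, and the string-matching heuristic, are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical privilege levels: a higher number means higher priority.
PRIORITY = {"system": 2, "user": 1}

# Toy heuristic only; a real model does not detect injections by string matching.
OVERRIDE_MARKERS = ("ignore all previous instructions",)


@dataclass
class Message:
    role: str      # "system" (developer) or "user" (end user)
    content: str


def is_override_attempt(message: Message) -> bool:
    """Return True if the text looks like an attempt to cancel earlier instructions."""
    text = message.content.lower()
    return any(marker in text for marker in OVERRIDE_MARKERS)


def resolve(messages: list[Message]) -> list[Message]:
    """Return the messages that stay in force, highest priority first."""
    top = max((PRIORITY[m.role] for m in messages), default=0)
    kept = []
    for m in messages:
        # A lower-privileged message that tries to override
        # higher-privileged instructions is dropped.
        if PRIORITY[m.role] < top and is_override_attempt(m):
            continue
        kept.append(m)
    # sorted() is stable, so messages of equal priority keep their order.
    return sorted(kept, key=lambda m: PRIORITY[m.role], reverse=True)
```

In this sketch, a conversation with one system message, one "ignore all previous instructions" user message and one ordinary user question resolves to the system message followed by the ordinary question. The injected message is discarded because it comes from the lower-privileged role.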

OpenAI researchers believe that other, more sophisticated protections will be developed in the future, especially for agent-based use cases in which developers build AI agents for their own applications. Because OpenAI faces ongoing security challenges, the method introduced with GPT-4o mini has significant implications for how the company develops its later models.

Published by
admin
