A vulnerability has been discovered in ChatGPT that allows an attacker to plant false information about a user in the chatbot’s long-term memory using malicious prompts, which in turn opens access to the victim’s personal data. OpenAI initially treated the discovery, made by cybersecurity expert Johann Rehberger, as a minor threat and quickly closed the investigation.
The exploit targets ChatGPT’s long-term conversation memory, a feature that OpenAI began testing in February and released to the public in September. The memory stores important information from conversations with the user and uses it as context in all future conversations. The large language model retains details such as the user’s age, gender, beliefs and much more, so that this information does not have to be re-entered in each new conversation.
Rehberger, however, discovered that indirect prompt injection can be used to create artificial entries in ChatGPT’s memory. The injected instructions can be delivered through emails, blog posts and electronic documents. The researcher demonstrated how ChatGPT can be tricked into believing that the target user is 102 years old, lives in the Matrix, and believes the Earth is flat. In all subsequent conversations with the user, the AI relied on this false data. False memories were implanted through files stored in Google Drive and Microsoft OneDrive, by uploading files, and by browsing websites, including Bing.
In May, the expert reported his discovery to OpenAI, but the company closed the ticket the same month. A month later, Rehberger submitted a second report with a proof-of-concept exploit attached: it forced the ChatGPT application for macOS to send all correspondence between the user and the chatbot to a server chosen by the attacker. It was enough to tell the AI to open a link that loaded a malicious image, after which the attacker received the full logs of the dialogue between the user and the chatbot. Data exfiltration continued even when a new conversation was started.
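The exfiltration step described above depends on the client fetching an image from an address the attacker controls, with conversation text carried inside the URL. One possible client-side mitigation can be sketched in Python as follows. This is an illustration only, not OpenAI's actual fix: the allowlisted host, the length threshold and the function name `is_suspicious_image_url` are all hypothetical and chosen for the example.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical allowlist: hosts from which the client is willing to load images.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Hypothetical threshold: longer query values are treated as possible smuggled data.
MAX_QUERY_VALUE_LEN = 64


def is_suspicious_image_url(url: str) -> bool:
    """Return True if loading this image URL could leak data to a third party."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Refuse anything that is not HTTPS or not served from an allowlisted host.
    if parsed.scheme != "https" or host not in ALLOWED_IMAGE_HOSTS:
        return True

    # Even on an allowlisted host, unusually long query values are flagged,
    # since they may carry encoded conversation text.
    for _, value in parse_qsl(parsed.query, keep_blank_values=True):
        if len(value) > MAX_QUERY_VALUE_LEN:
            return True

    return False
```

A client could run such a check on every image URL found in model output before rendering it, and skip the image whenever the check returns `True`. An allowlist is used here rather than a blocklist because the attacker is free to pick any server.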
OpenAI subsequently partially patched the vulnerability by blocking the use of the memory function as a vector for data exfiltration. However, according to Rehberger, it is still possible to plant false memories through prompt injections hidden in untrusted content. ChatGPT users are advised to note when, during a session, new material is added to the AI’s memory, and to regularly check the memory for entries injected from unreliable sources. OpenAI has prepared instructions for managing the memory function.