Data that was publicly available online, even momentarily, can remain accessible through generative AI chatbots such as Microsoft Copilot long after it has been made private, according to research from Lasso, an Israeli cybersecurity company that specializes in emerging generative AI threats.
The issue affects thousands of once-public GitHub repositories belonging to a number of major companies, including Microsoft, that have since been deleted or made private, Lasso told TechCrunch.
According to Lasso co-founder Ofir Dror, the company discovered that content from its own GitHub repository was appearing in Copilot because it had been indexed and cached by Microsoft’s Bing search engine. The repository had been made public by mistake for a short time and has since been set to private; attempting to access it on GitHub now returns a “Page not found” error.
“On Copilot, oddly enough, we found one of our own private repositories,” Dror said. “If I were browsing the web, I wouldn’t see this data. But anyone who asks Copilot the right question can get it.”
Lasso then investigated further: it compiled a list of repositories that were publicly accessible at some point in 2024 and identified those that have since been deleted or made private. Using Bing’s cache, the company found that more than 20,000 of these now-private GitHub repositories, belonging to more than 16,000 organizations, are still accessible through Copilot. Affected organizations include Amazon Web Services, Google, IBM, PayPal, Tencent, and Microsoft.
Dror said Lasso contacted all companies that were “seriously affected” by the data breach and advised them to rotate or revoke any compromised keys.
Lasso notified Microsoft of its findings in November 2024, but Microsoft told the company it considered the issue to be of “low severity” and described the caching behavior as “acceptable.” Microsoft said it would no longer include Bing cache links in search results starting in December 2024.
However, Lasso says that although the caching feature was disabled, Copilot still had access to the data, which no longer appeared in regular web search results.