Around March 20, users took to various social media platforms to report an odd ChatGPT bug: lists of unfamiliar chat titles were suddenly appearing in the website’s sidebar. These turned out to be the titles of other users’ chat histories, apparently displayed at random.
OpenAI took the service offline from 1 AM to 10 AM on March 22 to fix the issue. While the exposure of personal information was relatively limited, the incident demonstrated what future data breaches of AI tools could look like.
ChatGPT bug demonstrates risks of AI data storage
The ChatGPT bug apparently did not allow users to click through the displayed titles into the actual chat histories. However, a long enough list of chat titles could still reveal someone’s identity along with sensitive personal information about them.
It is unclear how many users were affected, but OpenAI CEO Sam Altman claimed it was only a “small percentage.” Altman also promised that a “technical postmortem” of the issue would be released. A follow-up post indicates that 1.2% of ChatGPT Plus subscribers who were active during a specific nine-hour window may have had far more exposed: real names, email addresses, home addresses, the last four digits of their credit card numbers, and credit card expiration dates.
The ChatGPT Plus exposures were apparently caused by subscription confirmation emails being sent to the wrong users during a limited time window on March 20. As for the ChatGPT bug that displayed chat histories to the wrong people, OpenAI indicated that the Redis caching system it uses was to blame. The system maintains a shared pool of connections across which requests are distributed, and a connection freed by one user’s request is handed to the next. If a request is canceled at just the wrong moment, the connection can be returned to the pool still carrying an unread response, which is then delivered to the next user, in this case handing them the prior user’s chat history.
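To make that failure mode concrete, here is a minimal sketch in Python of how a shared connection pool that never checks for unread replies can hand one user’s cached data to the next user who borrows the connection. This is entirely hypothetical illustration, not OpenAI’s code or the real Redis client; the FakeConnection and NaiveConnectionPool names and their behavior are invented for the example.

```python
import queue


class FakeConnection:
    """Stands in for a pooled cache connection whose replies are read in order."""

    def __init__(self):
        self._pending_replies = []

    def send_command(self, key):
        # Pretend the cache server immediately queues the reply for this key.
        self._pending_replies.append(f"cached data for {key}")

    def read_reply(self):
        # Replies come back strictly in the order the commands were sent.
        return self._pending_replies.pop(0)


class NaiveConnectionPool:
    """Hands out shared connections but never checks for unread replies."""

    def __init__(self, size=1):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(FakeConnection())

    def get(self):
        return self._free.get()

    def release(self, conn):
        # BUG: a connection with an unread reply goes straight back into the pool.
        self._free.put(conn)


pool = NaiveConnectionPool()

# User A's request is sent, then canceled before its reply is ever read.
conn = pool.get()
conn.send_command("user_a:chat_titles")
pool.release(conn)          # the cancellation path returns the dirty connection

# User B borrows the same connection for an unrelated request.
conn = pool.get()
conn.send_command("user_b:chat_titles")
print(conn.read_reply())    # prints "cached data for user_a:chat_titles"
```

This is also why the “redundant checks” OpenAI describes below matter: verifying that a reply actually belongs to the requesting user catches exactly this class of mix-up.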
OpenAI says that it has now fixed the ChatGPT bugs and taken an assortment of precautions to prevent similar issues in the future, among them adding redundant checks to the Redis cache system, improving logging, and expanding the scale of the Redis cluster.
Exposure of chat histories raises questions about AI security
OpenAI, which was founded in 2015 as a non-profit before transitioning to a for-profit model in 2019 (and entering into a major business partnership with Microsoft), handles private user data in a two-tiered way. For those who use the free chat or art generation tools, the company promises to remove personally identifiable information but otherwise uses all input to further train its models. Only paid subscribers are promised that their queries will not be used for training.
The trouble with this model is that it is not entirely clear to what extent “personal information” is removed from user inputs. The company has said that it keys in on specific types of personal information that come in an easy-to-recognize format, such as identification numbers and phone numbers. However, the privacy policy also indicates that a great deal of personal data “may be” collected. The ChatGPT bug that exposed chat histories demonstrates how people can be identified even when the system holds no direct identifiers: a collection of inputs mentioning personal interests, circumstances, or names can add up to a profile distinctive enough to identify the person behind it (something akin to the “fingerprinting” techniques that digital advertising systems use to track individual data subjects).
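A minimal sketch of what format-based scrubbing of that kind can look like helps show the gap. The Python below is purely illustrative and not OpenAI’s actual pipeline; the scrub helper and its patterns are invented for the example. Identifier-shaped strings are easy to strip, but the free-form details that build an identifying profile pass straight through.

```python
import re

# Hypothetical examples of the easy-to-recognize formats described above.
PATTERNS = {
    "phone": re.compile(r"\b(?:\+?\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}


def scrub(text: str) -> str:
    """Replace identifier-shaped substrings with placeholder tags."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REMOVED]", text)
    return text


prompt = ("My number is 555-867-5309. Draft a cover letter: I'm a beekeeper "
          "in Duluth who coaches my daughter's chess team on weekends.")
print(scrub(prompt))
# The phone number is caught, but the beekeeper/Duluth/chess-team details
# that could fingerprint the author remain in the stored input.
```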
All of this raises a variety of concerns for consumers, not least the question of how much potentially sensitive personal data is being plugged into the free version of ChatGPT right now by organizations looking to cut their workload (or payroll). Based in the US, where there is no federal data privacy law (and where, in this case, only California residents have some direct protections), OpenAI and Microsoft have little regulatory impetus to provide transparency to the public.
While this is the first known leak of personal information or chat histories, it is not the first ChatGPT bug. Although the AI often produces very impressive results, it also confidently delivers wrong answers at a substantial rate. Guardrails placed by the developers to limit its potentially destructive capabilities have also proven easy to circumvent with creative prompts. The developers are working on a digital watermark system to combat plagiarism and fraud, but numerous technical researchers believe that such a system will inevitably generate large numbers of false positives and false negatives, potentially rendering it functionally useless.
Despite the project’s early stage of development and a litany of potential ethical and security concerns, businesses are rushing ahead with attempts to implement it. Microsoft is foremost among these, adding the technology to Bing in an attempt to gain ground on Google’s virtual monopoly on web search.