EDPB Report on ChatGPT: Data Accuracy Still Falling Short of EU Regulations

A much-anticipated report on ChatGPT from the European Data Protection Board has found that the chatbot has made improvements in terms of data accuracy, but continues to fall short of regulatory requirements. The EDPB task force is working with the EU’s national data watchdogs, and the full results of the report are thus being held back until some of those bodies complete ongoing investigations.

This process began when Italy temporarily banned ChatGPT last year over data privacy concerns, a move that kicked off investigations by numerous other EU nations. The EDPB formed the dedicated task force in April of last year to centralize the efforts of the data watchdogs. Member states are looking to align their policy positions on chatbots, but are taking their time with the issue and are not looking to push rule-making as of yet.

Data watchdogs continue investigations as EDPB finds ChatGPT falls short of transparency principle

The report notes that ChatGPT is held to a particularly high standard by data watchdogs as its output is very likely to be taken as factually correct by its users, even if its parent company provides disclaimers against its expected data accuracy. It is also expected to not produce biased or “made-up” outputs, something it is still struggling with.

While each of the data watchdogs conducting investigations will eventually weigh in with its own view on ChatGPT’s lawfulness, the EDPB has provided a preliminary assessment of whether or not the LLM is currently in compliance with the terms of the General Data Protection Regulation (GDPR). The short answer is that it is not, specifically with regard to the GDPR’s established data accuracy principle.

Article 5 lists out a number of central principles of the regulation, data accuracy among them. The article establishes that data controllers have an obligation to ensure that the personal data they hold is accurate and up to date, and must take “reasonable steps” to ensure inaccurate data is erased or rectified.

The article also requires controllers to record the source of personal information that they collect. This has been a primary sticking point for ChatGPT and similar LLMs, as the way the models are trained virtually guarantees that this cannot happen. The models scrape huge volumes of data, rather indiscriminately, from public sources ranging from Reddit to whatever has been left out in the open on social media platforms. As the LLM looks for patterns rather than cataloging this information, it does not have a record of how it came to “believe” any particular fact that it reports to a user.

AI developers continue to struggle to find ways to ensure data accuracy

The data watchdogs have noted that not only does this “learning” model fail to generate the sort of paper trail required by the EU’s regulations, it is prone to end up outputting biased or faulty information. AIs that have been shut down for adopting racist views after training on public sources date all the way back to Microsoft’s Tay in 2016. This list also includes a Korean chatbot called Luda in 2021, Meta’s BlenderBot 3 in 2022, and Google’s Gemini image generation in February of this year.

ChatGPT faces both regulatory challenges and private lawsuits over the information it has output about individuals, and over the fact that the information cannot be corrected even if data accuracy issues are proven. Several individuals have now brought cases against OpenAI after the company’s chatbot output false information about them, in some cases bad enough to be libelous. The company has been challenged on the basis of running afoul of both Articles 15 and 16 of the GDPR in this area; the former requires that companies show an individual where they got the data they hold about that person, and the latter establishes a right to rectification if this data can be shown to be inaccurate.

One of the issues with ChatGPT’s data accuracy is that it seemingly cannot simply admit when it doesn’t have good or reliable information on something. A GDPR complaint brought by privacy crusader Max Schrems and his data watchdog group “noyb” notes that the chatbot was repeatedly prompted for Schrems’ birthday and delivered an assortment of wrong answers, never simply admitting that it did not have a clear source from which to answer the question.

Part of the issue with the birthday question may be that ChatGPT has processes in place to detect and filter personal information from its training data, at least for non-public figures. However, the EDPB report notes that OpenAI is not able to guarantee that all personal data is filtered from the public sources it plumbs. That creates the further possibility that this information could be regurgitated in response to a query.

OpenAI responded to both the data watchdog report and to the resignation of two high-level risk management leaders by forming a safety and security committee, which has just launched a 90-day initial evaluation period.
