A Google AI model is facing General Data Protection Regulation (GDPR) scrutiny as the Irish DPC questions its transfers of personal data across multiple borders.
The Irish Data Protection Commission (DPC) has announced that Google’s Pathways Language Model 2 (PaLM 2) will be subject to a cross-border statutory inquiry centered on Article 35 of the GDPR. That article stipulates that organizations deploying new technologies to process EU user data must carry out a Data Protection Impact Assessment (DPIA) when the processing is likely to pose a “high” risk to the rights and freedoms of individuals in the bloc.
“Foundational” Google AI model may have skipped a required GDPR step
PaLM 2 is the core of a family of LLMs that drive Google’s language, math and reasoning AI products. The model focuses more on facts and research, in contrast to Google’s LaMDA model, which is intended to simulate natural conversations.
Article 35 of the GDPR stipulates that a DPIA is required when personal data is processed in more than one member state, or is likely to impact citizens in multiple member states. The obligation weighs especially heavily on new and relatively untested technologies with the potential to impact rights and freedoms, and the assessment centers on demonstrating that any data processing is “necessary and proportionate” and that the required safeguards are in place.
A spokesperson for Google said that the company takes its GDPR obligations seriously and will cooperate with the DPC to answer its questions. Google is fresh off a loss in the Court of Justice of the European Union, which threw out its appeal of a 2.42 billion euro fine levied over discriminatory and unfair practices in its price comparison shopping service.
Google AI model among many limited in use of EU data
Many of the major players in the AI space are pulling back from AI training in the EU, primarily due to GDPR requirements and uncertainty about penalties. Elon Musk’s X has already headed off a potential court case by promising to strictly limit the use of EU citizen data in training its own AI, and Meta announced several months ago that it is indefinitely pausing AI training in the bloc and will delay the regional launch of Meta AI.
But user data is not the only concern causing big tech firms to shy away from the EU. Separate from the DPC’s action against the Google AI model, Irish media regulator Coimisiún na Meán warned Google-owned YouTube and several other major social media companies that they face potential investigation over their compliance with the Digital Services Act (DSA).
Actions against ChatGPT in the region in 2023, just a few months after the service became available to the general public, set the tone for how EU members would handle AI regulation. Italy was quick to hit the LLM pioneer with a temporary ban over concerns about its use of personal data and whether it was adequately protecting minors who might enter sensitive information. OpenAI navigated out of that situation in about a month, but not before the episode sparked heightened scrutiny and threats of further bans from regulators in a number of other EU countries.
A great deal of confusion remains about what AI systems do with personal information that users enter during conversations, and whether it might resurface somewhere else. Most of the major companies, Google’s Gemini among them, say that they use automated tools to recognize and filter out potential personal information before it is incorporated into training data. The privacy statement for Gemini indicates that Google attempts to remove email addresses and phone numbers, but also warns that “confidential information” or “data that you wouldn’t want a reviewer to see” should not be included in conversations with the LLM. It also says that Google may use these conversations to improve other products or technologies, raising the possibility that conversation data is stored outside that particular model’s training set for more general access across the company.
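How those automated filters actually work is not public. As a minimal sketch of the general technique, pattern-based redaction of emails and phone numbers before text enters a training corpus might look like the following; the patterns and the redact_pii function here are illustrative assumptions, not any vendor’s real implementation.

```python
import re

# Illustrative patterns only; production systems rely on far more robust
# detectors (named-entity recognition, checksum validation, context rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens
    before the text is added to a training corpus."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

if __name__ == "__main__":
    sample = "Contact me at jane.doe@example.com or +353 1 234 5678."
    print(redact_pii(sample))
    # -> "Contact me at [EMAIL] or [PHONE]."
```

Pattern matching of this kind is inherently best effort, which helps explain why Gemini’s privacy statement still warns users against entering confidential information at all.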
Google AI models reportedly retain conversations for up to three years, and no security or privacy settings will prevent this from happening. Users should therefore assume that anything they type could appear in unexpected places in the future. The same applies to other models, such as Meta’s, that have similar privacy policies.
Gemini is currently the largest and most capable of the Google AI models, and is considered the company’s flagship assistant product. But Google now has about a dozen variants keyed to particular tasks, such as CodeGemma, which focuses on coding assistance, and the SecLM family, designed for cybersecurity needs. Google’s AI lineup faces more scrutiny than that of some other companies because of its potential connections to the internet-spanning ad network, search service, Android operating system, Chrome browser, G Suite tools, and other parts of the business that collect a staggering range of user information.