ChatGPT is offline in Italy, at least temporarily, after the Italian DPA concluded that the AI tool may have violated data privacy laws in connection with a recent data leak. OpenAI has been given 20 days to address the regulator’s privacy concerns or face substantial fines.
Italian DPA moratorium on ChatGPT stems from data leak
The specific incident that caught the attention of the Italian DPA was the recent data leak at ChatGPT, in which a caching bug briefly made the chat history titles of some users visible to others. A small share of paying ChatGPT Plus subscribers also had personal information exposed, including partial payment details, when a subscription confirmation email system similarly malfunctioned.
However, these concrete privacy concerns appear to have opened the door to broader questioning by the Italian DPA. The agency now wants more information on how the AI gathers the data it uses to answer questions, exactly what it retains, and for how long. It also wants to know how the platform intends to protect minors from unlawful data collection and from exposure to potentially harmful chat dialogs, given that it has no age verification system in place.
OpenAI now has under three weeks to respond to these privacy concerns or face fines from the Italian DPA of up to €20 million (about $21 million) or 4% of its annual global turnover, whichever is higher. The company says it has patched the bug that caused the leak, but it is unclear whether it has met all of its obligations under European data privacy law. OpenAI maintains that ChatGPT complies with the EU’s General Data Protection Regulation (GDPR).
Privacy concerns mounting as ChatGPT capabilities increase
The Italian DPA is the first government regulator to issue a ban on ChatGPT, but it is far from the first body to raise the possibility of one or to voice privacy concerns.
In the US, the Center for AI and Digital Policy has petitioned the Federal Trade Commission (FTC) to impose a moratorium on the release of anything more powerful than the current GPT-4, arguing that more advanced tools would create the possibility of broad-scale mass surveillance. The Center essentially wants development frozen until the public impact of these tools can be investigated and a discussion about appropriate regulation can begin.
A draft bill that would regulate AI chatbots with formal guidelines is under consideration in the Massachusetts state legislature; Sen. Barry Finegold used ChatGPT to write portions of the bill to support his call for regulation, likening the technology to Facebook in its early days, before users were aware of the potential dangers. Virginia’s Republican Governor Glenn Youngkin has called for school districts to ban ChatGPT, and a number of major banks have barred employees from using it at work.
OpenAI has also opted to cut off certain countries from access to ChatGPT: China, Russia, North Korea, and Iran. China-based companies have recently unveiled several competing products, however, among them Baidu’s Ernie Bot, Yuanyu Intelligent’s ChatYuan, and Gipi Talk, a bot developed by a team of Shenzhen-based engineers.
The European Commission has been ahead of most regulators in addressing AI safety and privacy concerns, opening discussions with a set of draft rules put forth two years ago. However, progress has ironically been slowed by the speed with which these commercial AI products are improving: EU lawmakers and individual member states are struggling to reach consensus on points such as how to assign risk levels to individual AI products and how to maintain a competitive position in the market.
As for a potential extension of the Italian DPA’s ban, the Irish DPC says it is discussing the matter with its Italian counterpart and would coordinate any future action with the EU’s other national regulators. The Italian DPA has been the most active of the individual national regulators thus far, and this is not its first ban of AI software: in early February it banned the “virtual friend” chatbot Replika, which was targeted at children. Beyond privacy concerns, the app was banned over its tendency toward excessive flirtatiousness and messages inappropriate for minors.
In spite of AI’s incredible range of possibilities, not all tech leaders are on board; some are as uncomfortable as government regulators and watchdogs, perhaps even more so. Elon Musk has been among the most outspoken about the risks of AI, and last week the Future of Life Institute, a nonprofit Musk has backed, coordinated an open letter signed by Musk along with figures such as Steve Wozniak, Andrew Yang, and Turing Award winner Yoshua Bengio.
Michael Rinehart, VP of Artificial Intelligence for Securiti, notes that security- and privacy-by-design approaches are likely the way forward for AI development: “The decision by the Italian data-protection authority to ban ChatGPT underscores the importance of taking a privacy-first approach to training Generative AI models. Data privacy is no longer just a data system concern; deep learning models such as GPT-3 (175B parameters) and GPT-4 (claimed to have 100T parameters) now have capacities rivaling small storage systems. If a model is trained on sensitive data such as customer chats, transactions, or medical history, it may recite that data with just the right prompt. As evidenced by the array of publicly available prompt engineering attacks against ChatGPT, ad-hoc techniques to protect privacy ‘at the prompt’ can have easily exploitable weaknesses. Despite these challenges, enterprises can reap the benefits of Generative AI in a privacy-preserving manner through proper management of its training data.”
“First, enterprises should ensure that the sensitive data used as the basis for training are cataloged and indexed so that data subject requests (such as erasure) can be applied to them. Second, personal data should be either masked or tokenized prior to training. Finally, high-quality synthetic datasets that model the characteristics of real sensitive datasets should be considered. Generative AI models for synthetic data are now capable of creating datasets that closely mirror real datasets without exposing individual personal data elements. If differential privacy is further applied, then the synthetic data is nearly guaranteed to protect the identities of the individuals who comprise the real dataset,” added Rinehart.
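Rinehart’s second step, masking or tokenizing personal data before it reaches a training corpus, can be illustrated with a short sketch. The Python example below is a minimal illustration, not a description of Securiti’s product: the regex patterns and the tokenize_pii helper are hypothetical stand-ins for the dedicated PII-detection and secret-management tooling an enterprise would actually deploy.

```python
import hashlib
import re

# Hypothetical detection patterns for illustration only; a real pipeline
# would rely on dedicated PII-detection tooling, not a handful of regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def tokenize_pii(text: str, secret: str = "rotate-this-key") -> str:
    """Replace detected personal data with stable, non-reversible tokens.

    A keyed hash maps the same value to the same token across the corpus,
    so a model can still learn document structure without ever seeing the
    raw personal data.
    """
    def make_token(kind: str, value: str) -> str:
        digest = hashlib.sha256((secret + value).encode()).hexdigest()[:8]
        return f"<{kind}_{digest}>"

    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(lambda m, k=kind: make_token(k, m.group()), text)
    return text

if __name__ == "__main__":
    record = "Customer jane.doe@example.com paid with card 4111 1111 1111 1111."
    print(tokenize_pii(record))
    # e.g. "Customer <EMAIL_55dd0b0d> paid with card <CARD_91f2ab3c>."
```

Because each token is derived deterministically from the underlying value, a scheme like this also supports the cataloging step Rinehart describes first: a data subject’s erasure request can be honored by re-deriving that person’s tokens and purging the matching training records.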