A cautionary tale has emerged from Samsung’s internal operations, as employees were found to be feeding sensitive data to ChatGPT as a means of automating portions of their jobs.
The Economist Korea reports two cases of programmers feeding source code to ChatGPT to check and optimize it, and a third giving it a recording of an internal meeting to convert it into a presentation. ChatGPT’s user guide does warn that such submissions could be fed into its training data, but the message is apparently not getting through to some users.
Submissions of sensitive data prompt some companies to consider their own internal chatbots
ChatGPT (and the related image generation service DALL-E) incorporates shared data into its training model unless an organization fills out an opt-out form. In Samsung’s case, that now includes internal source code from the semiconductor division as well as a recording of an internal company meeting.
The popular chatbot does not presently allow users to selectively remove information from its stored files; the only way to get it out is to delete the account that submitted it, but it can take as much as a month for this to be fully processed. Sensitive data that is plugged in may not be limited to just ChatGPT, but may also be shared with OpenAI’s “other consumer services,” according to the terms of its data usage policies.
Melissa Bischoping, Director of Endpoint Security Research at Tanium, summarizes exactly what should be considered “sensitive data” in terms of ChatGPT: “As a general rule, if you wouldn’t share the content you’re typing into ChatGPT with someone who works for your company’s direct competitor, then do not share it with ChatGPT. Even if you think the data you’re sharing may be generic and non-damaging, you should review any relevant policies with your legal and compliance teams to clear up any doubt. Companies have rapidly started rolling out acceptable use policies for AI/ML tooling if they didn’t have them already, so leaders should prioritize education, Q&A, and crystal-clear understanding of the risk and limitations. The issue with sharing data with ChatGPT lies in the fact that the creators can see the data you entered, and use it to understand how the model continues to train and grow. Additionally, once that information is now part of the next model, it may be possible to talk the AI into giving up that information to another party (much like how previous models have been talked into discussing their limitations, safeguards, and internal codenames).”
“Large language models are not going anywhere, but with great power comes great responsibility. Educate employees on what information is highly sensitive, how to treat that data in regards to humans or computer systems, and consider investing in a non-public model to use for your intellectual property’s protection,” advised Bischoping.
There does seem to be high awareness of the risks of plugging sensitive data into ChatGPT, at least according to very early research. A recent study by data security firm Cyberhaven found that only 3.1% of workers leak sensitive company information to the chatbot; however, that figure comes against a backdrop of only about 8.2% of the workforce using ChatGPT at work and only 6.5% pasting any sort of company information into it. The study also found that just under 1% of employees were responsible for over 80% of the “egress events” involving sensitive data that should not have gone into the chatbot, suggesting that certain job categories that are less cybersecurity-aware may pose a much greater risk of these sorts of leaks.
All of this has Samsung, no doubt along with other suitably positioned tech companies, considering building its own internal AI chatbot for employees to use. This would allow organizations not only to keep sensitive data from leaking out, but also to monitor what work duties employees are “outsourcing” to the chatbot. This will not be a common or easily accessible option anytime soon, however, as ChatGPT is estimated to cost millions of dollars per day in computing power. A version scaled down for specific purposes would still likely cost hundreds to thousands of dollars per hour to run at absolute minimum.
In the meantime, Samsung is limiting ChatGPT upload capacity to 1024 bytes per person and is investigating incidents of employee use that involve sensitive data. It has also sent out a warning about use of the chatbot to all employees.
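Samsung's reported approach amounts to a simple pre-submission gate: cap the size of each prompt and flag material that looks sensitive. A minimal sketch of such a gate is below; the 1024-byte cap matches the figure in the reporting, while the sensitive-content patterns are purely illustrative assumptions, not Samsung's actual rules.

```python
import re

# Byte cap matching the limit Samsung reportedly imposed per person.
MAX_UPLOAD_BYTES = 1024

# Hypothetical patterns a company might treat as sensitive; real policies
# would be far more extensive and tuned to the organization.
SENSITIVE_PATTERNS = [
    re.compile(r"(?i)\bconfidential\b"),
    re.compile(r"(?i)api[_-]?key"),          # credential-like strings
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # US SSN-style numbers
]

def check_submission(text: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a prospective chatbot prompt."""
    if len(text.encode("utf-8")) > MAX_UPLOAD_BYTES:
        return False, "exceeds byte limit"
    for pattern in SENSITIVE_PATTERNS:
        if pattern.search(text):
            return False, f"matched sensitive pattern: {pattern.pattern}"
    return True, "ok"
```

A gate like this is only a speed bump, of course: it limits how much can leak in one paste but cannot reliably recognize all sensitive content, which is why policy and employee education remain the primary controls.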
Governments and private industry increasingly cautious about ChatGPT
Self-imposed bans and limitations on ChatGPT are becoming increasingly common among companies. Samsung itself had one in place until just three weeks before the incidents involving sensitive data began.
Conventional thinking has been that businesses would rush to embrace ChatGPT and its derivatives, given the potential for productivity increases (and, more cynically, the prospect of shedding payroll costs). Major players are showing caution, however, as a string of issues has given ample reason to pump the brakes on integration into the workplace.
Krishna Vishnubhotla, Vice President of Product Strategy at Zimperium, summarizes the discussions that businesses are currently having about adopting large language models as a work tool: “Incidents like the recent data breach underscore the need for enterprise IT and security teams to understand how any information submitted to generative AI tools will be collected, anonymized, used, and retained in the short and long term. It’s advisable for enterprises to engage directly with vendors to ensure that the solutions are configured and deployed correctly, with appropriate safeguards in place, before making them available to the entire workforce. As generative AI tools like ChatGPT are still evolving and in their infancy, there will inevitably be new challenges that we cannot predict today. Nonetheless, enterprises can start by providing guidance on best practices and cautionary measures when using these tools. Such guidance should continue to evolve as AI becomes an increasingly ubiquitous part of our lives. By doing so, organizations can minimize the risks and enable their employees to benefit from generative AI technologies in a safe and secure manner.”
Major companies that have banned or restricted ChatGPT in the workplace include Verizon and Amazon, and the financial sector has been heavily against it thus far: it is banned entirely at JPMorgan Chase, Wells Fargo, Bank of America, Goldman Sachs and Deutsche Bank, among others. Recent Bloomberg reporting found that nearly half of all organizations are drafting policies for acceptable workplace use of chatbots, if they have not implemented them already.
The pasting of sensitive data into ChatGPT is one of the major reasons for these policies, with executives spooked by output that has appeared to include confidential company information. Unthinking input of this sort of internal material could do everything from allowing chatbots to reveal patient health conditions to the world, to facilitating corporate espionage by briefing requesters on strategic plans and priorities. Another concern is that the chatbot is far from perfect, often getting things wrong while delivering answers in a tone of absolute certainty (a trait that got it banned from Stack Overflow after users generated reams of faulty code with it).
Italy is the first nation to ban ChatGPT, citing General Data Protection Regulation (GDPR) compliance concerns. The country’s data protection watchdog said that the chatbot had no means of verifying the age of users, who might then provide it with sensitive categories of personal information. It also noted that the collection and storage of potentially personally identifying information is not transparent and does not necessarily meet regulatory requirements. The door has been opened for ChatGPT to return to Italy as early as the end of this month, however, if OpenAI implements consent and age verification checks as well as a means for users to see and delete their stored data.
John Harden, Senior Product Marketing Manager of SaaS at Auvik Networks, provides some thoughts on how employees can be prepared for the use of chatbots in the office: “In today’s SaaS world, we are reliant on the vendor to maintain data integrity within their own platforms. As AI tools become more streamlined, the risks will go up, simply as a correlation to adoption. This is why it is so critical to teach people about proper practices using these tools. In many cases, when you’re not paying for a product, you yourself are the product. In the case of these AI tools, as you input information into these models and systems, you are providing data for the product to use for its next version.”
“Employees should be cognizant on what type of data they are inputting into these AI systems and be aware that the data put into them can expose confidential and sensitive company information, and put their company’s security posture in jeopardy. IT and security leaders need to hold sessions educating employees on these tools and how they work. Without underlying understanding on how they work, such as how the data they’re putting in is used to further train the models, employees may not understand the true risks they’re exposing their organization to. ChatGPT may have eyes on it today, but rest-assured, more AI-powered technologies will emerge and it’s critical we train employees on this next wave of AI-powered SaaS and the risks shadow IT brings to an organization,” recommended Harden.