
Inside Sources Accuse DOGE Team of Feeding Federal Data Into AI Software

Two anonymous insiders speaking to Washington Post reporters have claimed that Elon Musk’s DOGE team has been feeding federal data into unspecified AI software during its audit of the Department of Education (DOE).

Rumors have been swirling about an executive order being prepared to dismantle the DOE, and Musk tweeted on February 7 (apparently in jest) that the department “no longer exists.” Because the DOE is a federal agency created by Congress, Trump cannot abolish it with an executive order, but he can potentially use his authority to severely disrupt its function. The AI software has reportedly been used to sift through information about agency spending on payroll and programs, something that could violate federal privacy or security regulations if sensitive personal information has found its way to the servers of third-party companies.

DOGE facing criticism of its handling of federal data during clearcutting campaign

DOE information that may have been fed into AI software includes sensitive department financial data as well as personally identifiable information for staff members who manage grants. This may have happened as DOGE members used AI tools to comb through federal data, after department staffers described as “lower-level” were ordered to give Musk’s team access to internal financial information.

The inside sources say that DOGE members are using Microsoft Azure as a central tool in their processing of federal data, but that the team is not necessarily using Azure’s own AI software directly. The team may be accessing other AI tools via Azure, but the sources did not specify which.

The DOGE team’s early work with the DOE and the Treasury is also establishing a model that is presumably to be applied across all government agencies, which could mean all manner of additional federal data being run through AI software. All of the work involving AI appears to be centered on the General Services Administration’s Technology Transformation Services (TTS) department, an entity created during the Obama administration to improve public transparency into federal government spending. Former Tesla software engineer Thomas Shedd has been appointed as the new head of TTS and has told its employees that it will serve as something of a central clearinghouse for AI analysis of government contracts.

Madi Biedermann, a communications strategy specialist and Trump appointee speaking on behalf of the DOE, says that there is nothing “inappropriate or nefarious” going on and that the AI software is only being used to scan for inefficiencies that are likely targets for budget cuts. There is not yet any firm public evidence of what federal data DOGE may have fed to AI software or which services received it, but the primary point of concern at the DOE is access to the agency’s student loan databases, which could include Social Security numbers and payment information for tens of millions of borrowers.

DOGE puts AI software privacy threats back into the spotlight

Regardless of what questionable things DOGE might or might not be doing with federal data, the level of concern stems from workplace LLM issues going back to the launch of ChatGPT in late 2022. Organizations are in a natural hurry to deploy AI to streamline work and improve efficiency, and employees will also seek “creative” ways to make their jobs easier, but the accompanying information risks have not always been apparent.

Anything passed along to AI software could very well take up permanent residence in its training data, and then reappear unpredictably in response to future queries by other parties. There are already several documented instances of this happening. In late 2023, researchers discovered a flaw in ChatGPT that sometimes caused it to spit out strangers’ personal information when it was told to repeat a word or phrase endlessly. And in early 2023, researchers were able to prompt Stable Diffusion and Google’s Imagen to reproduce copyrighted images and pictures of real people that had been absorbed into their training sets, even though the models were supposed to be limited to generating original images.

This has led to workplace bans on AI software at a number of companies. Many of the major players in the financial industry have imposed such bans, they are reportedly common at law firms, and individual companies such as Amazon and Samsung have restricted use after reported instances of employees feeding sensitive company information into these tools. Despite this, studies tend to show that AI use is very common in professional workplaces.

Removing sensitive data from AI software is also not as simple as going into a database and deleting it. AI developers jealously guard the internal workings of their models from the public, in the interest of keeping competitors from getting a leg up, but consistently say that individual bits of data cannot really be tracked down once absorbed into a training set. Instead, the model might have to be scrapped entirely and retrained with a new training set if it becomes too “tainted” in this way. Items like sensitive federal data essentially have to be recognized and filtered out during the input process, before they reach permanent storage, but again the public has limited visibility into the techniques in play or how consistent and effective they are.
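As a rough illustration of what that input-side filtering can look like, the minimal Python sketch below redacts a few obvious identifiers, such as Social Security numbers and email addresses, from text before it would be sent to any external AI service. The patterns and the redact_before_submission function are hypothetical examples for this article, not a description of any tool DOGE or the agencies involved are actually using, and production systems rely on far more sophisticated detection than simple pattern matching.

```python
import re

# Hypothetical patterns for a few obvious identifiers; real PII detection is far broader.
REDACTION_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_before_submission(text: str) -> str:
    """Replace recognizable identifiers with placeholder tags before the text
    leaves the organization's boundary for a third-party AI service."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

if __name__ == "__main__":
    # Illustrative record only; no real borrower data.
    record = "Borrower Jane Doe, SSN 123-45-6789, contact jane.doe@example.com, owes $12,400."
    print(redact_before_submission(record))
    # -> Borrower Jane Doe, SSN [SSN REDACTED], contact [EMAIL REDACTED], owes $12,400.
```

The point of a sketch like this is simply that the filtering has to happen before submission; once the data has been sent to an external model, the organization loses any practical ability to claw it back.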

Casey Ellis, Founder at Bugcrowd, notes that a team focused on efficiency may well have a valid use for the most efficient available data processing tools, but that the sensitivity of the reams of federal data involved makes proceeding with transparency and respect for data subjects an absolute imperative: “On one hand, it’s a pretty logical use of AI: Using AI to interrogate raw, disparate, and presumably vast datasets to speed up “time to opinion” makes a lot of sense on a purely technical and solution level. On the other hand, of course, it raises some serious questions around privacy and the transit of sensitive data, and the governance being applied to how data privacy is being managed, especially for personnel files, project/program plans, and anything impacting intelligence or defense.”

Satyam Sinha, CEO and Co-founder at Acuvity, adds: “Given the extensive use of GenAI services by countless enterprises, the use by government agencies does not come as a surprise. However, it’s important to note that GenAI services represent a completely new risk profile due to their ongoing rapid evolution. The risk of data exfiltration across GenAI services is very real, especially given the value of such sensitive government agencies’ financial data to our adversaries and bad actors. While many providers adhere to requirements such as GovCloud and FedRAMP, not all providers do. We have to exercise an abundance of caution and an additional layer of security.”

J Stephen Kowski, Field CTO for SlashNext, also notes that cybersecurity practices should be part of this commitment to transparency: “The processing of sensitive government or any organization’s data through AI tools raises important cybersecurity considerations, particularly since this data includes personally identifiable information and financial records from the Department of Education. Modern AI-powered security controls and real-time threat detection should be standard practices when handling such sensitive information, especially given the potential for data exposure to foreign adversaries or cybercriminals. Organizations working with government systems should implement comprehensive security measures that combine AI safeguards with human oversight to protect sensitive information while maintaining operational efficiency.”