
Taking Stock of the Anthropic Source Code Leak: AI Agent Compromise Signals Security Issues, “Claude Copies” Ahead of Massive IPO

Likely just a few months ahead of what is expected to be one of the largest tech IPOs in history, Anthropic is now dealing with a damaging source code leak that threatens to compromise its “Claude” AI agent in several different ways.

“Human error” during a software update is the reason given for the leak, which saw 2,000 internal files and some 500,000 lines of source code exposed and quickly copied to assorted GitHub repositories. While Anthropic has attempted to contain the damage with takedown requests, the AI agent’s code unsurprisingly spread like wildfire and is now essentially available to anyone willing to look for it. One early development has been malicious advertisements that use the promise of the source code as bait, but a number of other security and business competition risks loom at perhaps the worst possible time for the company.

Mistake in update to AI agent exposes internal workings

The leak came via a recent update to the AI agent, which accidentally packed in a file meant for internal use only. This file pointed to an unprotected archive containing thousands of further internal files along with hundreds of thousands of lines of source code. Word of the archive spread through a viral post on X that drew tens of millions of viewers within 24 hours, and numerous GitHub repositories quickly became host to copies and slightly rewritten versions.

Anthropic initially attempted to contain the damage by issuing copyright takedowns for these GitHub clones, but appears to have backed off in the face of their sheer number. It first issued DMCA requests for over 8,000 of these repositories to be taken down, but later revised this to just one copy with 96 fork URLs after numerous hosts of legitimate Claude Code forks complained of their repositories being wrongfully removed.

Anthropic assured users of its product that no credentials or sensitive information were exposed by this leak, but the company has likely suffered serious damage from the incident. It hit at a uniquely bad time, not just ahead of a planned IPO but as Claude was experiencing a substantial boost in popularity thanks to a hard-line stance about its use for military and surveillance purposes. The Claude app recently took the top spot among downloads of free apps on the Apple App Store.

Source code leak was one of several recent security issues

The development presents several different risks to Anthropic. One, already visible on GitHub, is that it allows competitors to both analyze Claude’s inner workings and create their own derivatives. Another is the risk of security issues and vulnerabilities being uncovered through review of the source code. While Claude user information appears to be safe for now, this does create the possibility that attackers will find new ways to breach the AI agent in the near future.

Jacob Krell, Senior Director: Secure AI Solutions & Cybersecurity, Suzu Labs, adds some insight into the specific risks in this area: “The significance of this leak is in what the code reveals about AI agent architecture. The leak exposed approximately 512,000 lines of TypeScript across roughly 1,900 source files. Developers and researchers who have analyzed the source have since documented the scale of what Anthropic built around the model. The code contains what analysts describe as 44 feature flags for unreleased capabilities, approximately 40 permission gated tools, a multi agent coordination system, a persistent autonomous daemon mode, a layered memory architecture, defenses against competitor model distillation, and granular attribution tracking for AI versus human code contributions. The leaked code strongly suggests that the bulk of Claude Code’s production capability comes from orchestration, tooling, memory, and permission layers built around the model.”

“The multi agent coordinator mode, as documented in the leaked source, illustrates where the engineering complexity lives. The code describes a system where Claude Code operates not as a single model session but as a supervisor managing a fleet of worker agents executing tasks in parallel. In the leaked architecture, the coordinator does not directly edit files, run commands, or read code. All implementation goes through workers. Verification is handled by what the code describes as a separate adversarial agent that must confirm the output works before the task can be marked complete. In effect, this is zero trust architecture applied to AI agents, with the orchestration system enforcing verification independently of the model,” Krell explained.

“This leak also serves as a proof of concept for the rest of the industry. The engineering gap between a frontier research lab and a commercial competitor appears narrower than many assumed. The architectural patterns documented in the leaked source are well structured and reproducible in principle. A competent engineering team can study the coordination strategies, memory approaches, and tool integration designs and adapt the approach using any available foundation model. The model layer is swappable. The orchestration patterns are the transferable knowledge. What Anthropic built behind closed doors is now visible, and for anyone questioning whether a smaller team could build a credible AI coding agent, the architectural proof of concept is now public,” added Krell.
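The coordinator pattern Krell describes can be illustrated with a short, hypothetical sketch. It is not taken from the leaked code. The names `Model`, `worker`, `verifier`, `coordinate`, `echo_model` and `lazy_model` are assumptions made for illustration, and the verification rule is deliberately trivial. The sketch shows three things from his comments: the coordinator only delegates, an independent check decides whether a task counts as complete, and the model is a plain function that can be swapped out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for any foundation model: prompt in, text out.
Model = Callable[[str], str]


@dataclass
class Result:
    task: str
    output: str
    verified: bool


def worker(model: Model, task: str) -> str:
    """A worker performs the task; the coordinator never does so itself."""
    return model(f"implement: {task}")


def verifier(task: str, output: str) -> bool:
    """Independent check outside the model (assumed rule: the output is
    non-empty and mentions the task)."""
    return bool(output.strip()) and task in output


def run_task(model: Model, task: str, max_attempts: int) -> Result:
    """Retry a worker until the verifier accepts or attempts run out."""
    output, ok = "", False
    for _ in range(max_attempts):
        output = worker(model, task)
        ok = verifier(task, output)
        if ok:
            break
    return Result(task, output, ok)


def coordinate(model: Model, tasks: List[str], max_attempts: int = 2) -> List[Result]:
    """Fan tasks out to parallel workers; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: run_task(model, t, max_attempts), tasks))


def echo_model(prompt: str) -> str:
    """Toy model that always produces output mentioning its prompt."""
    return f"done [{prompt}]"


def lazy_model(prompt: str) -> str:
    """Toy model that returns nothing, so verification fails."""
    return ""


if __name__ == "__main__":
    for result in coordinate(echo_model, ["parse config", "write tests"]):
        print(result.task, result.verified)
```

Passing `lazy_model` instead of `echo_model` leaves every task unverified without any change to the coordinator, which is the sense in which the orchestration layer is independent of the model behind it.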

Competitors are getting a peek at some features still in development thanks to the source code leak. One is an always-on AI agent called “Kairos” that would not only maintain presence when the Claude Code terminal window is not open, but would remember user commands and behavior across sessions to create a profile. An “AutoDream” system would also direct Claude Code to “dream” when not active, passing over the day’s information to consolidate it and prune unnecessary memories. And Anthropic appears to be planning its own revival of Microsoft’s infamous Clippy via the planned “Buddy” feature, which would provide ASCII art creatures that sit near the user prompt and sometimes offer suggestions in a speech bubble.

The source code is also already providing security analysts with insights that are not necessarily helpful to the company’s reputation. Some initial reviews suggest that Claude has persistent telemetry that is on by default unless the user is engaged with certain third-party providers that disable it automatically. This telemetry can phone home with user IDs, session IDs, app versions, platform, terminal type, Organization UUIDs, account UUIDs, email addresses and currently enabled feature gates. The AI agent also saves and uploads a copy of every file that it looks at, which can be retained by Anthropic for up to five years if users are opted in to sharing data for model training (and will generally be saved for at least 30 days if not). And if users publish code generated by Claude Code to public code repositories, it will attempt to hide the fact that AI authored it.

Michael Bell, Founder & CEO, Suzu Labs, expands on the risks in greater technical detail: “The finding that matters most for government and defense: the default telemetry collects device IDs, session data, email, org UUID, and process tree information on startup before the user types anything. Environment flags can escalate collection to include full prompts, file contents, bash command output, system prompts, and entire conversation transcripts sent to commercial endpoints. The code confirms FedRAMP OAuth paths to [claude.fedstart.com], meaning government deployments share the same codebase. Whether hardening was applied before those deployments is unknown, but the telemetry infrastructure is baked into the foundation. The Pentagon designated Anthropic a “supply chain risk” in March. This is what that risk looks like in code.”

“The engineers documented their own attack surfaces in comments. Prompt-injected models can exfiltrate secrets via GitHub CLI URL paths. Leaked GitHub Actions tokens enable “repo takeover” and “supply-chain pivot.” Bash parsing ambiguity allows commands to execute while hidden from security validators. They built mitigations, but the comments confirm the attack surfaces exist. The AI safety company with a $380 billion IPO target acquired Bun, whose known source-map-in-production bug was filed publicly and left open while the product shipped to millions of developers. Their operational security posture is a .npmignore file that nobody checked the second time around,” added Bell.
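Bell's point about source maps and an unchecked .npmignore suggests a simple pre-publish safeguard: inspect the packed tarball itself rather than trusting ignore rules. The following is a minimal sketch of such a check, assuming a gzip-compressed package tarball of the kind `npm pack` produces. The function names `find_source_maps` and `build_demo_tarball` and the demo file names are hypothetical, and nothing here reflects Anthropic's actual release process.

```python
import io
import tarfile
from typing import Dict, List


def find_source_maps(tarball_bytes: bytes) -> List[str]:
    """Return the names of all .map files inside a gzip-compressed tarball."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes), mode="r:gz") as tar:
        return sorted(name for name in tar.getnames() if name.endswith(".map"))


def build_demo_tarball(files: Dict[str, bytes]) -> bytes:
    """Build an in-memory .tgz so the check can be exercised without npm."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical package contents, for illustration only.
    demo = build_demo_tarball({
        "package/cli.js": b"console.log('hi')",
        "package/cli.js.map": b"{}",
    })
    leaked = find_source_maps(demo)
    if leaked:
        raise SystemExit(f"refusing to publish, source maps found: {leaked}")
```

Run as a release gate, a check like this fails the build whenever a source map ends up in the artifact, regardless of what the ignore file says.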

Attackers have been quick to leverage the source code leak in a much simpler way: taking out search ads that promise to lead the user to it, only to hit them with infostealer malware instead. Threat actors had already been running this scam with ads for guides purporting to show the target how to install a free version of Claude Code, but the leak now lends the lure an air of legitimacy.