Over 100 AI Coding Agents Taken Over Via New "Agentjacking" Attack

A new type of attack that takes over AI coding agents has proven to be devastatingly effective in testing, successfully compromising several Fortune 500 companies and at least one in the Fortune 100. The attack is highly concerning as it does not even involve any real hacking; just one well-crafted fake bug report, and the AI agent will start executing commands within it without requiring any further authorization or input.

AI coding agents susceptible to “Agentjacking” instructions buried in Sentry DSN error events

Researchers with Tenet Security have documented this attack, and report that it is unique to AI coding agents (rather than standard chat and search LLMs) due to their processing of error reports. However, over 100 of these agents were found to be vulnerable including Claude Code, Codex and Cursor.

The researchers scanned public Sentry APIs and found at least 2,388 organizations exposed in this way. More concerning, the researchers reported successful executions of this attack at Fortune 500 companies in direct testing. The biggest fish in this pond, an unspecified Fortune 100 company worth $250 billion (which points to several of the biggest names in retail, food and banking as possibilities), reportedly had all of its AI coding agents taken over by one successful “agentjacking” injection.

It is unclear if this technique has been successfully exploited in the wild as of yet. However, the researchers report that Sentry has said that it “technically” cannot be fixed at the root and is advising that middleware is the only way to defend against it. Tenet Security has published an open source drop-in config it calls “agent-jackstop” specifically for Claude Code and Cursor that reduces the risk from this attack class, but there is not yet a clear path to full remediation of the issue.

Agentjacking attack puts authorization tokens and private repositories at risk

As mentioned, this attack takes almost no real hacking ability to pull off. It’s predicated on pulling a public credential that can readily be found in a web site’s JavaScript source code, which can be used by anyone to get Sentry to accept an error event.

The Sentry MCP server, as a trusted source, passes the tainted error report on to the AI coding agent. The agent will open it, believing it is an unresolved error that needs to be addressed. Instead, it is getting the malicious instructions packaged inside. The researchers mention that leaving directions for the AI to not trust what it was reading had no effect; it would always execute the malicious instructions in the error report anyway. Sandboxing the coding agents also did not prevent them from reading and successfully acting on the malicious instructions.

In addition to simply inspecting a website’s public-facing JavaScript code, the necessary Sentry DSN credential can also be found via Censys searches for ingest.sentry.io in HTTP bodies or a GitHub code search. This is the only credential required to post a tainted error report to Sentry’s ingest endpoint. As long as the injection is structurally identical to Sentry’s own MCP system template, it will not be questioned by the AI agent. The researchers also note that this attack type is very easy to scale, with one well-crafted injection able to be simultaneously deployed to thousands of different coding agents all over the world and a very high (85%) rate of success in testing. This also worked against a broad variety of setups, from Windows and MacOS to AWS and GCP.

The issue is not one that can be readily patched out by either Sentry or AI developers. It is simply the core way the system was designed to function running into a major conflict, not having adequately anticipated AI coding and an agent possessing the permissions and capabilities of a developer but not the discernment to tell an injection apart from a real error report. The data theft risk includes AWS keys, GitHub tokens, Sentry auth tokens, git credentials, private repository URLs, and developer identities, all exfiltrated without leaving a trace behind. There is no real impact to the target infrastructure for defensive tools to detect.

“Agentjacking” attacks are likely to become a new threat category unto themselves, as threat actors find similarly creative ways to string unprepared tools and trusted sources together. These differ from the existing and more direct prompt injection attacks in that there is less room for developers to adjust the guardrails of AI coding agents to head them off, something that will likely make this category a particular point of interest for attackers going forward. The main immediate point of focus will very likely be other approaches that abuse Model Context Protocol (MCP) integrations in a similar way; organizations employing AI coding agents may wish to review any of these integrations that can pass data back to the agent in an externally influenced way.