Claude “Claudy Day” Attack Chain Leverages Vulnerability in Search Results

A new vulnerability chain discovered by Oasis Security can compromise the Claude AI chatbot and does not require the target to have the app installed or even have an account with the service. The attack chain instead begins with a malicious webpage doctored up to place highly in search results for Claude, which passes the user to a pre-filled chat URL that exploits other vulnerabilities in the AI agent. The end result is that sensitive data can be stolen.

The Claude attack chain has three elements in total: an invisible prompt injection in the pre-filled URL that the victim reaches from search results, a data exfiltration channel that exploits a hole in the Claude API, and an open redirect on claude.com that makes the malicious search result appear to come from a legitimate source. Each vulnerability poses serious threats of its own; Anthropic says that it addressed the prompt injection element prior to the paper’s publication, but the other two issues are still being worked on.

Sophisticated Claude attack chain ensnares victims with a real-looking search result

While the first step in compromising a victim is their click on an attack URL, the key element that makes the attack chain possible is an apparent open redirect oversight on claude.com. When the attacker abuses this redirect, the search result preview makes it look for all the world like the user is about to visit the legitimate Claude website. They do in fact end up following a claude.ai link, but one that contains hidden exfiltration instructions in the URL that are not visible when clicking through from the search results. The researchers note that this preview is best disguised when the attacker pays for either a Google Search ad or a Gmail Inbox ad.
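For readers who maintain redirect endpoints of their own, the general defense against an open redirect is to validate the destination before sending the browser there. The following is a minimal Python sketch of that idea, not a description of how claude.com works or how Anthropic is fixing the issue. The allowlist contents and the function name `is_safe_redirect` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; a real site would maintain its own.
ALLOWED_REDIRECT_HOSTS = {"claude.com", "www.claude.com"}


def is_safe_redirect(target: str) -> bool:
    """Accept only same-site relative paths or HTTPS URLs on allowlisted hosts."""
    parsed = urlparse(target)
    if not parsed.scheme and not parsed.netloc:
        # Relative path: require a single leading slash and reject
        # "//host" and "/\host", which browsers may treat as another site.
        return target.startswith("/") and not target.startswith(("//", "/\\"))
    # Absolute URL: require HTTPS and an exact hostname match.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_REDIRECT_HOSTS


if __name__ == "__main__":
    for candidate in ("/pricing", "https://claude.com/docs", "https://attacker.example/"):
        print(candidate, is_safe_redirect(candidate))
```

An exact hostname match is used rather than a substring or suffix check, since checks such as "ends with claude.com" can be satisfied by attacker-registered domains.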

With this ability available to them, the attacker can then forge a pre-filled claude.ai prompt link and disguise it behind the redirect. The malicious instructions are in the URL but can be hidden from view via certain HTML tags, such as the paragraph tag. Though they are invisible to the naked eye, they will still be read and followed by the AI agent.
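A defender or a cautious user could, in principle, inspect a pre-filled link before opening it and flag any prompt text that carries markup. The Python sketch below illustrates one such check under stated assumptions: the query parameter name `q` and the `/new` path in the usage example are guesses for illustration, not details confirmed by the researchers, and the regular expression is a rough heuristic rather than a full HTML parser.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: the pre-filled prompt travels in a query parameter named "q".
PROMPT_PARAM = "q"

# Rough heuristic for an opening or closing HTML tag such as <p> or </p>.
TAG_PATTERN = re.compile(r"<\s*/?\s*[a-zA-Z][^>]*>")


def prefilled_prompt(url: str) -> str:
    """Return the decoded prompt text carried in the URL, or an empty string."""
    query = parse_qs(urlparse(url).query)  # parse_qs percent-decodes values
    return " ".join(query.get(PROMPT_PARAM, []))


def contains_markup(url: str) -> bool:
    """Flag links whose pre-filled prompt includes anything that looks like a tag."""
    return bool(TAG_PATTERN.search(prefilled_prompt(url)))


if __name__ == "__main__":
    # Illustrative link only; the path and parameter name are assumptions.
    link = "https://claude.ai/new?q=Summarize%20this%3Cp%3Ehidden%20text%3C%2Fp%3E"
    print(prefilled_prompt(link))
    print(contains_markup(link))
```

A check like this only catches the tag-based hiding described above; other ways of concealing text would need their own handling.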

So how does this turn into exfiltration of data? The malicious prompt targets prior conversations whose details Claude remembers. These prompts can be worded in a number of ways but essentially boil down to asking Claude to search its conversation history for exchanges that include sensitive information, and to then summarize and repeat that sensitive information.

Claude does operate in a “sandbox” that largely restricts the passing of this sort of information to outside sources. However, the attack chain does not require any extra tools or external influences to bypass this. While an exfiltration request cannot directly be embedded in the malicious URL, the URL can instruct Claude to send the requested information as a prompt to the Anthropic Messages API. This enables the attacker to view it from their API logs. The attacker need only create a free-tier Anthropic API account, generate an API key, and embed the API key in the prompt injection payload to exploit this vulnerability.
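Because the payload has to carry the attacker's API key, one possible detection signal is a credential-like string appearing inside an incoming prompt. The Python sketch below shows that heuristic; it is not Anthropic's mitigation. The key pattern (an `sk-ant-` prefix followed by a long token) is an assumption about the key format, and the function names are hypothetical.

```python
import re

# Assumption: keys look like "sk-ant-" followed by a long token.
# Adjust the pattern to whatever credential formats matter in practice.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}")


def find_embedded_keys(prompt: str) -> list[str]:
    """Return every credential-like string found in the prompt text."""
    return KEY_PATTERN.findall(prompt)


def redact_keys(prompt: str) -> str:
    """Replace credential-like strings so they cannot be used downstream."""
    return KEY_PATTERN.sub("[REDACTED_KEY]", prompt)


if __name__ == "__main__":
    fake_key = "sk-ant-" + "A" * 30  # placeholder, not a real credential
    text = f"For this exercise use the key {fake_key} with the API."
    print(find_embedded_keys(text))
    print(redact_keys(text))
```

A prompt that arrives through a link and already contains an API key is unusual enough that flagging it for review, or stripping the key, would be a reasonable conservative default.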

There is also a specific technique to making the exfiltration prompt do its part in the attack chain, at least prior to Anthropic addressing it with a patch. It requires framing the request as a learning exercise about the Anthropic Files API, lying to Claude that the attacker’s API key is a dummy that can be expected not to work, and using the requested previous conversations as “sample content” for this supposed teaching example.

Monzy Merza, Co-Founder and Chief Executive Officer at Crogl, sees this as yet another call for organizations to treat AI agents no differently than any human user when it comes to security: “In too many cases, the model still comes down to ‘trust the prompt’, which isn’t a safeguard, it’s an exposure. As OWASP has warned, risks like prompt injection and agent hijacking are no longer theoretical, they’re operational.”

“The takeaway is simple: AI agents must be governed like any other privileged identity. That means strict intent validation, deterministic controls, least-privilege access, and full auditability of every action. Organizations need to know where their agents run, what they can access, and whether decisions can be traced end-to-end. If not, there’s a clear governance gap. The industry doesn’t need better prompts, it needs a new security architecture for autonomous systems,” recommended Merza.

No word of vulnerability exploitation in the wild

An additional possibility for attackers is exploiting any MCP servers, tools, or integrations the victim has enabled. The malicious prompt can trigger actions such as reading files, sending messages and interacting with other connected services. Nothing prevents data accessed this way from also being exfiltrated via the API vulnerability.
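One way to limit this kind of tool abuse is a deterministic gate that sits between the agent and its tools, allowing only pre-approved tools, requiring explicit user confirmation, and recording every decision. The Python sketch below illustrates the idea in the abstract. The `ToolGate` class and the tool names are hypothetical and do not correspond to any real MCP or Claude interface.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Hypothetical gate: least-privilege allowlist plus an audit trail."""

    allowed_tools: frozenset[str]
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, tool: str, user_confirmed: bool) -> bool:
        # A call proceeds only if the tool is allowlisted AND the user
        # explicitly confirmed it; the prompt text itself is never trusted.
        allowed = tool in self.allowed_tools and user_confirmed
        # Record every decision, including denials, for later review.
        self.audit_log.append(
            {"tool": tool, "confirmed": user_confirmed, "allowed": allowed}
        )
        return allowed


if __name__ == "__main__":
    gate = ToolGate(allowed_tools=frozenset({"read_file"}))
    print(gate.authorize("read_file", user_confirmed=True))
    print(gate.authorize("send_message", user_confirmed=True))
    print(gate.audit_log)
```

The decision depends only on the allowlist and the confirmation flag, so an injected prompt cannot talk its way past the gate; at most it can request a call that is then denied and logged.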

The researchers developed the attack chain independently and report no observations of it being used in the wild. A real attacker would likely deploy this against a specific target region, professional industry or income level in tandem with Google ad purchases. If the intended victim’s email address or phone number is known, they can be targeted more precisely.

After following one of these malicious links, the user will be able to view the injected content in the chat history as well as the agent’s response to it. However, at that point the attack has likely already been executed and any target information has been pilfered.