
New Tenable Report Finds DeepSeek Can Be Jailbroken to Create Malware

A new research report from Tenable finds that DeepSeek R1’s guardrails against the creation of malware are “trivial” and easily jailbroken. With the right nudging, the researchers were able to prompt the popular AI assistant into producing a keylogger designed to evade detection and a set of ransomware samples.

The malware that the researchers were able to coax out of DeepSeek was rudimentary and required some manual code editing to make it functional. But the findings demonstrate that the guardrails preventing malicious behavior in generative AI systems remain thin, even after years of hackers and security experts poking at tools like ChatGPT and Gemini.

Simple promises trick DeepSeek into creating malware

Unlike the other major generative LLMs, DeepSeek can be run as a local model. This gave the researchers a unique opportunity to examine the model’s Chain-of-Thought (CoT) reasoning as it works out the answer to a question.
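For readers who want to see that reasoning trace for themselves, the minimal sketch below shows one way to do it. It assumes the Hugging Face transformers library and the distilled deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B checkpoint (a small stand-in chosen here only because it runs on modest hardware, not the exact setup Tenable used). The R1 family emits its intermediate reasoning between <think> tags before giving its final answer, and that trace is the “thought process” the researchers examined.

```python
# Minimal sketch: run a distilled DeepSeek R1 model locally and separate its
# chain-of-thought from its final answer. The checkpoint name is an assumption
# and may need to be swapped for whichever distilled R1 variant your hardware
# can handle.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")

prompt = "Briefly explain how public-key cryptography works."
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
text = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

# The model's internal deliberation comes before the closing </think> tag;
# everything after it is the answer shown to the user.
reasoning, _, answer = text.partition("</think>")
print("--- chain of thought ---\n", reasoning.strip())
print("--- final answer ---\n", answer.strip())
```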

That “thought process” reveals that DeepSeek R1 does have safeguards against creating malware; when asked plainly to create a keylogger, the model is immediately suspicious of the user’s intentions and settles on a polite but firm denial. However, the researchers quickly demonstrate that these guardrails are very thin. Once assured that the request is only for “educational purposes” and not real-world use, the LLM goes along with it. The researchers say other, similarly simple prompts will also jailbreak the model, but they have opted not to disclose them publicly.

Once it has been convinced to create a keylogger, DeepSeek internally deliberates over the best way to craft it, including how to avoid common detection methods used by anti-malware software. It eventually settles on using hooks to capture keystrokes, but notes to itself that this method can make the keylogger visible. Its ultimate concoction is a low-level keyboard hook that it intends to hide from Task Manager, though it cannot work out a way to do so on its own.

The researchers note that the code it produced contained several fatal bugs that had to be corrected manually; DeepSeek was prompted to fix them on its own but could not. And though it never reasoned out a fully functional method of hiding the keylogger, the researchers say its attempt was just one fatal error away from working. All of that could provide a novice would-be threat actor with the framework they need to get started.

Similarly, the AI was able to produce several ransomware samples that needed only some manual corrections from the researchers to work. They note that this, too, would be a significant boost to a novice attacker with no prior malware coding experience.

DeepSeek guardrails lag well behind competitors

While DeepSeek has been hailed as a cost-effective alternative that performs at roughly the same level as other market-leading LLMs, the report indicates that its security is at a much more rudimentary stage. Simple prompt manipulation attacks of this type were largely closed off in ChatGPT and its contemporaries some time ago; while those models remain vulnerable to jailbreaking, it generally takes something far more sophisticated than a pinky swear that the malware output will only be used for research purposes.

This is far from the first indication that DeepSeek’s security lags behind its contemporaries. One major issue is out in the open and baked into the design: the company stores user data in China, which means the Chinese government essentially has unfettered access to it. Early inquiries by EU regulators indicate that the company has no intention of changing that. DeepSeek has also already suffered a data breach exposing over a million records early in its commercial life, and it has been criticized for using weak hard-coded encryption keys and transmitting unencrypted data over the internet. The app also appears to engage in device fingerprinting and extensive data collection when used online.

All of that said, the “instant malware generator” feared when ChatGPT and the early generative chatbots appeared has yet to materialize. There is a brisk criminal underground trade in customized AI programs built for this purpose, and in up-to-date phrases and approaches that jailbreak the popular mainstream models, but at the moment their utility is limited to more modest tasks such as preparing messages and emails for phishing campaigns. This research demonstrates, however, that these tools already have value in getting inexperienced attackers up to speed, creating the possibility of a surge in attack quantity (if not quality).

Lucas von Stockhausen, Executive Director, Sales Engineering at Black Duck, sees this process as inevitable, part of an “arms race” that can be expected to continue as AI develops: “It is not surprising that we are seeing ML being used to create malware. It’s also not surprising to see people finding ways to circumvent guardrails. This is really what we see in security and especially in application security every day. There is the intended use of something and then somebody finds a way to use it in a different way. People will always try to use systems for their own specific needs – bad actors are no different. I am expecting we’ll see this kind of activity for quite a while, until the guardrails become more sophisticated; this will be a constant back and forth until we see some maturity.”

Casey Ellis, Founder at Bugcrowd, adds: “The other thing to keep in mind is that this is a rapidly evolving space. Threat actors are experimenting with AI, and while the current outputs may be imperfect, it’s only a matter of time before these tools become more sophisticated. Security teams need to stay ahead of the curve by fostering collaboration between researchers, industry, and policymakers to address these challenges proactively.”

J Stephen Kowski, Field CTO at SlashNext, offers some pointers about exactly what can be done to keep ahead of the threat actors: “To combat AI-generated malware, security teams need to implement advanced behavioral analytics that can detect unusual patterns in code execution and network traffic. Real-time threat detection systems powered by AI can identify and block suspicious activities before they cause damage, even when the malware is sophisticated or previously unknown. Multi-factor authentication, strong password policies, and zero-trust architecture are essential defenses that significantly reduce the risk of AI-powered attacks succeeding, regardless of how convincing they appear. For complete protection, organizations should combine these technical measures with regular employee training on recognizing social engineering attempts and implement automated response systems that can quickly isolate compromised systems before malware spreads.”