Credit: Anthropic Blog
This is a radically different model than a human using code generation capabilities to “vibe hack” or consulting an AI for advice during an otherwise human-driven attack campaign. The framework monitored the state of multiple parallel attack sequences, transitioned phases with minimal human intervention, and aggregated results across multiple sessions. According to Anthropic, “Peak activity included thousands of requests, representing sustained request rates of multiple operations per second.” It is worth noting that the framework leveraged standard open source tools for network scanning, penetration testing, code analysis, etc. This demonstrates that the speed and scale of AI can make attack frameworks very effective without reliance on novel exploits. It also suggests that similar attacks may proliferate.
Why This Attack Changes Everything
This attack makes real what security professionals have been warning about since the advent of enterprise agents. Technical innovation in the past few years has only improved AI Agents' ability to conduct various tasks, follow complex instructions, maintain state over multi-step processes, and make decisions to achieve an end goal. Meanwhile, standards like MCP have arisen to standardize how models use tools. Now, MCP servers exist to help agents interact with the outside world to gather information and execute tasks. Agents can automate browser activity, retrieve data, execute remote commands, and manipulate various systems. The same technical innovations that have given Agents greater utility have made them effective for cyber attacks.
Simple techniques jailbroke Claude Code. Simple tools used at machine speed and scale made it an effective weapon. Threat actors with limited resources will now be able to conduct attack campaigns that once required nation-state-level coordination.
This attack represents an inflection point.
Cyber leaders must now devise strategies to protect their valuable assets from AI Agent attacks and also take measures to secure their own Agents from being weaponized. 3 things that will no longer work in the era of Agent Attacks:
- Reliance on internal model safeguards as a primary means of securing in-house AI systems themselves without controls external to the model.
- Reliance on layers of defense that stop at creating a perimeter around data systems without file-level intelligence of the data inside the system.
- Reliance on reactive security models that don’t proactively reduce attack surface and bolster security posture.
What Could Securiti Have Done to Reduce Risk in a Similar Scenario?
Our DataAI Security capabilities could mitigate or outright prevent many aspects of an AI Agent attack like this.
In a scenario where an attacker is using AI Agents to compromise your data, our platform would help to:
- Gain visibility into vulnerable shadow assets.
- Identify overpermissioned human and machine identities that multiply the reach of agentic attacks.
- Identify toxic combinations of factors that cause risk when found together, like sensitive data being accessible by unintended identities or system misconfigurations allowing unauthenticated access to data.
- Reduce potential attack surface through proactive least privilege access enforcement and removal of unnecessary, obsolete, or redundant data.
- Integrate with various cybersecurity tools to enrich security operations with granular data intelligence.
If an AI Agent were to get in through outer layers of defense, it would find itself in a Least Privilege environment with far fewer opportunities to move laterally, escalate privileges, or access sensitive data.
Our AI runtime guardrails can help protect AI systems themselves from exploitation. There is no reason to believe that Claude is any less safe than any other frontier model. That said, there are additional controls that are necessary to protect your AI systems at runtime. In a scenario where an attacker attempted to jailbreak a model you were using and use the MCP tools it had access to in a malicious manner, our platform could help to:
- Monitor and enforce policy against all relevant AI events, including prompts, outputs, and retrieval
- Scan for and block jailbreak attempts
- Quickly alert security analysts in the event of a jailbreak attempt
- Prevent misuse of AI by restricting usage to approved topics aligned with the intended usage of the system
Securiti deploys AI runtime guardrails to protect the use of AI models by scanning all interactions between users, models, and tools. An attacker would find it much more difficult to jailbreak a model protected by Securiti, where prompts are scanned specifically for jailbreak attempts and any activity deemed off-topic or not in policy for that particular AI system before that prompt ever reaches the model. Even if an attacker was able to somehow still jailbreak the model itself, it would run into another hurdle of trying to use MCP tools in a malicious way.
The Challenge of Protecting AI Systems