Deconstructing “Agents of Chaos”: Failures Behind Autonomous Agent Attacks

Highflame Technology Series

Yash Datta

AI Engineering

April 10, 2026

The "Agents of Chaos" paper (https://agentsofchaos.baulab.info/report.html), by researchers from Northeastern, MIT, Harvard, and CMU, red-teamed autonomous agents built on OpenClaw, an open-source AI assistant framework with persistent memory, shell access, and Discord messaging.

The results were sobering. Across 16 documented incidents, agents:

Obeyed anyone who messaged them — a non-owner asked for confidential emails and got them (CS2, CS3)
Got identity-spoofed — an attacker changed their Discord display name to match the owner's, claimed to be the owner, and convinced the agent to overwrite its core config files (CS8)
Entered infinite loops — two agents were tricked into messaging each other endlessly, burning resources (CS4)
Had their policies injected — an attacker wrote a "constitution" into an agent's memory, causing it to kick server members (CS10)
Broadcast libel — an agent was instructed to send a false, inflammatory accusation about a named individual to its 14 contacts across the agent network without verifying the claim (CS11)

The root cause? No cryptographic identity. No delegated authority. No scope boundaries.

OpenClaw agents treated every Discord message as equally authoritative. There was no way to limit what a delegated agent could do, no way to scope tool access, and no way to revoke access when things went wrong.

The key insight

Everyone, the owner and the attacker alike, interacts with the agent through the same channel. From the agent's perspective, both inputs are just messages. Nothing distinguishes who is allowed to instruct it, or what authority those instructions carry.

A Discord message is treated as equally valid regardless of who sent it. Once the agent decides to act, the situation gets worse.

It has unrestricted access to tools — shell, email, memory — all behind the same identity boundary. There is no concept of scoped capability. If the agent can call a tool, it can call it fully.

And when it spawns a sub-agent, that sub-agent inherits the same access. Authority propagates forward without any attenuation. At that point, the system has lost every meaningful control surface:

No identity boundary → the agent cannot distinguish owner from attacker
No scope boundary → every action has full access to every tool
No delegation boundary → sub-agents inherit unrestricted authority
No revocation boundary → once running, nothing forces the system to stop

This is why the attacks in Agents of Chaos are so sobering. They don’t rely on sophisticated exploits. They rely on the fact that the system has no meaningful way to say no.

An attacker doesn’t need to break the model. They only need to speak to it.

And once the agent decides to act, there is nothing in the architecture that can contain the blast radius.

The paper's fundamental finding is that agentic-layer vulnerabilities are distinct from model-level weaknesses. You can have a perfectly aligned LLM, but if the scaffolding around it gives every agent unrestricted access to every tool, a single tricked agent can cause damage bounded only by what those tools can do.

You cannot prevent compromise of an LLM-driven agent — prompt injection alone guarantees that.

Even if an agent is tricked, the blast radius should be bounded. That's the entire game.

Taming the Chaos

We applied the Highflame platform to each of the findings, and here is what we found. We started by ensuring that every tool call requires a scoped identity, every sub-agent receives attenuated permissions, and every delegation chain can be revoked instantly.

Three principles were applied to make this work:

Every agent gets a stable, verifiable identity — a WIMSE URI, not a display name
Authority is delegated, not assumed — scope intersection at every hop, never escalation
Revocation is instant and cascading — revoke the root, and every agent in the chain goes dark
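The first two principles reduce to a small rule that can be sketched in a few lines. This is an illustration under stated assumptions, not ZeroID's implementation: the `Grant` type, the `root_grant` and `delegate` helpers, the scope names, and the `spiffe://example.org/...` URIs are all hypothetical. Each agent is named by a URI rather than a display name, and each delegation hop intersects the requested scopes with the parent's, so authority can only shrink.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical grant: a stable identity plus the scopes it may exercise."""

    agent_id: str             # stable URI identity, never a chat display name
    scopes: frozenset[str]    # what this agent is allowed to do
    chain: tuple[str, ...]    # identities that delegated to it, root first


def root_grant(agent_id: str, scopes: set[str]) -> Grant:
    """Issue an original (non-delegated) grant, e.g. to the owner's agent."""
    return Grant(agent_id, frozenset(scopes), chain=())


def delegate(parent: Grant, child_id: str, requested: set[str]) -> Grant:
    """Derive a child grant whose scopes are the intersection, never a superset."""
    granted = parent.scopes & frozenset(requested)
    return Grant(child_id, granted, chain=parent.chain + (parent.agent_id,))


owner = root_grant("spiffe://example.org/agent/owner", {"email:read", "calendar:read"})
helper = delegate(owner, "spiffe://example.org/agent/helper",
                  {"calendar:read", "system:admin"})
print(sorted(helper.scopes))  # ['calendar:read']  (system:admin was never held)
print(helper.chain)           # ('spiffe://example.org/agent/owner',)
```

Dropping the unheld scope silently, rather than rejecting the request, is one design choice; a stricter issuer might refuse any request that asks for more than the parent holds.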

At the same time, not every class of attack is solved at the identity layer.

Content safety (CS7 — harmful generation), provider value alignment (CS6), prompt injection (CS12), and libel propagation (CS11) require runtime guardrails — a content inspection layer, not an identity layer.

So, for complete protection against Agents of Chaos-style attacks, you’d need:

Agent Identity (Highflame ZeroID)
Runtime guardrails (Highflame Shield) for content safety and prompt injection
Action sandboxing (Highflame Agent Control) limiting what tools an agent can invoke based on its token scopes
Behavioral monitoring (Highflame DeepContext Intent Drift Detection) detecting loops, resource abuse, anomalous patterns
CAE (Continuous Access Evaluation) signals (Highflame ZeroID), fed by that detection layer and enforced through Highflame's Cedar-based agent control platform, which provides authorization and policy controls
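The monitoring and CAE items work as a pair: a detector notices something like the CS4 message loop, and a signal revokes the agents involved. Here is a minimal sliding-window sketch of such a detector. The `LoopMonitor` class, its default thresholds, and the `revoke` callback are hypothetical; in a real deployment the callback would presumably be replaced by whatever signal channel the identity layer consumes (for example, a CAEP `session-revoked` event) rather than a direct function call.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable


class LoopMonitor:
    """Hypothetical detector for agent pairs that message each other too often."""

    def __init__(self, revoke: Callable[[str, str], None],
                 max_messages: int = 20, window_seconds: float = 60.0) -> None:
        self.revoke = revoke                  # stands in for emitting a CAE signal
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._events: dict[frozenset[str], deque[float]] = defaultdict(deque)

    def observe(self, sender: str, recipient: str, now: float | None = None) -> bool:
        """Record one message; return True if it tripped the detector."""
        now = time.monotonic() if now is None else now
        pair = frozenset((sender, recipient))  # A->B and B->A are one conversation
        events = self._events[pair]
        events.append(now)
        # Slide the window: forget messages older than window_seconds.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        if len(events) > self.max_messages:
            for agent in sorted(pair):        # deterministic revocation order
                self.revoke(agent, "message-loop-detected")
            events.clear()
            return True
        return False
```

Counting both directions of a conversation under one key is what catches the ping-pong pattern from CS4; a per-sender counter would see each agent sending only half the messages.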

Agent Identity + Agent Authority = Agent Control

ZeroID is an open-source identity layer for autonomous AI agents, built on OAuth 2.1, WIMSE/SPIFFE, and RFC 8693 token exchange. Here is how that maps directly to the attack vectors documented in the paper:


Attacks Highflame Identity (ZeroID) directly prevents

Attack	Paper Finding	How Highflame Identity (ZeroID) Prevents It
Sensitive data disclosure (CS3)	Agent dumps emails to a stranger	Agent's credential policy restricts scopes — even if tricked, its JWT doesn't carry email:read. Tools verify the token and reject out-of-scope requests
Denial of service (CS5)	Non-owner instructs agent to rm -rf /	Destructive scopes restricted to first_party trust level with hardware attestation. Agent's credential policy simply doesn't include system:admin
Agent corruption (CS10)	One agent modifies another's config	Each agent has its own WIMSE identity with its own scopes. Cross-agent access requires explicit RFC 8693 delegation with scope attenuation
Identity spoofing (CS8)	Attacker changes Discord display name to match the owner and convinces the agent to rewrite its config	Agent identity is a cryptographic SPIFFE URI bound to a private key, not a Discord display name. Even if the chat-layer impersonation succeeds, the agent cannot mint or accept a token claiming to be the owner — the cryptographic chain of custody breaks immediately
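The CS3 and CS8 rows both come down to a check the tool performs on every call. The sketch below shows one way to write it with only the Python standard library: the tool accepts a call only if the token's signature verifies, the token has not expired, and its `scope` claim includes what the tool needs, and it identifies the caller by the signed `sub` URI rather than by anything said in chat. Assumptions: HS256 with a shared secret stands in for the asymmetric, key-bound signatures the table describes, and the claim layout, URIs, and helper names are hypothetical rather than ZeroID's API.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def mint_token(secret: bytes, claims: dict) -> str:
    """Sign claims as a compact HS256 JWT (demo only; see assumptions above)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_tool_call(secret: bytes, token: str, required_scope: str) -> dict:
    """Return the token's claims, or raise PermissionError if the call is not allowed."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError:
        raise PermissionError("malformed token") from None
    if header.get("alg") != "HS256":      # never let the token pick its own algorithm
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        raise PermissionError("bad signature")   # forged or tampered token
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise PermissionError("token expired")
    # Identity is the signed `sub` URI; chat-layer display names play no part.
    # OAuth convention: `scope` is a space-separated list.
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError(f"{claims.get('sub')} lacks scope {required_scope!r}")
    return claims
```

With a token minted for `"scope": "calendar:read"`, `verify_tool_call(secret, token, "email:read")` raises `PermissionError` however convincingly the requester claims to be the owner, and an attacker who mints their own "owner" token without the key fails the signature check.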


Attacks Highflame Authorization prevents

Attack	Highflame Identity (ZeroID)	Highflame Authorization
Non-owner compliance (CS2)	Scoped tokens bound the blast radius even when the agent obeys a stranger — the obeyed action can only do what the agent's own scopes permit	Application-level sender verification (e.g. a Discord-OAuth bridge)
Resource loops (CS4)	Token TTL (max 1hr default) naturally breaks loops. CAE signals trigger automatic revocation on anomalous behavior	Runtime loop detection monitor
Agent collaboration risk (CS9)	Delegation depth limits (default: 1) prevent unbounded agent chains. Scope attenuation blocks privilege escalation	Content-level filtering for what agents share
Disproportionate response (CS1)	Owner-scoped authorization before destructive self-modifications	LLM reasoning / judgment improvements
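Two of the ZeroID mechanisms in this table are easy to state precisely: a token stops working at its `exp` time, and a delegated token is refused once its delegation chain is deeper than the limit. RFC 8693 records prior actors by nesting `act` claims, so depth can be read straight from the claims. The sketch below assumes that representation; the constants mirror the defaults quoted in the table, and the function names are hypothetical.

```python
from __future__ import annotations

import time

MAX_TTL_SECONDS = 3600     # assumption: mirrors the 1-hour default in the table
MAX_DELEGATION_DEPTH = 1   # assumption: mirrors the depth-1 default in the table


def delegation_depth(claims: dict) -> int:
    """Count nested RFC 8693 `act` claims; 0 means the token was not delegated."""
    depth = 0
    actor = claims.get("act")
    while isinstance(actor, dict):
        depth += 1
        actor = actor.get("act")
    return depth


def check_token(claims: dict, now: float | None = None) -> None:
    """Raise PermissionError if the token is too long-lived, expired, or too deep."""
    now = time.time() if now is None else now
    if claims["exp"] - claims["iat"] > MAX_TTL_SECONDS:
        raise PermissionError("token lifetime exceeds policy maximum")
    if now >= claims["exp"]:
        # Without a fresh token, a looping agent stops here after at most one TTL.
        raise PermissionError("token expired")
    if delegation_depth(claims) > MAX_DELEGATION_DEPTH:
        raise PermissionError("delegation chain deeper than policy allows")
```

A token with `"sub"` set to the owner and `"act": {"sub": <helper URI>}` passes at depth 1; nesting a second `act` inside the first, as a sub-agent spawning its own sub-agent would, is rejected.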

Blending Authorization Controls with Identity

Beyond the individual components, what Highflame introduces is something more fundamental: a control plane for execution, not just access. Most identity and authorization systems are designed to evaluate a single request in isolation. They answer whether a specific action should be allowed at a specific moment. That works for APIs. It breaks for agents. Agents don’t make one request. They initiate an execution that unfolds over time — across tools, across systems, and often across other agents. The failure mode isn’t just unauthorized access. It’s unbounded execution.

The Highflame Agent Control Platform shifts the enforcement point from the individual request to the execution itself, anchored to a stable identity.

Every action an agent takes is tied back to:

a stable identity (who is acting)
a scoped delegation chain (who authorized it)
and a bounded execution context (why it is acting and under what constraints)

This is what enables properties that don’t exist in traditional systems:

Execution-scoped enforcement — decisions are made in the context of the entire delegation chain, not just a single token
Deterministic containment — even if an agent is tricked, its actions cannot exceed its scoped authority
Chain-wide revocation — authority is not revoked at a single token boundary, but across the entire execution graph

In other words, Highflame with ZeroID doesn't just make identity stronger. It makes agent execution governable.

SPIFFE alone gives you identity. OAuth alone gives you scoped tokens. Neither was designed for a system where an agent can spawn another agent, which can in turn spawn others — while still requiring the entire chain to be revocable in real time. ZeroID’s contribution is the combination: stable per-agent SPIFFE identities, RFC 8693-based scope attenuation across delegation chains, and cascade revocation propagated via CAE signals.
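As a concrete reference point for the RFC 8693 piece, a delegation hop is an ordinary OAuth token request with a few extra parameters. The sketch only builds the form body a parent agent might POST to a token endpoint to obtain a narrower token for a sub-agent. The parameter names and URNs are taken from RFC 8693; the token values, audience, and `build_delegation_request` helper are hypothetical, and this is not ZeroID's documented request format.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode

# URNs defined by RFC 8693.
TOKEN_EXCHANGE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
JWT_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_delegation_request(parent_token: str, child_jwt: str,
                             parent_scopes: set[str], requested: set[str],
                             audience: str) -> str:
    """Form-encoded body for an RFC 8693 token exchange that delegates to a sub-agent."""
    # Ask only for scopes the parent actually holds. The authorization server
    # must still enforce attenuation itself; the client cannot be trusted to.
    scopes = sorted(parent_scopes & requested)
    if not scopes:
        raise ValueError("nothing left to delegate after attenuation")
    return urlencode({
        "grant_type": TOKEN_EXCHANGE,
        "subject_token": parent_token,          # whose authority is being delegated
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "actor_token": child_jwt,               # the sub-agent that will act
        "actor_token_type": JWT_TYPE,
        "scope": " ".join(scopes),              # space-separated, per OAuth
        "audience": audience,
    })


if __name__ == "__main__":
    body = build_delegation_request(
        "parent-access-token", "child-identity-jwt",
        parent_scopes={"calendar:read", "email:read"},
        requested={"calendar:read", "system:admin"},
        audience="https://tools.example.org",
    )
    print(parse_qs(body)["scope"])  # ['calendar:read']
```

The `actor_token` is what lets the issued token name the sub-agent as the acting party, so the resulting credential carries both who is acting and on whose behalf.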

Revoke the root credential, and every descendant agent goes dark before its next tool call. That property (chain-wide, near-instant containment of execution) doesn't exist in today's identity or authorization systems. It is also hard to retrofit: adding identity to an existing agent fleet is significantly harder than building on it from day one. Every agent shipped without scoped credentials becomes technical debt the moment something goes wrong.
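The cascade described here can be approximated at the enforcement point: before each tool call, check every identity in the token's delegation chain, not only the agent presenting it, against the current revocation state. This is a simplified sketch: it keys revocation on identity URIs held in an in-memory set, whereas a real system would more likely revoke specific grants or token IDs and learn about revocations through pushed signals. The chain is read from nested RFC 8693 `act` claims, and all names are hypothetical.

```python
from __future__ import annotations


def chain_identities(claims: dict) -> list[str]:
    """Subject first, then the current actor, then prior actors (nested `act`)."""
    ids = [claims["sub"]]
    actor = claims.get("act")
    while isinstance(actor, dict):
        ids.append(actor["sub"])
        actor = actor.get("act")
    return ids


def authorize_tool_call(claims: dict, revoked: set[str]) -> None:
    """Raise PermissionError if ANY identity in the delegation chain is revoked."""
    for identity in chain_identities(claims):
        if identity in revoked:
            raise PermissionError(f"delegation chain revoked at {identity}")


token_claims = {
    "sub": "spiffe://example.org/agent/owner",            # on whose behalf
    "act": {"sub": "spiffe://example.org/agent/helper"},  # who is acting now
}
revoked: set[str] = set()
authorize_tool_call(token_claims, revoked)   # allowed: nothing revoked yet
revoked.add("spiffe://example.org/agent/owner")
# From here on, the helper's next call fails even though its own identity
# was never revoked: authorize_tool_call(token_claims, revoked) raises.
```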

The Agents of Chaos paper shows exactly what “something goes wrong” looks like in practice. And with NIST’s AI Agent Standards Initiative now treating identity, authorization, and execution control as priority areas, this is no longer theoretical.

The threat model is here. The standards are coming. The infrastructure needs to come first.

Check out Highflame's Agent Control Platform
Try out Highflame ZeroID: Open source Agent Identity

If you are building or deploying agents, we would love to chat!

