Highflame Technology Series

Deconstructing “Agents of Chaos”: Failures Behind Autonomous Agent Attacks

Yash Datta
AI Engineering
April 10, 2026

The "Agents of Chaos" paper (https://agentsofchaos.baulab.info/report.html), from researchers at Northeastern, MIT, Harvard, and CMU, red-teamed autonomous agents built on OpenClaw, an open-source AI assistant framework with persistent memory, shell access, and Discord messaging.

The results were sobering. Across 16 documented incidents, agents:

  • Obeyed anyone who messaged them — a non-owner asked for confidential emails and got them (CS2, CS3)
  • Got identity-spoofed: an attacker changed their Discord display name to match the owner's, claimed to be the owner, and convinced the agent to overwrite its core config files (CS8)
  • Entered infinite loops — two agents were tricked into messaging each other endlessly, burning resources (CS4)
  • Had their policies injected — an attacker wrote a "constitution" into an agent's memory, causing it to kick server members (CS10)
  • Broadcast libel — an agent was instructed to send a false, inflammatory accusation about a named individual to its 14 contacts across the agent network without verifying the claim (CS11)

The root cause? No cryptographic identity. No delegated authority. No scope boundaries.

OpenClaw agents treated every Discord message as equally authoritative. There was no way to limit what a delegated agent could do, no way to scope tool access, and no way to revoke access when things went wrong.

The key insight

Everyone — the owner and the attacker — interacts with the agent through the same channel. From the agent’s perspective, both inputs are just messages. There’s no distinction between who is allowed to instruct it and what authority those instructions carry.

A Discord message is treated as equally valid regardless of who sent it. Once the agent decides to act, the situation gets worse.

It has unrestricted access to tools — shell, email, memory — all behind the same identity boundary. There is no concept of scoped capability. If the agent can call a tool, it can call it fully.

And when it spawns a sub-agent, that sub-agent inherits the same access. Authority propagates forward without any attenuation. At that point, the system has lost every meaningful control surface:

  • No identity boundary → the agent cannot distinguish owner from attacker
  • No scope boundary → every action has full access to every tool
  • No delegation boundary → sub-agents inherit unrestricted authority
  • No revocation boundary → once running, nothing forces the system to stop
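
As a minimal sketch of the first missing boundary, the snippet below authorizes a message by a stable sender ID instead of a display name. The `Message` type, `OWNER_USER_ID`, and `handle` are hypothetical names chosen for illustration; they are not OpenClaw or Discord APIs. A platform user ID is still weaker than a cryptographic identity, so treat this only as the smallest distinction the agents lacked.

```python
from dataclasses import dataclass

# Hypothetical value for illustration; not taken from the paper or OpenClaw.
OWNER_USER_ID = "1001"


@dataclass(frozen=True)
class Message:
    sender_id: str      # stable, platform-assigned ID
    display_name: str   # freely editable by the sender
    text: str


def is_from_owner(msg: Message) -> bool:
    """Trust the stable ID, never the display name."""
    return msg.sender_id == OWNER_USER_ID


def handle(msg: Message) -> str:
    """Refuse any instruction that does not come from the owner's ID."""
    if not is_from_owner(msg):
        return "refused: sender is not the owner"
    return f"executing: {msg.text}"


owner = Message("1001", "alice", "rotate the API key")
spoof = Message("2002", "alice", "overwrite your config files")
print(handle(owner))  # executing: rotate the API key
print(handle(spoof))  # refused: sender is not the owner
```

Both messages carry the same display name; only the stable ID separates them.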

This is why the attacks in Agents of Chaos are so sobering. They don’t rely on sophisticated exploits. They rely on the fact that the system has no meaningful way to say no.

An attacker doesn’t need to break the model. They only need to speak to it.

And once the agent decides to act, there is nothing in the architecture that can contain the blast radius.

The paper's fundamental finding is that agentic-layer vulnerabilities are distinct from model-level weaknesses. You can have a perfectly aligned LLM, but if the scaffolding around it gives every agent unrestricted access to every tool, a single tricked agent can cause unlimited damage.

You cannot prevent compromise of an LLM-driven agent — prompt injection alone guarantees that.

What you can do is bound the blast radius when an agent is tricked. That's the entire game.

Taming the Chaos

We applied the Highflame platform to each of the paper's findings. We started by ensuring that every tool call requires a scoped identity, every sub-agent gets attenuated permissions, and every delegation chain can be revoked instantly.

Three principles were applied to make this work:

  • Every agent gets a stable, verifiable identity — a WIMSE URI, not a display name
  • Authority is delegated, not assumed — scope intersection at every hop, never escalation
  • Revocation is instant and cascading — revoke the root, and every agent in the chain goes dark
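
The second principle, scope intersection at every hop, can be sketched in a few lines. This is an illustration under assumptions, not ZeroID's implementation: the `Credential` type, the `delegate` helper, and the `wimse://example.org/...` URIs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    subject: str               # hypothetical WIMSE-style URI
    scopes: frozenset[str]


def delegate(parent: Credential, child_subject: str, requested: set[str]) -> Credential:
    """Grant the child only scopes the parent already holds (set intersection)."""
    return Credential(child_subject, parent.scopes & frozenset(requested))


root = Credential(
    "wimse://example.org/agent/orchestrator",
    frozenset({"email:read", "calendar:read"}),
)
# The child asks for more than the parent has; the extra scope is dropped.
child = delegate(root, "wimse://example.org/agent/summarizer", {"email:read", "system:admin"})
# The grandchild cannot regain a scope that was lost one hop earlier.
grandchild = delegate(child, "wimse://example.org/agent/formatter", {"email:read", "calendar:read"})

print(sorted(child.scopes))       # ['email:read']
print(sorted(grandchild.scopes))  # ['email:read']
```

Because each hop is an intersection, the scope set can only shrink along a chain, which is what "never escalation" means in practice.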

At the same time, not every class of attack is solved at the identity layer.

Content safety (CS7 — harmful generation), provider value alignment (CS6), prompt injection (CS12), and libel propagation (CS11) require runtime guardrails — a content inspection layer, not an identity layer.

So, for complete protection against Agents of Chaos-style attacks, you’d need:

  • Agent Identity (Highflame ZeroID)
  • Runtime guardrails (Highflame Shield) for content safety and prompt injection
  • Action sandboxing (Highflame Agent Control) limiting what tools an agent can invoke based on its token scopes
  • Behavioral monitoring (Highflame DeepContext Intent Drift Detection) detecting loops, resource abuse, anomalous patterns
  • Continuous Access Evaluation (CAE) signals (Highflame ZeroID), fed by a detection layer, with Highflame’s Cedar-based agent control platform providing authorization and policy controls

Agent Identity + Agent Authority = Agent Control

ZeroID is an open-source identity layer for autonomous AI agents, built on OAuth 2.1, WIMSE/SPIFFE, and RFC 8693 token exchange. Here is how that maps directly to the attack vectors documented in the paper:

Attacks Highflame Identity (ZeroID) directly prevents

| Attack | Paper Finding | How Highflame Identity (ZeroID) Prevents It |
|---|---|---|
| Sensitive data disclosure (CS3) | Agent dumps emails to a stranger | Agent's credential policy restricts scopes; even if tricked, its JWT doesn't carry email:read. Tools verify the token and reject out-of-scope requests |
| Denial of service (CS5) | Non-owner instructs agent to rm -rf / | Destructive scopes restricted to first_party trust level with hardware attestation. Agent's credential policy simply doesn't include system:admin |
| Agent corruption (CS10) | One agent modifies another's config | Each agent has its own WIMSE identity with its own scopes. Cross-agent access requires explicit RFC 8693 delegation with scope attenuation |
| Identity spoofing (CS8) | Attacker changes Discord display name to match the owner and convinces the agent to rewrite its config | Agent identity is a cryptographic SPIFFE URI bound to a private key, not a Discord display name. Even if the chat-layer impersonation succeeds, the agent cannot mint or accept a token claiming to be the owner; the cryptographic chain of custody breaks immediately |
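
The CS3 row depends on the tool, not the agent, making the final decision. Below is a minimal sketch of that tool-side check, using only the Python standard library. An HMAC-signed blob stands in for a real JWT, and `SECRET`, `sign`, `verify`, and `read_email` are hypothetical names; a production system would use asymmetric signatures and a standard token format.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical shared key for the sketch; real deployments would verify
# an asymmetric signature from the issuer instead.
SECRET = b"demo-signing-key"


def sign(claims: dict) -> str:
    """Issue a token: base64 claims plus an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def verify(token: str) -> dict:
    """Return the claims if the signature matches, else raise PermissionError."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body))


def read_email(token: str) -> str:
    """A tool that enforces its own scope requirement on every call."""
    claims = verify(token)
    if "email:read" not in claims.get("scopes", []):
        raise PermissionError("missing scope email:read")
    return "inbox contents"


agent_token = sign({"sub": "agent-1", "scopes": ["calendar:read"]})
try:
    read_email(agent_token)
except PermissionError as err:
    print(f"rejected: {err}")  # rejected: missing scope email:read
```

Even if the agent is persuaded to call `read_email`, the call fails because the scope was never in its token.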

Attacks Highflame Authorization prevents

| Attack | Highflame Identity (ZeroID) | Highflame Authorization |
|---|---|---|
| Non-owner compliance (CS2) | Scoped tokens bound the blast radius even when the agent obeys a stranger; the obeyed action can only do what the agent's own scopes permit | Application-level sender verification (e.g. a Discord-OAuth bridge) |
| Resource loops (CS4) | Token TTL (max 1hr default) naturally breaks loops. CAE signals trigger automatic revocation on anomalous behavior | Runtime loop detection monitor |
| Agent collaboration risk (CS9) | Delegation depth limits (default: 1) prevent unbounded agent chains. Scope attenuation blocks privilege escalation | Content-level filtering for what agents share |
| Disproportionate response (CS1) | Owner-scoped authorization before destructive self-modifications | LLM reasoning / judgment improvements |
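
The TTL and delegation-depth limits in the CS4 and CS9 rows can be sketched as follows. The two constants mirror the defaults quoted in the table; the `Token` type and `delegate` function are hypothetical, and reading "depth 1" as "a root token may delegate once" is an assumption of this sketch.

```python
from dataclasses import dataclass

MAX_TTL_SECONDS = 3600     # "max 1hr default" from the table
MAX_DELEGATION_DEPTH = 1   # "default: 1" from the table


@dataclass(frozen=True)
class Token:
    subject: str
    issued_at: float   # seconds, supplied by the caller for determinism
    ttl: float
    depth: int = 0     # 0 = root token, 1 = first delegation

    def expires_at(self) -> float:
        # A requested TTL above the cap is silently clamped.
        return self.issued_at + min(self.ttl, MAX_TTL_SECONDS)

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at()


def delegate(parent: Token, child_subject: str, now: float) -> Token:
    """Issue a child token that is one hop deeper and never outlives its parent."""
    if not parent.is_valid(now):
        raise PermissionError("parent token expired")
    if parent.depth >= MAX_DELEGATION_DEPTH:
        raise PermissionError("delegation depth limit reached")
    return Token(child_subject, now, parent.expires_at() - now, parent.depth + 1)


root = Token("agent-a", issued_at=0.0, ttl=7200.0)   # asks for 2h, gets 1h
child = delegate(root, "agent-b", now=100.0)
print(root.is_valid(3599.0), root.is_valid(3600.0))  # True False
print(child.depth, child.expires_at())               # 1 3600.0
```

Two agents stuck messaging each other would stop at expiry at the latest, and the child cannot extend the chain by delegating again.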

Blending Authorization Controls with Identity

Beyond the individual components, what Highflame introduces is something more fundamental: a control plane for execution, not just access. Most identity and authorization systems are designed to evaluate a single request in isolation. They answer whether a specific action should be allowed at a specific moment. That works for APIs. It breaks for agents. Agents don’t make one request. They initiate an execution that unfolds over time — across tools, across systems, and often across other agents. The failure mode isn’t just unauthorized access. It’s unbounded execution.

Highflame Agent Control Platform shifts the enforcement point from the individual request to the execution itself, anchored to a stable identity.


Every action an agent takes is tied back to:

  • a stable identity (who is acting)
  • a scoped delegation chain (who authorized it)
  • and a bounded execution context (why it is acting and under what constraints)

This is what enables properties that don’t exist in traditional systems:

  • Execution-scoped enforcement — decisions are made in the context of the entire delegation chain, not just a single token
  • Deterministic containment — even if an agent is tricked, its actions cannot exceed its scoped authority
  • Chain-wide revocation — authority is not revoked at a single token boundary, but across the entire execution graph

In other words, Highflame + ZeroID doesn’t just make identity stronger. It makes agent execution governable.

SPIFFE alone gives you identity. OAuth alone gives you scoped tokens. Neither was designed for a system where an agent can spawn another agent, which can in turn spawn others — while still requiring the entire chain to be revocable in real time. ZeroID’s contribution is the combination: stable per-agent SPIFFE identities, RFC 8693-based scope attenuation across delegation chains, and cascade revocation propagated via CAE signals.

Revoke the root credential, and every descendant agent goes dark before its next tool call. That property (chain-wide, near-instant containment of execution) doesn’t exist in today’s identity or authorization systems, and it is hard to retrofit: adding identity to an existing agent fleet is significantly harder than building on it from day one. Every agent shipped without scoped credentials becomes technical debt the moment something goes wrong.
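
To make cascade revocation concrete, here is a minimal in-memory sketch. It assumes each credential records its parent and that tools check the whole ancestor path before every call. `CredentialRegistry` and its methods are hypothetical names; a real deployment would propagate revocation through signals between services instead of a local lookup.

```python
from typing import Optional


class CredentialRegistry:
    """Tracks parent links so revoking one credential disables its descendants."""

    def __init__(self) -> None:
        self._parent: dict[str, Optional[str]] = {}
        self._revoked: set[str] = set()

    def issue(self, cred_id: str, parent_id: Optional[str] = None) -> None:
        if parent_id is not None and parent_id not in self._parent:
            raise KeyError(f"unknown parent {parent_id}")
        self._parent[cred_id] = parent_id

    def revoke(self, cred_id: str) -> None:
        self._revoked.add(cred_id)

    def is_active(self, cred_id: str) -> bool:
        """Active only if no credential on the path to the root is revoked."""
        current: Optional[str] = cred_id
        while current is not None:
            if current in self._revoked or current not in self._parent:
                return False
            current = self._parent[current]
        return True


registry = CredentialRegistry()
registry.issue("root")
registry.issue("child", parent_id="root")
registry.issue("grandchild", parent_id="child")
registry.issue("unrelated")

registry.revoke("root")
print(registry.is_active("grandchild"))  # False
print(registry.is_active("unrelated"))   # True
```

A single `revoke` call on the root is enough; nothing has to enumerate or contact the descendants, because each one fails its own check at the next tool call.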

The Agents of Chaos paper shows exactly what “something goes wrong” looks like in practice. And with NIST’s AI Agent Standards Initiative now treating identity, authorization, and execution control as priority areas, this is no longer theoretical.

The threat model is here. The standards are coming. The infrastructure needs to come first.

Check out Highflame's Agent Control Platform
Try out Highflame ZeroID: Open source Agent Identity


If you are building or deploying Agents, we would love to chat!

