
As adoption of large language models (LLMs) increases, the sophistication of attacks against them continues to evolve. What began as simple one-shot jailbreaks has transformed into complex, multi-turn conversations designed to gradually bypass safety guardrails.
Today we’re introducing DeepContext, the next-generation model in Highflame’s Multi-Turn Security Model family.
DeepContext is a next-generation contextual sequence model designed to detect multi-turn attacks without requiring full transcript replay on every request.
To build effective LLM defenses, we need to understand the two major threat classes that show up again and again in production systems: jailbreaking and prompt injections. Jailbreaking is an attempt to bypass a model’s safety guardrails so it produces content it was designed to refuse. Prompt injection is different in spirit and in mechanics: it aims to hijack the model’s instructions by overriding or corrupting system prompts or other hidden guidance, so the model follows an attacker’s intent instead of the developer’s.
Both are enabled by a fundamental weakness in how LLMs process text: they do not inherently know what is “trusted instruction” versus “untrusted content.” To a model, it’s all tokens—unless you build infrastructure that can enforce boundaries.
For organizations deploying LLMs, the consequences are real: reputational harm, policy violations, data exposure, and unsafe actions in downstream tools.
Early jailbreak attempts were relatively straightforward, occurring within a single conversation turn. They happened in one turn, and they were often easy to spot because the intent was explicit: “Ignore previous instructions,” “pretend you’re unrestricted,” “this is purely hypothetical,” and so on. Common techniques included:
While these single-turn attacks posed challenges, traditional guardrail models could often detect and block them by analyzing individual prompts in isolation. The adversarial signal was concentrated in one message.
As defenses improved, attackers adapted by exploiting the biggest weakness in most guardrail models: they’re stateless.
As defensive systems improved, attackers adapted. Modern jailbreak techniques have evolved to exploit a critical weakness in traditional guardrails: the inability to maintain context across multiple conversation turns.
Modern attacks are frequently designed to unfold over time. The attacker doesn’t lead with the harmful request. They probe the boundary, build trust, introduce a narrative, redefine terms, and then gradually steer the conversation toward a policy violation. The key advantage for the attacker is simple: each message can look benign on its own. The maliciousness emerges in the trajectory. If our guardrail is only looking at one turn at a time, it will miss the pattern because the pattern lives in the conversation.
Examples include:
These multi-turn attacks are particularly dangerous because each individual turn may appear benign on its own. It's only after stepping back to view the conversation holistically that the malicious pattern becomes apparent.
Once you accept that multi-turn attacks are the norm, the requirement becomes obvious: a guardrail model has to maintain a meaningful understanding of the entire conversation history. It needs to recognize slow-burn escalation, detect subtle instruction hijacking, and identify contextual manipulation that only makes sense in aggregate.
The naïve solution is to feed the entire conversation into a heavy model every time. But that doesn’t scale. It’s slow, expensive, and brittle especially when you care about predictable latency in production.
More importantly, you are wasting expensive context window real-estate for safety determination.
So the real challenge becomes:
how do you get long-horizon context awareness without reprocessing long-horizon context every time?
That’s the problem DeepContext was built to solve.
DeepContext represents a fundamental shift in how we approach LLM safety; moving from analyzing individual prompts to understanding entire conversational contexts. And to defeat multi-turn attacks, a guardrail model must maintain a rich understanding of the entire conversation history. It needs to not only be trained on the latest multi-turn prompt injection & jailbreak training sets but it also needs to:
To get real-time context awareness without sacrificing performance, we built a two-pillar architecture that separates responsibilities: one component reads the current input at high speed, and the other maintains a compact, evolving representation of the conversation over time. Together, they provide fast gating, long-horizon memory, and consistent enforcement without requiring the entire transcript to be reprocessed on every request.
1. The High-Speed Intent Processor (Current Turn Analysis)
The first pillar is designed for throughput and latency. For each incoming user message, we run a proprietary, highly optimized encoder that analyzes the text of the current turn and produces an Intent Signature - a compact embedding-like representation of the message’s semantic intent and risk posture.
The intent signature is like a semantic fingerprint of the current input. It’s small, fast to compute, and stable enough to support downstream decisions like:
Because this processor is optimized specifically for intent and security-relevant semantics, it can operate at real-time speeds and serve as a low-latency front line. Importantly, it doesn’t need to understand the entire conversation to be useful - it just needs to provide a high-quality read on what this message is trying to do right now.
2. The Temporal State Machine (Conversational Memory):
The second pillar is where continuous defense becomes possible. Attackers rarely start with the final malicious request. They warm up the model, establish rapport, test boundaries, and then slowly escalate. If you only evaluate one turn at a time, the system is vulnerable to “slow-burn” strategies where each message is innocuous, but the trajectory is not.
The Temporal State Machine solves that by ingesting each Intent Signature sequentially and maintaining an evolving Context Vector - a compact representation of the conversation’s security-relevant state over time. Instead of storing or reprocessing the full transcript, it stores the distilled meaning of the interaction as it unfolds.
This gives us three powerful properties:
a) Long-horizon detection without full-history replay
Because the Context Vector is updated incrementally, you don’t need to re-run heavy analysis over everything that came before. You get continuity essentially “for free,” which keeps latency predictable.
b) Relevance-aware memory through gating
Not every part of a conversation should influence a security decision. The Temporal State Machine uses gating mechanisms to decide what to retain, what to down-weight, and what to ignore. Benign context fades; suspicious patterns persist. This prevents the system from being either overly forgetful or overly sensitive.
c) Trajectory awareness
The system isn’t only scoring content; it’s modeling change. It can detect patterns like repeated boundary testing, gradual constraint shaping, escalating attempts to override instructions, or the classic pivot from safe topics into restricted territory.
Individually, these turns can look harmless. Collectively, they tell a story.
The result is a guardrail layer that can reason about intent over time - which is often the difference between blocking obvious attacks and catching the adversarial workflows that actually happen in practice.
In internal benchmarking, DeepContext has demonstrated high efficacy against state-of-the-art attack vectors where traditional stateless guardrails failed. It successfully identifies:
Crucially, because the Temporal State Machine maintains context efficiently, we achieve this security without the massive computational overhead of re-processing the entire conversation history for every new request.
We evaluate DeepContext against some of the most popular LLM safeguards as well as our own single-turn Pulse model on multiple benchmarks:
Through our evaluation we fiind that DeepContext achieves F1-score of 0.951 significantly outperforming industry standard guard models like Quen3Guard, Google ModelArmor, PromptGuard family, Bedrock Guardrails family and other open source classifiers.
We will be publishing the results in an upcoming paper.
To see DeepContext in action, we analyzed a sophisticated jailbreak attempt using the Automated-Multi-Turn-Jailbreaks framework. In this demo we use an “Actor Attack”, which tries to elicit harmful content by asking questions about a related historical figure.
In this scenario, the attacker doesn't lead with a threat; they ask information about a historical figure, Ted Kaczynski, who is infamous for employing homemade bombs. They then gently steer the conversation towards how he was able to build bombs. These prompts are designed to slip past traditional, single-turn guardrails that only look at one message at a time.
We compared DeepContext along side popular guardrails provided by AWS Bedrock Guardrails, Google Model Armor and Microsoft Prompt Shield each of which failed to detect or flag the threats.
The conversation begins with an innocuous question, but as the attacker steers the LLM towards producing harmful content, DeepContext tracks this trajectory and blocks it as soon as the attacker’s true intent becomes apparent.
LLM attacks won’t stand still and neither can defenses. The tactics we see today, from prompt injection to slow-burn, multi-turn manipulation, are already evolving into more adaptive, more patient, and more context-aware adversaries. In other words: the attack surface is learning.
DeepContext is our response to that reality. It’s built on the belief that protecting agentic systems requires more than point-in-time filtering. It requires continuous understanding of intent over time, with guardrails that can recognize trajectories and not just individual messages and hold a consistent security posture across long-running interactions. As LLMs become more powerful and more deeply embedded in enterprise agentic workflows, guardrails stop being a “feature” and start becoming infrastructure.
If your AI can take actions, access tools, or influence decisions, then security can’t be bolted on after deployment. It has to be part of the system’s nervous system: always on, always learning, and designed for real adversaries.
If you’re building customer-facing assistants, internal copilots, or multi-agent systems, the question isn’t whether you’ll face sophisticated attacks. It’s whether you’ll be able to explain what happened, understand why it happened, and prevent it from happening again.
Want to see what continuous, multi-turn defense looks like in practice? Request a demo