AGENT SECURITY FABRIC

Hyper-efficient guardrail models built for high speed and low cost

Secure every agentic interaction with our family of lightweight, high-performance models built for industry-leading conversational context awareness. Deploy anywhere, from cloud to edge.

Built for scale

Highflame’s family of specialized, compact AI models is purpose-built for the unique security challenges of agentic workflows. Designed from the ground up for efficiency and performance, Javelin models deliver robust protection without the latency and cost burdens of larger, general-purpose models.

Javelin DeepContext

Stateful, Contextual, Multi-Turn

Our next-generation models detect intent and perform continuous, stateful security analysis. Built for real-time enforcement, they maintain conversational context, enabling detection of sophisticated multi-turn attacks and real-time conversation threat scoring.

Javelin Pulse

Stateless, Compact, Specialized

Our foundational suite of ultra-fast, low-parameter models is optimized for stateless, single-request analysis. Perfect for detecting threats like prompt injection and toxicity with minimal overhead.

Built to power agentic workflows

Javelin models are purpose-built for agentic systems, designed to operate inline with every decision, action, and tool call. From ultra-fast single-turn enforcement to deep, context-aware multi-turn analysis, they provide real-time security without slowing agents down. Lightweight and explicitly trained on AI threats, Javelin models deliver precise, production-grade protection where modern AI actually runs.

Hyper-Fast Speed

Achieve guardrail decisions in under 100ms on GPU, crucial for real-time applications.

Edge-Ready

A small footprint and low latency make Javelin models ideal for deployment directly on edge devices, enhancing privacy and reducing network dependency.

Precision Accuracy

Achieve state-of-the-art performance (>95% F1 score) on critical security tasks like prompt injection defense.

Context-Aware Security

Our next-gen DeepContext models understand conversation history, enabling detection of sophisticated multi-turn attacks that traditional guardrails miss.

A Guardrail That Remembers.

Traditional guardrails treat every request in isolation. They have no memory. This makes them blind to sophisticated attacks that unfold over multiple turns, such as gradual jailbreaks, complex social engineering, or subtle topic drift that bypasses single-shot defenses.

Multi-turn threat detection based on intent signals

Javelin DeepContext represents a paradigm shift in AI security. Leveraging a novel architecture optimized for understanding sequences and maintaining memory, these models track the state and context of a conversation turn by turn. This inherent "memory" allows them to:

Detect Multi-Turn Attacks:

Identify threats that develop over multiple interactions.

Score Real-Time Conversations:

Analyze the safety, compliance, and relevance of ongoing conversations using historical context.

Understand Contextual Nuance:

Make more accurate security decisions by considering the full conversational flow.

Javelin DeepContext is designed for scenarios where single-request analysis falls short, providing robust, stateful protection for chatbots, agents, and complex interactive AI systems.
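The stateful behavior described above can be sketched in a few lines. This is an illustrative toy, not Highflame's actual architecture: the cue list and cumulative scoring stand in for a learned intent model, so that a gradual attack crosses the threshold even when no single turn would trigger a stateless check.

```python
# Illustrative sketch of stateful, multi-turn threat scoring. The cue list
# and weights are hypothetical stand-ins for a learned contextual model.
ESCALATION_CUES = ("hypothetically", "roleplay", "pretend", "bypass", "no restrictions")

class ConversationGuard:
    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold
        self.risk = 0.0  # stateful: risk is carried across turns

    def score_turn(self, turn: str) -> float:
        """Per-turn intent signal; a real model would score this contextually."""
        return 0.4 * sum(cue in turn.lower() for cue in ESCALATION_CUES)

    def observe(self, turn: str) -> bool:
        """Accumulate risk and return True once the conversation is flagged."""
        self.risk += self.score_turn(turn)
        return self.risk >= self.threshold
```

Each turn below scores under the threshold in isolation, but the accumulated risk flags the conversation on the third turn, which is exactly the gradual-jailbreak pattern a stateless guardrail misses.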

Precision and speed for every request

The foundation of the Highflame model family lies in our Javelin Pulse models. As detailed in our research (arXiv:2506.07330), these models use a highly optimized architecture; the compact classifiers deliver exceptional results on tasks like:

Prompt Injection Detection

Identifying and blocking malicious inputs designed to hijack the LLM.

Toxicity Filtering

Ensuring model outputs adhere to safety and ethical guidelines.

Lightweight & Efficient

With only ~110 million parameters, Javelin Pulse models offer powerful protection at a fraction of the size of larger LLMs.
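A stateless check of this kind reduces to a single classifier call per request, with no conversation history kept. The scorer below is a trivial hypothetical stand-in (in production this would be a compact model inference), kept self-contained so the gating logic is clear.

```python
# Minimal sketch of a stateless prompt-injection check. SUSPICIOUS_PATTERNS
# and score_injection are hypothetical stand-ins for a compact classifier.
SUSPICIOUS_PATTERNS = ("ignore previous instructions", "disregard your system prompt")

def score_injection(prompt: str) -> float:
    """Return a pseudo-probability that the prompt is an injection attempt."""
    text = prompt.lower()
    return 0.99 if any(p in text for p in SUSPICIOUS_PATTERNS) else 0.01

def check_request(prompt: str, threshold: float = 0.5) -> bool:
    """Stateless: each request is scored in isolation, nothing is remembered."""
    return score_injection(prompt) >= threshold
```

Because each request is independent, checks like this can be batched, cached, and run anywhere, which is what makes the stateless design so cheap and fast.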

Benchmarks: Our Javelin Pulse models consistently outperform larger, more costly alternatives, achieving over 95% F1-score on our internal JavelinGuard benchmark while maintaining sub-100ms latency.

How Javelin Pulse compares:

Javelin Pulse delivers state-of-the-art accuracy comparable to larger models but with significantly lower latency and a dramatically smaller footprint. This makes it well-suited for real-time applications and resource-constrained environments, such as edge devices, where larger LLM-based solutions are often impractical.

| Model Type | Avg. Latency (CPU)* | Size (Parameters) | Accuracy (Prompt Injection F1)** | Suitability for Edge |
| --- | --- | --- | --- | --- |
| Javelin Pulse | ~47ms | ~450 million | ~95.3% | Excellent |
| Standard DeBERTa-v3-base Tune | ~80-1100ms | ~184 million | ~93-95% | Excellent |
| LlamaGuard (7B LLM) | >500ms-1000ms+ | ~7 billion | High (but context-dependent) | Poor |
| Large Commercial LLM API (e.g., GPT-4) | >1000ms-3000ms+ | Very large (billions+) | High (but slow & costly) | Not applicable |

* Latency estimates are illustrative and highly dependent on hardware (CPU type, RAM), batch size, and the specific task. Javelin Pulse latency is based on internal testing on standard CPU instances; other latencies are representative estimates.

** Accuracy can vary significantly based on the specific dataset and attack types used for evaluation. Javelin Pulse F1 is reported on the Injection-Guard dataset.

Deploy anywhere

The efficiency of the Javelin Model family means you can deploy cutting-edge AI security wherever you need it:

AI Gateways & Agents

Embed Javelin as a Guardrail API call to enforce policies inline—at the gateway or inside the agent loop.

On-Premise

Maintain data sovereignty, privacy, and full operational control.

Edge

Run real-time, on-device security with ultra-low latency and a small footprint.

Developer Environments

Get instant feedback and protection in CI/CD and local development.
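Inline enforcement at the gateway or inside the agent loop follows one pattern: every tool call passes through a guardrail check before it executes. The sketch below illustrates that wiring; `guardrail_check` is a hypothetical stand-in for a call to a deployed guardrail endpoint, not a real SDK function.

```python
# Sketch of inline guardrail enforcement in an agent loop. guardrail_check
# is a hypothetical policy check standing in for a guardrail API call.
def guardrail_check(payload: str) -> bool:
    """Stand-in policy check; returns True when the payload is allowed."""
    return "DROP TABLE" not in payload

def run_tool(name: str, payload: str) -> str:
    """Screen every tool call before execution; block on policy violation."""
    if not guardrail_check(payload):
        return f"[blocked] {name}: payload rejected by guardrail"
    return f"[ok] {name} executed"

# Agent loop: each action is screened inline before it runs.
actions = [("sql_query", "SELECT * FROM users"), ("sql_query", "DROP TABLE users")]
results = [run_tool(name, payload) for name, payload in actions]
```

Because the check sits in the request path, a low-latency model is what makes this pattern viable: a slow guardrail would stall every decision the agent makes.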

Ready to secure your AI with unparalleled speed and efficiency?

Get a demo · Read the paper (arXiv:2506.07330)