High Flame Technology Series

Launching Palisade: Zero-Trust Security for the AI Model Supply Chain

Yash Datta
AI Engineering
December 18, 2025

The AI ecosystem has a security blind spot.

We lock down software delivery with SAST, dependency scanning, signed artifacts, and hardened CI/CD. Then we download multi-GB model files from the internet, deploy them to production, and let them call internal tools—often with far less scrutiny than a container image.

The “models are just data” assumption no longer holds.

Today, we are releasing Palisade, an enterprise-grade ML model security scanner that applies zero trust to model artifacts. Palisade detects malicious payloads, backdoors, and supply-chain tampering before a model reaches an inference server. The core is written in Rust, so it can handle modern model sizes without consuming excessive memory or increasing CI latency.

Scanning with Palisade: Validators, Not Vibes

The Palisade model scan isn’t a single “malware check.” It’s a pipeline of validators, each answering a specific question about the artifact. The goal is to convert “random blob from the internet” into a structured security decision you can gate on in CI/CD.

At a high level, Palisade runs validators in four layers:

  1. Artifact and format integrity: Is this file what it claims to be?
  2. Static security checks: Does it contain known-dangerous patterns?
  3. Dependency and packaging validation: Is everything shipped alongside the model supposed to be there?
  4. Behavioral validation: Does it behave suspiciously under controlled probing?

Each layer is independently valuable; together they provide defense-in-depth for model ingestion.

Layer 1: Format + Structural Validators

A surprising amount of model risk starts with “this isn’t actually the format you think it is” or “this file is crafted to break tooling.” Structural validation is the fastest way to reject garbage early. These validators treat the model file like a “signed binary format,” not just bytes. What they validate (a small sketch of these checks follows the list):

  • Magic bytes / headers: quick detection of spoofed extensions (e.g., a pickle renamed to .safetensors)
  • Schema + structure: tensors, offsets, metadata blocks are well-formed and consistent
  • Bounds + corruption checks: prevents “crash the scanner / crash the loader” tricks (invalid offsets, truncated tensor blocks, malformed metadata)
  • Deterministic hashing: stable fingerprints for gating (“this exact artifact is approved”)
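
To make this concrete, here is a minimal illustrative sketch of the kind of check this layer performs. It is not Palisade’s implementation; the function name, error handling, and use of the sha2 crate are assumptions for the example. It reads the safetensors convention of an 8-byte little-endian header length followed by a JSON header, bounds-checks the claimed length, and produces a deterministic SHA-256 fingerprint suitable for approval gating:

// Minimal sketch (not Palisade's internals): a structural pre-check for a
// .safetensors file. The format begins with an 8-byte little-endian header
// length followed by a JSON header, so a pickle renamed to .safetensors
// typically fails these cheap checks immediately. Assumes the sha2 crate.
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{BufReader, Read};

fn precheck_safetensors(path: &str) -> Result<String, String> {
    let file = File::open(path).map_err(|e| e.to_string())?;
    let file_len = file.metadata().map_err(|e| e.to_string())?.len();
    let mut reader = BufReader::new(file);

    // 1. Header length: first 8 bytes, little-endian u64.
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes).map_err(|_| "file too short".to_string())?;
    let header_len = u64::from_le_bytes(len_bytes);

    // 2. Bounds check: a crafted header length must not point past end-of-file.
    if header_len == 0 || header_len.saturating_add(8) > file_len {
        return Err(format!("invalid header length {header_len} for file of {file_len} bytes"));
    }

    // 3. The header must be JSON; a pickle stream starts with opcode 0x80, not '{'.
    let mut first = [0u8; 1];
    reader.read_exact(&mut first).map_err(|e| e.to_string())?;
    if first[0] != b'{' {
        return Err("header is not JSON; not a safetensors artifact".to_string());
    }

    // 4. Deterministic fingerprint over the full artifact for approval gating.
    let mut hasher = Sha256::new();
    hasher.update(&len_bytes);
    hasher.update(&first);
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf).map_err(|e| e.to_string())?;
        if n == 0 { break; }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize().iter().map(|b| format!("{b:02x}")).collect())
}

Anything that fails a check like this is rejected before the heavier validators ever run.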

Layer 2: Static Security Validators

Static validators catch the stuff you’d never allow in a container image: executable deserialization, hidden attachments, and tampering indicators. It’s the “SAST for model artifacts” layer. These validators look for high-signal security issues without executing anything.
Examples:

  • Pickle/RCE detection
    • Flags artifacts containing pickle payloads or unsafe deserialization paths
    • Detects telltale opcodes/reducer patterns commonly used to trigger execution (see the sketch after this list)
  • Tokenizer integrity checks
    • Tokenizer files are an overlooked attack surface: token remaps can quietly steer outputs, bypass safety filters, or trigger “hidden” behaviors.
    • Palisade validates tokenizer consistency and identifies suspicious diff/tampering patterns.
  • Config + runtime posture checks
    • Flags suspicious config manipulation that changes model behavior at load time (e.g., unexpected remote refs, injected adapters, weird architectural mismatches)
  • Embedded payload checks
    • Detects “non-model” blobs bundled in/alongside artifacts (unexpected scripts, binaries, strange archive structures)
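
To illustrate what the pickle/RCE validator hunts for, here is a simplified sketch; it is not Palisade’s detector, and the specific module/callable pairs are only examples. Malicious pickles commonly use the GLOBAL opcode to import a callable such as os.system, which a later REDUCE opcode invokes during deserialization, so a byte-level scan for those import pairs is a cheap, high-signal check:

// Simplified sketch, not Palisade's detector: flag pickle byte streams that
// reference callables commonly abused for code execution. In pickle protocols
// 0-2 the GLOBAL opcode encodes the import as "module\nname\n" in the stream.
fn suspicious_pickle_imports(data: &[u8]) -> Vec<String> {
    // "module\nname" pairs frequently seen in malicious pickle payloads.
    const DANGEROUS: &[&[u8]] = &[
        b"os\nsystem",
        b"posix\nsystem",
        b"subprocess\nPopen",
        b"builtins\neval",
        b"builtins\nexec",
    ];
    let mut hits = Vec::new();
    for needle in DANGEROUS {
        if data.windows(needle.len()).any(|w| w == *needle) {
            hits.push(String::from_utf8_lossy(needle).replace('\n', "."));
        }
    }
    hits
}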

Layer 3: Dependency & Packaging Validators (Artifact Boundary)

Many real-world compromises live around the model: sidecar files, adapters, loaders, and packaging conventions. This layer validates the full model package, not just the weights file.
Examples:

  • Sidecar file allowlists/denylists: Enforces what’s permitted alongside the model (configs, tokenizers, adapters, license files); see the sketch after this list
  • Adapter/LoRA provenance + compatibility checks: Ensures adapters match expected base model + hashes and aren’t silently swapped
  • Reference hygiene: Flags surprising external references (remote paths, dynamic loading patterns) that expand the trust boundary
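
The sidecar allowlist idea can be sketched in a few lines. This is an illustration, not Palisade’s policy engine, and the allowed extensions are placeholders:

// Simplified sketch: flag files in a model package that are not on a sidecar
// allowlist. The extension list here is illustrative; a real policy would be
// configurable and would also pin expected hashes per file.
use std::path::Path;

fn unexpected_sidecars(package_dir: &Path) -> std::io::Result<Vec<String>> {
    const ALLOWED_EXT: &[&str] = &["safetensors", "gguf", "json", "txt", "md", "model"];
    let mut offenders = Vec::new();
    for entry in std::fs::read_dir(package_dir)? {
        let path = entry?.path();
        if !path.is_file() {
            continue;
        }
        let ext = path.extension().and_then(|e| e.to_str()).unwrap_or("");
        if !ALLOWED_EXT.contains(&ext) {
            // e.g. a stray .py loader script or a .so dropped next to the weights
            offenders.push(path.display().to_string());
        }
    }
    Ok(offenders)
}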

Layer 4: Behavioral Validators (Inference-Aware)

Some backdoors won’t show up in bytes. They live in the weights. A file can be perfectly “valid” and still be hostile. Behavioral validators are how you catch models that were trained to look clean until the correct input appears. These validators run controlled probes to detect signs of covert fine-tuning or trigger-based behavior.
Examples:

  • Perplexity Gap Analysis
    • Measures whether the model behaves “normally” across neutral prompts but shows sharp, localized shifts under specific prompt structures
    • Useful for surfacing trigger-based conditioning without knowing the trigger phrase ahead of time (see the sketch after this list)
  • Functional Trap Testing
    • Uses prompts designed to tempt bad behavior (tool misuse, instruction pivoting, exfil-like patterns)
    • Compares responses against expected safe baselines and flags anomalies (e.g., tool call attempts when tools shouldn’t be invoked).
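
The scoring side of Perplexity Gap Analysis can be sketched as a simple outlier test. This is an illustration of the idea, not Palisade’s implementation; the perplexity values are assumed to come from running the model under test on neutral prompts and on structured probes:

// Illustrative sketch of the perplexity-gap idea: flag probe prompts whose
// perplexity deviates sharply from the baseline distribution, a possible
// sign of trigger-conditioned behavior.
fn perplexity_gap_outliers(
    baseline: &[f64],         // perplexities over neutral prompts
    probes: &[(String, f64)], // (probe prompt, perplexity) pairs
    z_threshold: f64,         // e.g. 3.0
) -> Vec<String> {
    assert!(!baseline.is_empty(), "need baseline perplexities");
    let n = baseline.len() as f64;
    let mean = baseline.iter().sum::<f64>() / n;
    let var = baseline.iter().map(|p| (p - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt().max(1e-9);

    probes
        .iter()
        .filter(|(_, ppl)| ((ppl - mean) / std).abs() > z_threshold)
        .map(|(prompt, _)| prompt.clone())
        .collect()
}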

Trust Starts Before Scanning: Model Signing

Scanning alone is not enough. You also need to know where a model came from and how it was produced. This is where model signing and provenance matter.

  • Sigstore enables cryptographic signing of model artifacts, creating an auditable, tamper-evident record.
  • SLSA (Supply-chain Levels for Software Artifacts) provides build provenance—who built the model, from what inputs, and under what process.

Without this context, even a “clean” scan result has limited value.

Alignment with CoSAI

Palisade is designed to align with guidance from the Coalition for Secure AI (CoSAI), an OASIS open project defining secure-by-design practices for AI systems. In practice, this means Palisade:

  • Generates ML-BOMs (Machine Learning Bills of Materials).
  • Validates model integrity and provenance.
  • Maps findings to standardized threat levels.

We are looking past simple alerts to give you actual control over which models are allowed in production.

Palisade: Built for Real-World Model Sizes

Palisade is a purpose-built system, designed from the ground up for the realities of modern GenAI/LLM models.

1. Lightning-Fast Rust Core for Scale and Predictability

Scanning a 70B-parameter model places significant stress on memory and I/O. Many Python-based tools fail with OOM errors or become impractically slow. Palisade uses a native Rust core with streaming validation and memory-mapped I/O:

  • Scans models larger than available RAM.
  • Processes files at 100+ MB/s.
  • Completes 7B-scale scans in seconds.

The performance characteristics are predictable, which matters in CI pipelines and production gating.
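
For intuition on why streaming plus memory-mapped I/O keeps memory flat, here is a small sketch assuming the memmap2 crate; it is not Palisade’s actual code:

// Sketch of memory-mapped scanning: the OS pages data in and out on demand,
// so walking a file larger than RAM keeps resident memory roughly constant.
// Assumes the memmap2 crate.
use memmap2::Mmap;
use std::fs::File;

fn scan_mapped(path: &str) -> std::io::Result<u64> {
    let file = File::open(path)?;
    // SAFETY: the mapping is read-only and the file is not modified while mapped.
    let mmap = unsafe { Mmap::map(&file)? };

    // Stand-in for real validators: walk the mapping in 16 MiB chunks without
    // ever copying the whole file into memory.
    let mut checksum: u64 = 0;
    for chunk in mmap.chunks(16 * 1024 * 1024) {
        checksum = checksum.wrapping_add(chunk.iter().map(|&b| b as u64).sum::<u64>());
    }
    Ok(checksum)
}

In a scanner built this way, format and static validators would walk the mapping in the same chunked fashion, which is what keeps scan times and memory use predictable.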

2. Multi-Layered Analysis

Palisade applies multiple layers of validation rather than relying on a single heuristic:

  • Static Checks: Detects pickle-based RCE, tokenizer tampering, and configuration manipulation.
  • Format Validation: Verifies SafeTensors and GGUF structure and integrity.
  • Inference-Based Detection: Uses techniques such as Perplexity Gap Analysis and Functional Trap Testing to identify covert fine-tuning. This is effective against backdoors that only surface under specific prompts or tool-use scenarios.

This combination allows Palisade to detect issues that do not show up in file metadata alone.

3. Policy-Driven Enforcement

Security requirements vary by environment. Palisade treats policy as code, using Cedar files to define enforceable rules. This allows you to write expressive, audit-friendly policies that dictate exactly what is allowed—from blocking specific license types to mandating cryptographic signatures for production models.

# Apply stricter rules for production
palisade scan model.gguf --policy strict_production

Results can be emitted as plain text, JSON, or SARIF 2.1.0, making them directly consumable by GitHub Code Scanning, VS Code, and centralized security platforms.

How Palisade Differs from First-Gen Model Scanners

| Capability | Generic Scanners | Palisade |
| --- | --- | --- |
| Model Awareness | Treats files as opaque blobs | Understands tensors, weights, and architectures |
| Backdoor Detection | None | Detects BadAgent, DoubleAgents, and fine-tuning attacks |
| Performance | Slow or OOM on large files; scans take hours | Rust-based streaming for 70B+ models; scans in minutes |
| Supply Chain | Simple hash checks | Sigstore + SLSA + ML-BOMs |
| Ecosystem | Proprietary | CoSAI-aligned, SARIF native |

Getting Started

Palisade is designed to integrate cleanly into existing ML and security workflows—from local experimentation to CI/CD enforcement and production gating. You can start with a single command and gradually layer in stricter controls as your environment matures.

Scan a model

Run a security scan against a model artifact before loading it into memory or deploying it to an inference service:

palisade scan /path/to/model.safetensors

During a scan, Palisade analyzes the model at multiple layers, including:

  • Artifact safety checks (unsafe serialization, malformed structures)
  • Format validation (SafeTensors, GGUF, and related formats)
  • Tampering indicators and configuration manipulation
  • Behavioral risk signals that may indicate covert fine-tuning or backdoors

Scan results include a clear summary of findings and severity, allowing you to quickly determine whether a model is safe to proceed.

# Machine-readable output for CI/CD pipelines
palisade scan /path/to/model.safetensors --output json

# SARIF output for GitHub Code Scanning, VS Code, or SIEMs
palisade scan /path/to/model.safetensors --output sarif --out results.sarif

# Apply stricter rules for production environments
palisade scan /path/to/model.safetensors --policy strict_production

Verify Provenance

Before trusting a model, it’s critical to know who produced it and whether it has been modified. Palisade verifies cryptographic signatures and provenance metadata using Sigstore.

palisade verify-sigstore /path/to/model --public-key publisher.pub

This verification ensures that:

  • The model artifact has not been altered since signing
  • The signature matches the expected publisher identity
  • The artifact being scanned is exactly the one that was signed

Provenance verification allows you to enforce policies such as:

  • Only allowing models signed by approved publishers
  • Blocking unsigned or unknown artifacts in production
  • Auditing model origins for compliance and governance

Together, scanning and provenance verification help establish a verifiable chain of trust from model creation through deployment.

Bottom Line

The AI model supply chain is now part of your attack surface. Treating model artifacts as trusted inputs is no longer a safe default. Palisade helps you enforce trust before execution. It combines artifact-aware scanning, integrity checks, and provenance verification to establish a verifiable chain of trust—from training output and packaging, through distribution and storage, all the way to deployment gates and inference runtime. In practice, this means you can move from “we downloaded a model and hoped for the best” to auditable, policy-driven control over which models are allowed to run, with the same rigor we already expect from modern software delivery.

Try Palisade today, or talk to us for more information.

Book A Demo
