1
1
1 year of experience
.png&w=828&q=75)
Deriv Sentinel is an AI-powered Web Application Firewall that protects LLM agents from prompt injection and data leakage through a continuous red-team-and-heal cycle. The Problem: Traditional WAFs can't protect AI agents. Prompt injection is the SQL injection of the AI era - natural language attacks bypass conventional input validation, and patching one technique just leads attackers to find new ones. Our Solution: Instead of waiting for attacks, Deriv Sentinel attacks itself first, then autonomously patches the vulnerabilities it discovers. How It Works: 1. Attack — An attacker model generates realistic social engineering prompts enriched with Shadow RAG context (fake internal documents as honeypots). 2. Defend — Bastion (llama3.1:8b), our protected LLM loaded with simulated internal data, responds to each attack. 3. Audit — ShieldGemma (shieldgemma:2b) audits every response for data leakage and policy violations, backed by deterministic pattern matching as a second detection layer. 4. Heal — When a breach is detected, the Heal Engine injects a vaccine guardrail and redacts the exploited knowledge section. The same attack now gets blocked — without retraining. 5. Human-in-the-Loop — Analysts can approve/reject heals or enable auto-heal for autonomous defense. Key Innovations: - Knowledge Base Redaction — We remove leaked data from context entirely. LLMs can't leak what they don't have. - Multi-Layered Defense — AI auditor + deterministic matching + post-processing enforcement. - Instant, Reversible Fixes — Runtime prompt patches. No fine-tuning, no redeployment. - Adaptive — Each breach teaches the system a new defense. Demo: Reset → Run red-team → Bastion leaks secrets → ShieldGemma detects → Heal applied → Same attack blocked. Self-healing proven in five minutes.
7 Feb 2026