AgentTrap

Created by team Entropy Loop on May 18, 2026
Agent Security & AI Governance - Veea

Enterprise teams deploying AI agents have no systematic way to test whether those agents are vulnerable to the attack vectors now documented in peer-reviewed research. AgentTrap addresses this with a two-component autonomous security system grounded in the AI Agent Traps taxonomy from Franklin et al. (2025, Google DeepMind).

The Target Agent is a LangChain ReAct agent with real tools (web search, file read, calculator), representing a typical enterprise deployment. It runs behind Lobster Trap, an open-source deep packet inspection (DPI) proxy that inspects every prompt and response against a custom YAML policy. The Red-Team Agent autonomously fires adversarial probes across all six attack categories in the taxonomy: Content Injection, Semantic Manipulation, Cognitive State, Behavioural Control, Systemic, and Human-in-the-Loop.

Every probe is evaluated by two defense layers: Lobster Trap handles regex-based policy enforcement, and a fine-tuned MiniLM classifier (F1 score 0.970) provides ML-based detection. Every interaction is hashed with SHA-256 and written to an audit log that records both the declared intent and the detected intent and flags any mismatch between them.

Key finding: the ML classifier flagged all 24 adversarial probes across all six categories, while the policy layer blocked only 3. The 21 probes that the policy layer allowed expose the coverage gap that regex-only defenses cannot close, and the audit report tells a security team exactly where to harden next.
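The two-layer evaluation and audit logging described above can be illustrated with a minimal Python sketch. It is not the project's actual code. The regex patterns, the `AuditRecord` fields, the `audit` and `coverage_gap` helpers, and the 0.5 threshold are all hypothetical, and the classifier score is passed in as a plain number rather than computed by the MiniLM model.

```python
import hashlib
import json
import re
from dataclasses import dataclass

# Hypothetical regex rules standing in for the YAML policy.
# These are illustrative only, not Lobster Trap's actual rule set.
POLICY_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]


@dataclass
class AuditRecord:
    sha256: str            # hash of the prompt/response pair
    declared_intent: str   # what the caller said the request was
    detected_intent: str   # what the defense layers concluded
    intent_mismatch: bool  # True when declared and detected differ
    policy_blocked: bool   # regex policy layer verdict
    ml_flagged: bool       # classifier verdict (score supplied by caller)


def policy_blocks(prompt: str) -> bool:
    """Return True if any regex rule matches the prompt."""
    return any(p.search(prompt) for p in POLICY_PATTERNS)


def audit(prompt: str, response: str, declared_intent: str,
          ml_score: float, threshold: float = 0.5) -> AuditRecord:
    """Evaluate one interaction with both layers and build an audit record."""
    blocked = policy_blocks(prompt)
    flagged = ml_score >= threshold  # assumed decision rule
    detected = "adversarial" if (blocked or flagged) else "benign"
    payload = json.dumps({"prompt": prompt, "response": response},
                         sort_keys=True).encode("utf-8")
    return AuditRecord(
        sha256=hashlib.sha256(payload).hexdigest(),
        declared_intent=declared_intent,
        detected_intent=detected,
        intent_mismatch=(declared_intent != detected),
        policy_blocked=blocked,
        ml_flagged=flagged,
    )


def coverage_gap(records: list[AuditRecord]) -> int:
    """Count probes the classifier flagged but the regex policy allowed."""
    return sum(1 for r in records if r.ml_flagged and not r.policy_blocked)
```

Applied to the reported results, a count of this kind would correspond to the 21 probes that were flagged by the classifier but not blocked by the policy layer.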

Category tags: