
AgentSurface is a tool for testing the security of real AI agents that are exposed through HTTP JSON APIs. Instead of relying on mock agents or generic jailbreak examples, AgentSurface connects to an actual agent endpoint, injects adversarial prompts into a configurable JSON request field, sends real HTTP requests, and records the full evidence trail: masked request, raw response, extracted answer, finding type, risk score, and recommendations.

The project focuses on practical risks in AI-powered products: prompt injection, system prompt or secret disclosure, private data exposure, unsafe tool/action compliance, BOLA/IDOR-style cross-user access, and authorization gaps in support, finance, trading, CRM, and marketplace agents.

AgentSurface includes a Streamlit UI with three main workspaces: Attack Sets for creating reusable adversarial prompt sets, Run for configuring the real API target and launching scans, and History for reviewing previous runs, findings, raw evidence, JSON exports, and policy drafts. It can also generate a Lobster Trap YAML policy draft, helping teams turn some detected risks into proxy-layer mitigations when applicable.

The main idea behind AgentSurface is to treat an AI agent as an attack surface, not just as a chatbot. It helps teams test whether their agent follows security and business rules under adversarial input, while keeping concrete evidence that developers can use to debug and fix the issue.
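The injection and masking steps described above can be sketched roughly as follows. This is a minimal illustration, not AgentSurface's actual code: the function names (`inject_prompt`, `mask_headers`, `build_evidence`), the dotted field-path convention, and the list of sensitive headers are all assumptions, and the real HTTP call is left out.

```python
import copy

# Assumed set of header names whose values should never appear in stored evidence.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def inject_prompt(template: dict, field_path: str, prompt: str) -> dict:
    """Return a copy of `template` with `prompt` written at a dotted path.

    Example path (hypothetical): "input.message" targets template["input"]["message"].
    Missing intermediate objects are created so the path always resolves.
    """
    body = copy.deepcopy(template)  # never mutate the caller's template
    keys = field_path.split(".")
    node = body
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = prompt
    return body


def mask_headers(headers: dict) -> dict:
    """Replace the values of sensitive headers with a fixed placeholder."""
    return {
        name: ("***" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


def build_evidence(url: str, headers: dict, body: dict) -> dict:
    """Assemble the masked-request part of an evidence record."""
    return {"url": url, "headers": mask_headers(headers), "body": body}
```

In use, a scan would call `inject_prompt` once per adversarial prompt, send the resulting body with an HTTP client of choice, and store `build_evidence(...)` next to the raw response so that credentials never end up in the run history.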
19 May 2026

ChaosMonkey MCP is a local-first MCP server for autonomous fault injection and crash discovery. It gives AI agents such as IBM Bob, Claude, Cursor, and Codex provider-agnostic chaos engineering tools that can stress-test local projects, inject failures, execute commands, and identify resilience issues automatically.

The project focuses on autonomous crash discovery rather than remediation. ChaosMonkey MCP does not modify repositories or generate fixes. Instead, it helps AI agents uncover hidden failure modes such as null-handling bugs, malformed payload assumptions, weak validation, and fragile error handling.

The platform currently includes:

* fault injection through nested null mutation,
* repeated chaos execution with deterministic seeds,
* structured crash and traceback reporting,
* safe local command execution,
* secure local file reading,
* a production-ready local FastMCP server.

ChaosMonkey MCP runs entirely locally over stdio using the Model Context Protocol, without requiring any cloud backend or hosted infrastructure.

The project was developed using IBM Bob IDE during the hackathon, including the MCP architecture, tool design, fault injection workflows, and crash discovery pipeline. It demonstrates how IBM Bob can be used to design AI-native developer tooling and autonomous resilience testing workflows.
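Nested null mutation with deterministic seeds and structured crash reporting could look roughly like the sketch below. It is an illustration under stated assumptions, not ChaosMonkey MCP's implementation: the names `leaf_paths`, `null_mutate`, and `run_chaos` are hypothetical, and the strategy shown (set one randomly chosen leaf to `None` per run) is only one plausible reading of "nested null mutation".

```python
import copy
import random
import traceback


def leaf_paths(value, prefix=()):
    """Collect the key/index path to every leaf in nested dicts and lists."""
    if isinstance(value, dict) and value:
        paths = []
        for key, child in value.items():
            paths.extend(leaf_paths(child, prefix + (key,)))
        return paths
    if isinstance(value, list) and value:
        paths = []
        for index, child in enumerate(value):
            paths.extend(leaf_paths(child, prefix + (index,)))
        return paths
    return [prefix]  # scalars and empty containers count as leaves


def null_mutate(payload, seed):
    """Return (mutated_copy, path) with one seeded-random leaf set to None."""
    rng = random.Random(seed)  # same seed, same choice: runs are reproducible
    paths = [p for p in leaf_paths(payload) if p]  # skip the root itself
    mutated = copy.deepcopy(payload)
    if not paths:
        return mutated, ()
    path = rng.choice(paths)
    node = mutated
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = None
    return mutated, path


def run_chaos(target, payload, seeds):
    """Call `target` on one mutated payload per seed and record every crash."""
    crashes = []
    for seed in seeds:
        mutated, path = null_mutate(payload, seed)
        try:
            target(mutated)
        except Exception as exc:
            crashes.append({
                "seed": seed,
                "path": path,
                "error": type(exc).__name__,
                "traceback": traceback.format_exc(),
            })
    return crashes
```

Because each crash record carries its seed and the mutated path, a failure found during a long chaos run can be replayed by calling `null_mutate` again with the same seed.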
17 May 2026