
A new way to jailbreak AI appears on Reddit, X, or arXiv almost every day. By the time a quarterly red-team catches it, it has already worked on a production chatbot. ROGUE closes that gap , the red-team that never sleeps. ROGUE is an autonomous red-team agent. It continuously harvests new LLM attacks from 19 live open-web sources — Reddit/X jailbreak communities, arXiv, GitHub (the Pliny umbrella), HuggingFace, MITRE ATLAS, OWASP, and vendor safety blogs — then reproduces each against YOUR deployment: your system prompt, your declared tools, your target model, scored together. Not a bare model. Not a frozen test bank. Your actual setup, against today's attacks. It's the only project here using Bright Data MCP on BOTH sides. As a consumer, the discovery agent reasons over Bright Data's MCP tools (Web Scraper, SERP, Web Unlocker, Scraping Browser) to reach sources that block bots. As a producer, ROGUE exposes its own MCP server. Try it now ,the dashboard has one-click "Add to Cursor / VS Code" buttons, and the hosted endpoint (rogue-api-mr5w.onrender.com/mcp) needs zero setup. Connect it and ask, from your own IDE, "what new attacks broke our support bot in the last 24 hours?" — live, during judging. The numbers are real, not a demo fixture. One live sweep: 8,321 breach trials across 6 deployment configs, a 16.5× vulnerability spread between weakest and strongest model. A separate judge scores every trial (REFUSED / EVADED / PARTIAL / FULL) and is calibrated against blind human labels, 98% breach-axis agreement, validated on WildGuardTest and StrongREJECT, not "trust the AI." Bright Data spend: $0.15 per detected breach. Publication-to-breach: ~2 minutes. It also red-teams multimodally, rendering text attacks as images and audio, because a jailbreak refused as text often succeeds as a picture of that text. Built solo in 6 days. Prior: GPTFuzz Grand Prize (Yonsei, 2024) and adversarial-ML research at AIM Intelligence.
31 May 2026