
Every team shipping an AI product faces the same blind spot: traditional QA doesn't catch prompt injection, penetration testers don't understand AI failure modes, and AI safety researchers don't ship production code. No single tool covers the full surface area of an AI-powered web application.

AuditIQ solves this. Submit a URL. Six specialized agents coordinate through a LangGraph state machine to audit the target end-to-end:

1. Reconnaissance: Playwright + Gemini Flash vision identify AI features and API surface
2. Test Planner: Gemini Pro generates an adversarial test plan tailored to the specific app
3. AI Behavior Tester: executes prompt injection, cross-session PII canary probes, jailbreaks, refusal consistency checks, and hallucination triggers; Gemini Pro acts as judge
4. Security Agent: header analysis, exposed endpoint detection, burst rate-limit testing
5. UX & Performance: axe-core accessibility scanning + Playwright latency measurement
6. Synthesizer: 5-dimension weighted scoring + Gemini Pro executive summary

AuditIQ produces a release-readiness verdict across AI Safety, Security, Reliability, Performance, and UX Quality. Every finding includes evidence, reproduction steps, and a concrete fix.

Demo on PolyglotAI (intentionally flawed AI translation tool): 15 findings, all 12 planted vulnerabilities detected including a critical cross-session PII leak where a credit card number from one session leaked into the next. 3 additional unplanted issues found. Grade: D.

Real-world test on DeepL: correctly cleared on AI Safety, Reliability, and Performance. Flagged 5 real accessibility violations with actual CSS selectors. Zero false positives. Grade: B.

Built solo in 6 days. Deployed on Google Cloud Run. Powered by Gemini 2.5 Pro and Flash.
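
To make the Synthesizer's 5-dimension weighted scoring concrete, here is a minimal sketch of how per-dimension scores could roll up into a letter grade. The weights, the 0-100 score scale, the grade cutoffs, and the names `WEIGHTS`, `GRADE_CUTOFFS`, `weighted_score`, and `grade` are all assumptions for illustration; the write-up names the five dimensions but not how AuditIQ actually weights or grades them.

```python
# Hypothetical weights in percent (they sum to 100). The five dimension
# names come from the write-up; the numbers are illustrative only.
WEIGHTS = {
    "ai_safety": 30,
    "security": 25,
    "reliability": 20,
    "performance": 15,
    "ux_quality": 10,
}

# Hypothetical cutoffs, checked from highest to lowest.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each assumed 0-100) into one 0-100 total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS) / 100


def grade(total: float) -> str:
    """Map a 0-100 total to a letter grade using the assumed cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    # Made-up scores, not results from the PolyglotAI or DeepL audits.
    example = {
        "ai_safety": 40,
        "security": 60,
        "reliability": 70,
        "performance": 80,
        "ux_quality": 90,
    }
    total = weighted_score(example)
    print(total, grade(total))
```

Under these assumed weights, a weak AI Safety score pulls the overall grade down even when UX and performance are strong, which is the kind of behavior a release-readiness verdict would likely want.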
19 May 2026