LeakGuard: Catch Leaks Before Google Does

Created by team Leakguard on May 29, 2026
Security & Compliance

Secrets leak to paste sites every minute (API keys, database URLs, customer PII), and most of them sit there until Google indexes them and the abuse begins. LeakGuard is an autonomous agent that closes that window.

How it works. A LangGraph pipeline runs six nodes end-to-end: Discovery uses Bright Data's SERP API (with the brd_json=1 parsing flag) to issue Google dorks against a hot-reloading watchlist; Extraction pulls raw paste content through Bright Data's Web Unlocker, bypassing the bot walls that block naive scrapers; a local regex Triage drops the obvious noise cheaply; an Analyst (Claude Sonnet, recall-tuned) flags anything that could be a leak; a Judge (Claude Sonnet, temperature 0, three-axis rubric) escalates only at a score ≥ 8; and Alert ships a redacted Slack notification with the audit reasoning attached.

Why Bright Data is load-bearing. Paste-site discovery without SERP access is guessing; Web Unlocker is what makes Extraction actually work past Cloudflare and rate limits. Both zones are real and validated end-to-end.

Safety. Credentials are redacted in two layers before any alert leaves the box. LangSmith tracing is off by default, because it would otherwise ship the exact secrets the agent exists to catch to a third-party log store. A pre-commit detect-secrets hook guards the repo itself.

What's built. Real pipeline (not stubs), per-node tests plus a smoke test, a Day-3 eval set of seeded pastes for regex tuning, a Streamlit dashboard reading the JSONL audit log, ADRs for the load-bearing decisions, and a synthetic mock server so demos don't burn the $250 SERP credit cap.
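To make the triage, escalation, and redaction steps more concrete, here is a minimal standard-library sketch. It is not the project's actual code. The regex patterns, the function names (`triage`, `redact`, `assert_clean`, `build_alert`), and the reading of "two layers" as a masking pass followed by a final guard check are all assumptions made for illustration. Only the score ≥ 8 threshold comes from the description above.

```python
import re
from typing import Optional

# Hypothetical patterns: the project's real regex set is not shown in the source.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "database_url": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb)://[^\s:@/]+:[^\s@]+@\S+"
    ),
}

# From the description: the Judge escalates only at a score of 8 or higher.
ESCALATION_THRESHOLD = 8


def triage(text: str) -> list:
    """Cheap local filter: names of matching patterns; an empty list means noise."""
    return [name for name, pat in SECRET_PATTERNS.items() if pat.search(text)]


def redact(text: str) -> str:
    """Assumed layer 1: replace every pattern match with a labelled placeholder."""
    for name, pat in SECRET_PATTERNS.items():
        text = pat.sub(lambda _m, n=name: f"[REDACTED:{n}]", text)
    return text


def assert_clean(text: str) -> None:
    """Assumed layer 2: refuse to send anything that still matches a pattern."""
    leftover = triage(text)
    if leftover:
        raise ValueError(f"unredacted secret types remain: {leftover}")


def build_alert(paste_url: str, text: str, judge_score: int) -> Optional[dict]:
    """Return a redacted alert payload, or None if the score is below threshold."""
    if judge_score < ESCALATION_THRESHOLD:
        return None
    excerpt = redact(text)
    assert_clean(excerpt)
    return {
        "url": paste_url,
        "score": judge_score,
        "kinds": triage(text),
        "excerpt": excerpt,
    }


if __name__ == "__main__":
    # AWS's documented example key and a made-up database URL, not real secrets.
    sample = (
        "key=AKIAIOSFODNN7EXAMPLE "
        "db=postgres://admin:hunter2@db.example.com:5432/prod"
    )
    print(build_alert("https://paste.example/abc", sample, 9))
```

Keeping the guard check separate from the masking pass means a pattern added to triage but mishandled in redaction fails loudly instead of leaking into an alert.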
