
When a company is breached, the clock starts on several regulators at once: NIS2 in 24 hours, DORA in 72, the SEC in four business days, the UK ICO in 72. Miss one and the fines, and the personal liability, land on named officers. The obvious move is to let LLM agents draft the filings. The catch: ask a language model to compute a statutory deadline and it is wrong 25 to 74 percent of the time, and it obeys prompt injections. You cannot let a model decide what gets filed.

Deadline Room splits the job. LLM drafter agents, one per regulator, research and write filings and collaborate live in a Band room. A deterministic Warden, pure Python with no model in the loop, decides and proves. It enforces every handoff through Band and guarantees three things on camera:

- Exactly-once under a live kill. Kill an agent mid-filing and it reconnects and re-posts; the Warden's idempotency ledger drops the duplicate. Zero double-files across 5,000 kill schedules.
- A contradiction veto. When two filings disagree on a load-bearing fact, such as when the incident started, the Warden blocks all signoff until they reconcile, then requires two distinct human keys.
- Byte-identical replay. Every run seals to a hash-chained, Ed25519-signed log. Replay it offline and you get the same bytes, the same hash. Flip one byte and the signature breaks. A regulator verifies it on their own laptop.

Thirteen regulatory regimes ship as config, with business-day clocks that know federal holidays and real EDGAR 8-K filings with inline XBRL. It runs live on Band, with three multi-agent incidents proven, each replaying byte-identical, and is production-shaped: Kubernetes, Terraform, a two-key approval gate, multi-model failover, and a sovereign air-gapped mode.

The hard part, exactly-once on Band, is extracted as an open library any builder can install. Over 1,435 tests, all green. The agents draft. Python proves.
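The exactly-once and replay guarantees described above can be illustrated with a minimal sketch. This is not Deadline Room's code: the `Ledger` class, the `replay` function, and the key format are hypothetical names chosen for illustration, and the Ed25519 signature over the chain head is left out so the example needs only the standard library.

```python
import hashlib
import json


class Ledger:
    """Append-only ledger that accepts each idempotency key at most once."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []         # accepted records, as canonical JSON strings
        self._seen = set()        # idempotency keys already accepted
        self.head = self.GENESIS  # hash of the most recent entry

    def submit(self, key, payload):
        """Record a filing; return False if this key was already accepted."""
        if key in self._seen:
            return False  # duplicate re-post after a reconnect: drop it
        # Canonical serialization keeps the bytes identical on every run.
        body = json.dumps(
            {"key": key, "payload": payload, "prev": self.head},
            sort_keys=True,
            separators=(",", ":"),
        )
        self.head = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.entries.append(body)
        self._seen.add(key)
        return True


def replay(entries):
    """Recompute the chain head offline; raise if any link is broken."""
    head = Ledger.GENESIS
    for body in entries:
        if json.loads(body)["prev"] != head:
            raise ValueError("hash chain broken")
        head = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return head
```

In this sketch, a killed agent that reconnects and re-posts under the same key gets `False` back and nothing is appended, while `replay(ledger.entries)` reproduces `ledger.head` from the stored bytes alone. A real deployment would additionally sign the head, which is the step the text attributes to Ed25519.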
19 Jun 2026

Premium brands lose millions every year to gray-market diversion: a distributor buys cheap in one country and dumps the product in another, undercutting the brand's own market. Today, brand-protection teams hear it from a vendor dashboard, a claim they cannot independently check.

Amber turns that gap into evidence. It captures the same product from inside each country on Bright Data's residential network, matches the GTIN, and strips VAT to a net-of-tax floor. A within-country control runs three residential exits per country; when all three agree to the cent, the gap is a controlled experiment, not proxy noise. Every observation is sealed into a cryptographically signed, geo-attributed packet using Ed25519 and an RFC 6962 Merkle tree, and anyone can verify it offline with one command. Edit a single byte and verification fails, RED.

The architecture is honest by construction. Layer 1 is the deterministic signed spine: no AI ever writes a number into the evidence. Layer 2 is a separate, unsigned advisory that only reads the signed facts: a three-model jury via the AI/ML API, a Cognee temporal memory that shows whether a gap persists, and a TriggerWare workflow that turns a signed catch into an alert. A human draws any legal conclusion.

We also gave back: an open pull request to Bright Data's own brightdata-mcp turns a discarded blocked-country error into a first-class signed measurement, closing their issue #104.

We say what we do not claim. Requests are dispatched at the same instant, not witnessed, and the annual recoverable figure uses the brand's own volume assumption, labeled as one. Every number ships inside a signed packet in the public repo with 324 passing tests, so you can clone it and re-check the proof yourself.
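To make the tamper-evidence claim concrete, here is a minimal sketch of the Merkle tree hash defined in RFC 6962, using only the standard library. It is not Amber's implementation: the function names and the observation byte strings are hypothetical, and the Ed25519 signature over the root is omitted.

```python
import hashlib


def leaf_hash(data):
    """RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the leaf bytes."""
    return hashlib.sha256(b"\x00" + data).digest()


def merkle_root(leaves):
    """RFC 6962 Merkle Tree Hash over an ordered list of byte strings."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()  # hash of the empty string
    if n == 1:
        return leaf_hash(leaves[0])
    # Split at k, the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_root(leaves[:k])
    right = merkle_root(leaves[k:])
    # Interior nodes use a 0x01 prefix so they can never collide with leaves.
    return hashlib.sha256(b"\x01" + left + right).digest()
```

Because every observation feeds into the root, changing one byte of one observation changes the root, and a signature over the old root no longer verifies. That is the property behind "edit a single byte and verification fails".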
31 May 2026

Reef is the open-source signed supply chain for MCP servers, and the only AI firewall that outputs an underwriter-scorable evidence artifact.

THE PROBLEM (April 2026): OX Security disclosed an architectural command-injection flaw in Anthropic's Model Context Protocol. 7,000+ vulnerable servers. 150M+ downstream package downloads. Every official MCP SDK affected. Anthropic did not patch the SDKs. The MCP ecosystem has no centralized signature registry today.

WHAT REEF SHIPS:

1. Atlas: a Sigstore-style signed MCP registry. Unsigned binds are denied at handshake with violation code MCP-RCE-26.04, at single-digit-ms latency on the demo workload.
2. Lobster Trap fork: adds the 4 enforcement actions (MODIFY, REDIRECT, QUARANTINE, HUMAN_REVIEW) that Veea's upstream declared but never implemented. EchoLeak (CVE-2025-32711) blocked in 1.2 seconds.
3. DAST-A: a PPO reinforcement-learning adversary that runs continuously, plus a Gemini 3 Flash multimodal screenshot observer emitting structured-output policy drafts with sub-second latency.
4. Reef Quote: a Gemini 3 Pro underwriter agent grounded on Munich Re's public aiSure framework (5 risk categories x 5 due-diligence axes). Produces an Ed25519-signed 6-page Reef Insurance Artifact (RIA) PDF, Tier B+, premium range $42k-$54k for $5M coverage. ESTIMATED RANGE, not Munich-Re-published.

RECEIPTS: 4 attack packs, 217 exfil-attempt episodes. Vanilla agent: 0% blocked. Reef-protected: 100% blocked. Reproducible via pytest.

OPEN SOURCE: MIT-licensed. Built on Veea's Lobster Trap (pinned at e49a402). The 4 missing actions ship as an upstream-PR-shaped fork.

LIVE: https://reef-mcp-registry.vercel.app
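The deny-at-handshake behavior described for Atlas can be sketched as follows. This is an illustration only, not Reef's code: `Registry`, `Decision`, and `check_bind` are hypothetical names, and a pinned SHA-256 digest stands in for the Sigstore-style signature verification a real registry would perform. Only the violation code is taken from the text.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

VIOLATION_UNSIGNED = "MCP-RCE-26.04"  # violation code quoted in the text


@dataclass(frozen=True)
class Decision:
    allowed: bool
    violation: Optional[str] = None


class Registry:
    """Toy registry: a server may bind only if its manifest digest is pinned."""

    def __init__(self):
        self._digests = {}  # server name -> expected SHA-256 hex digest

    def register(self, name, manifest):
        """Pin the digest of a vetted manifest (stand-in for signing it)."""
        self._digests[name] = hashlib.sha256(manifest).hexdigest()

    def check_bind(self, name, manifest):
        """Decide at handshake time whether this server may bind."""
        expected = self._digests.get(name)
        actual = hashlib.sha256(manifest).hexdigest()
        if expected is None or expected != actual:
            # Unknown server or altered manifest: deny before any tool runs.
            return Decision(False, VIOLATION_UNSIGNED)
        return Decision(True)
```

The design point is that the check is fail-closed: a server that was never registered and a server whose manifest changed after registration are treated the same way, and both are refused before the session starts.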
19 May 2026

Verdict is a pre-merge PR review tool built entirely on IBM Bob. It refuses to let history repeat itself.

THE PROBLEM

AI now writes 25–30% of new code at Microsoft and Google, and the share is climbing every quarter. But today's AI PR reviewers (CodeRabbit, Greptile, Copilot Review, PR-Agent) all share the same blind spot: they review the diff, not the decision. None of them joins mutation testing with incident history. So when a PR adds a 5-minute cache around a function that caused a 187-minute auth outage three months ago, the diff looks fine, tests pass, CI is green, and production goes down a second time.

WHAT VERDICT DOES

Verdict runs a 6-layer pipeline. Each layer is a Bob custom mode emitting structured JSON. At layer 6, three MCP tools fire. One of them, find_cross_layer_match, joins surviving mutations against past incidents by file path and line distance, in plain Python. No LLM in the loop. When a match exists, Bob's custom rules force the verdict-line into the synthesis output, verbatim. The most important sentence in the entire output cannot be hallucinated.

The signature output: "Surviving mutation M-1 is the same code path that caused INC-2024-0431." One sentence. The entire risk. Verdict refuses to let you merge.

BOB IS THE ENGINE

This isn't built with Bob. Verdict IS Bob. Bob planned the pipeline in Plan mode before any code existed. Bob wrote every line of the MCP server, harness, and dashboard. At inference time, the six custom modes ARE Bob executing the pipeline. Remove Bob, no product.

WHAT'S SHIPPING

- Live dashboard: verdict-bob.vercel.app
- OSS release: github.com/Yashash4/verdict-bob
- 6 Bob custom modes, 5 skills, 2 slash commands, 3 MCP tools, custom rules, 7 exported sessions

INCIDENTS.md is one source. The MCP tool swaps with PagerDuty, Linear, or Jira. The pattern surfaces wherever incident tracking and mutation testing both exist.

CodeRabbit reviews the diff. Verdict reviews the decision.
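The join performed by find_cross_layer_match can be sketched in a few lines of plain Python. The tool name comes from the text, but everything else here is an assumption made for illustration: the `Mutation` and `Incident` record shapes, the function signature, the file path in the usage note, and the `max_distance` default of 10 lines are not taken from the Verdict repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    id: str    # e.g. "M-1"
    path: str  # file the surviving mutation lives in
    line: int  # line number of the mutated code


@dataclass(frozen=True)
class Incident:
    id: str    # e.g. "INC-2024-0431"
    path: str  # file implicated in the past incident
    line: int  # line number recorded for the incident


def find_cross_layer_match(mutations, incidents, max_distance=10):
    """Join mutations to incidents on equal path and nearby line numbers."""
    matches = []
    for m in mutations:
        for inc in incidents:
            distance = abs(m.line - inc.line)
            if m.path == inc.path and distance <= max_distance:
                matches.append((m.id, inc.id, distance))
    # Deterministic order: closest match first, then by ids.
    return sorted(matches, key=lambda t: (t[2], t[0], t[1]))


def verdict_line(mutation_id, incident_id):
    """Fill the fixed sentence template from ids only, with no model involved."""
    return (
        f"Surviving mutation {mutation_id} is the same code path "
        f"that caused {incident_id}."
    )
```

With a mutation `M-1` at line 42 of a hypothetical `auth/session.py` and incident `INC-2024-0431` recorded at line 45 of the same file, the join returns one match at distance 3, and `verdict_line` produces the signature sentence quoted above. Because both steps are deterministic, the sentence can only name ids that the join actually found.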
17 May 2026