Verdict is a pre-merge PR review tool built entirely on IBM Bob. It refuses to let history repeat itself.

THE PROBLEM

AI now writes 25–30% of new code at Microsoft and Google, and the share is climbing every quarter. But today's AI PR reviewers (CodeRabbit, Greptile, Copilot Review, PR-Agent) all share the same blind spot: they review the diff, not the decision. None of them joins mutation testing with incident history. So when a PR adds a 5-minute cache around a function that caused a 187-minute auth outage three months ago, the diff looks fine, tests pass, and CI is green. Then production goes down a second time.

WHAT VERDICT DOES

Verdict runs a 6-layer pipeline. Each layer is a Bob custom mode emitting structured JSON. At layer 6, three MCP tools fire. One of them, find_cross_layer_match, joins surviving mutations against past incidents by file path and line distance, in plain Python. No LLM in the loop. When a match exists, Bob's custom rules force the verdict line into the synthesis output, verbatim. The most important sentence in the entire output cannot be hallucinated.

The signature output:

"Surviving mutation M-1 is the same code path that caused INC-2024-0431."

One sentence. The entire risk. Verdict refuses to let you merge.

BOB IS THE ENGINE

This isn't merely built with Bob. Verdict IS Bob. Bob planned the pipeline in Plan mode before any code existed. Bob wrote every line of the MCP server, harness, and dashboard. At inference time, the six custom modes ARE Bob executing the pipeline. Remove Bob, no product.

WHAT'S SHIPPING

- Live dashboard: verdict-bob.vercel.app
- OSS release: github.com/Yashash4/verdict-bob
- 6 Bob custom modes, 5 skills, 2 slash commands, 3 MCP tools, custom rules, 7 exported sessions

INCIDENTS.md is only one possible incident source. The source behind the MCP tool can be swapped for PagerDuty, Linear, or Jira. The pattern surfaces wherever incident tracking and mutation testing both exist.

CodeRabbit reviews the diff. Verdict reviews the decision.
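
The cross-layer join described under WHAT VERDICT DOES can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in the repository: the record fields, the function name cross_layer_match_sketch, the max_distance threshold of 10 lines, and the example file paths are all assumptions made for the sketch. Only the wording of the verdict line comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Mutation:
    """A surviving mutation (hypothetical record shape)."""
    id: str
    file: str
    line: int


@dataclass(frozen=True)
class Incident:
    """A past incident tied to a code location (hypothetical record shape)."""
    id: str
    file: str
    line: int


def cross_layer_match_sketch(
    mutations: Sequence[Mutation],
    incidents: Sequence[Incident],
    max_distance: int = 10,  # assumed threshold, not taken from the project
) -> Optional[str]:
    """Return a verdict line for the closest mutation/incident pair, or None."""
    best = None  # (distance, mutation, incident)
    for m in mutations:
        for inc in incidents:
            if m.file != inc.file:
                continue  # join key 1: same file path
            distance = abs(m.line - inc.line)
            if distance > max_distance:
                continue  # join key 2: close enough in line distance
            if best is None or distance < best[0]:
                best = (distance, m, inc)
    if best is None:
        return None
    _, m, inc = best
    # The sentence is built from a fixed template, so no model generates it.
    return f"Surviving mutation {m.id} is the same code path that caused {inc.id}."


# Example with made-up file paths and line numbers.
mutations = [
    Mutation("M-1", "auth/session.py", 42),
    Mutation("M-2", "billing/invoice.py", 7),
]
incidents = [Incident("INC-2024-0431", "auth/session.py", 45)]
print(cross_layer_match_sketch(mutations, incidents))
```

Because the verdict line comes from a deterministic template rather than model output, a rule that copies it verbatim into the synthesis keeps that sentence free of paraphrase or invention.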