
The problem. Engineers love the speed of AI coding agents but don't trust the output. The standard fix, asking the agent to review its own work, fails by design: the model with the blind spot is the one doing the audit.

The solution. Verdict is an open-source lie detector for AI coding agents. After any agent says "done", Verdict scans the git diff and runs seven independent checks. In under ten seconds you get one verdict (PASS, SUSPICIOUS, or LIED) with file:line citations for every finding.

Why it works. Nothing in Verdict touches an LLM. Five static checks use Python's AST to read what was added and Jedi to verify against typeshed whether the APIs the agent called exist. Two dynamic checks use sys.settrace and a custom pytest plugin so the CPython interpreter itself reports which lines ran. The agent's bluff is text. Verdict's response is observation. The seven checks catch failure modes humans miss: dead functions that are never called, vacuous tests that only assert on their own mocks, hallucinated API calls, claimed-but-missing files, silently swallowed exceptions, newly added code the tests never reach, and changes with less than 50% delta coverage.

Audience. Engineers reviewing AI-written pull requests, teams adopting AI coding agents at scale, and anyone running an MCP-compatible client. Verdict is agent-agnostic: whether the code comes from Bob, Claude Code, Cursor, Copilot, or a human in a hurry, the audit is the same.

Ships everywhere. A CLI for any git repo. An MCP server any MCP client can call mid-session. A Bob Verifier mode and /verify slash command. A VS Code tab with persistent triage. A GitHub Action that posts findings on every PR with line links. A local analytics dashboard. Six surfaces, one scorecard.

Why it matters. Linters catch human-shaped failures. Test runners turn green for code that never executes. AI agents produce a new class of failure: confidently wrong, syntactically clean, not correct. Verdict fills the gap.

Open source. Built on IBM Bob.
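
To make the static checks described under "Why it works" concrete, here is a minimal sketch of how a silently-swallowed-exception check could be written with Python's standard ast module. It is an illustration under stated assumptions, not Verdict's actual code: the names find_swallowed_exceptions, _is_noop, and SAMPLE are hypothetical, and "swallowed" is assumed to mean an except block whose body contains only pass or a bare constant such as `...`.

```python
import ast

# Hypothetical sample of agent-written code (not taken from Verdict).
SAMPLE = '''\
def load(path):
    try:
        return open(path).read()
    except OSError:
        pass

def parse(text):
    try:
        return int(text)
    except ValueError as exc:
        raise RuntimeError("bad input") from exc
'''


def _is_noop(stmt: ast.stmt) -> bool:
    """True for statements that do nothing: `pass`, `...`, or a bare literal."""
    if isinstance(stmt, ast.Pass):
        return True
    return isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Constant)


def find_swallowed_exceptions(source: str, filename: str = "<diff>") -> list[str]:
    """Return file:line citations for except blocks whose body does nothing.

    Assumption: a handler counts as "swallowing" only when every statement
    in its body is a no-op. Logging or re-raising handlers are not flagged.
    """
    tree = ast.parse(source, filename=filename)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and all(_is_noop(s) for s in node.body):
            findings.append(f"{filename}:{node.lineno}")
    return sorted(findings)


if __name__ == "__main__":
    print(find_swallowed_exceptions(SAMPLE, "sample.py"))  # ['sample.py:4']
```

The first handler is flagged because its body is only pass; the second is not, because it re-raises. Because the check reads the syntax tree rather than asking a model, the same input always yields the same finding, which is the property the "no LLM" design relies on.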
17 May 2026