Every developer asks "why is this code written like this?" five times a day. Existing AI tools explain what code does and happily rewrite it, but none answer the question that actually matters: what was the author trying to solve, and what will break if you "clean it up"?

code-archaeologist is a two-stage AI agent that digs through git blame, commit diffs, pull request discussions, issue tracker references, and related test changes to produce a structured, cited answer in seconds. It tells you the code's intent, how it evolved over time, and what risks exist if you refactor, and it backs every single claim with a real commit SHA, PR number, or issue link.

The architecture splits investigation from synthesis. A bounded ReAct loop (Investigator) gathers evidence using 8 git, GitHub, and code-search tools, seeding the most valuable calls deterministically before the LLM even runs. A JSON-mode Synthesizer then produces the final structured answer. A deterministic CitationGuard post-validator prunes any claim whose source doesn't exist in the evidence ledger. If a commit SHA isn't in git log, the claim is gone. Hallucinated archaeology is not archaeology.

The LLM backend is pluggable: it ships with Google Gemini (free, default) and IBM watsonx Granite. Adding a new provider is one file implementing an 80-line Protocol. Without any API key, the tool still produces a deterministic report sourced purely from git evidence, in which every SHA is real and verified.

There are three ways to use it: a VS Code extension (right-click any code), a CLI for scripting and CI, and an MCP server for integration with Claude Code, Cursor, Zed, or Windsurf. All three share the same orchestrator and produce the same cited Answer.

Benchmarked against Flask's 678-commit repo: 5.8 seconds for a 10-line investigation, high confidence, zero CitationGuard drops. The test suite has 68+ tests, including real Gemini integration tests where every output SHA is verified against git log.
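To make the CitationGuard idea concrete, here is a minimal Python sketch of a deterministic post-validator that prunes claims whose sources are missing from an evidence ledger. It is an illustration under stated assumptions, not the project's actual code: the names `Claim` and `citation_guard`, the ledger being a plain set of strings, and the choice to also drop claims that cite nothing are all hypothetical, and the SHAs and PR identifier are invented sample values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Claim:
    """One statement in the synthesized answer plus the sources it cites."""
    text: str
    sources: Tuple[str, ...]  # e.g. commit SHAs, PR identifiers, issue links


def citation_guard(
    claims: Iterable[Claim], ledger: Set[str]
) -> Tuple[List[Claim], List[Claim]]:
    """Split claims into (kept, dropped) without calling any LLM.

    Assumption of this sketch: a claim survives only if it cites at least
    one source and every cited source is present in the evidence ledger.
    """
    kept: List[Claim] = []
    dropped: List[Claim] = []
    for claim in claims:
        if claim.sources and all(src in ledger for src in claim.sources):
            kept.append(claim)
        else:
            dropped.append(claim)
    return kept, dropped


# Hypothetical ledger: identifiers the investigation tools actually observed.
ledger = {"a1b2c3d", "PR#42"}

claims = [
    # Both sources are in the ledger, so this claim is kept.
    Claim("Retry loop added to work around a flaky upstream API.", ("a1b2c3d", "PR#42")),
    # The cited SHA was never seen by the tools, so this claim is dropped.
    Claim("Timeout was raised after a production incident.", ("deadbee",)),
    # No citation at all, so this claim is dropped as well.
    Claim("The function is performance critical.", ()),
]

kept, dropped = citation_guard(claims, ledger)
print(len(kept), len(dropped))  # prints "1 2"
```

Because the check is a plain set-membership test, it gives the same result on every run and does not depend on the LLM backend, which is what lets a guard of this kind act as a hard filter after synthesis.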
17 May 2026