Every team shipping AI features eventually asks the same question: why did our LLM bill double last week, and what shipped that caused it? Tools like Helicone, Langfuse, and LiteLLM show you that spend went up. They do not tell you why, llmtrace closes that gap. llmtrace is a self-hosted reverse proxy for LLM provider APIs. You point your code at it instead of api.anthropic.com, and it records token usage, cost, latency, model, and a prompt fingerprint for every call into a local SQLite ledger. A rolling baseline plus sigma threshold flags per-key spend anomalies the moment they appear. When a spike is detected, a Gemini-powered agent takes over. It queries the ledger, finds deploys that landed in the surrounding time window, diffs the model and prompt mix before and after each one, and produces a causal attribution with a confidence score. It names the exact pull request responsible. If the regression is clear, the agent reads the source on GitHub, writes a corrected version, and opens a fix pull request on its own. An autonomous watcher runs this loop continuously in the background, so attributions and remediation pull requests appear without anyone asking. A multimodal Vision Import feature lets you drop in a screenshot of any billing dashboard: Gemini reads the spend off the image, then the agent investigates your connected repository to find the cause. llmtrace is written in Go, uses pure Go SQLite with no CGo, runs as a single container, and is deployed live on Google Cloud Run. The whole thing is one binary you can host yourself, with zero hosted SaaS dependency.
Category tags: