10+ years of experience

AutoRAG Research Lab is a platform for automatically improving enterprise RAG systems through continuous experimentation. Inspired by Andrej Karpathy’s “autoresearch” mindset, it lets autonomous agents generate ideas, test them against a benchmark, learn from failures, and iteratively improve the RAG pipeline with minimal manual intervention. Instead of relying on manual tuning of retrieval settings, it uses a multi-agent workflow to propose hypotheses, generate new retrieval configurations, run head-to-head benchmark evaluations, and promote only the changes that measurably improve performance.

The project is built around EnterpriseRAG-Bench, a realistic benchmark designed to simulate enterprise knowledge bases with large-scale internal documents and challenging retrieval tasks. The system can evaluate dense and hybrid retrieval strategies, test reranking, compare score deltas against a baseline, and surface the strongest configurations through an experiment leaderboard.

On top of the research engine, we built a full product experience: a web dashboard to launch and monitor runs, inspect hypotheses, review score progression, and analyze failure cases at the question level. Users can also choose a benchmark focus, upload their own dataset, and test the current retrieval stack through a ChatGPT-style RAG playground that shows both answers and retrieved source documents.

The result is a practical lab for turning RAG optimization into a repeatable, data-driven process: faster iteration, clearer evaluation, and, because only measured improvements are promoted, a baseline benchmark score that never regresses on the way to better enterprise search and answer quality.
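
The propose, evaluate, and promote loop described above can be illustrated with a minimal Python sketch. This is not the project's actual code: the names `Lab`, `Experiment`, `toy_evaluate`, and `min_gain` are hypothetical, and `toy_evaluate` is a made-up placeholder scorer standing in for a real EnterpriseRAG-Bench run, so its numbers carry no meaning. The sketch only shows how promoting a candidate solely when it beats the current baseline keeps the baseline score from ever decreasing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Config = Dict[str, object]  # hypothetical shape of a retrieval configuration


@dataclass
class Experiment:
    hypothesis: str
    config: Config
    score: float
    delta: float    # candidate score minus the baseline score at run time
    promoted: bool


class Lab:
    """Keeps a baseline config and promotes only measurable improvements."""

    def __init__(self, evaluate: Callable[[Config], float], baseline: Config):
        self.evaluate = evaluate          # assumed benchmark scorer, higher is better
        self.baseline = baseline
        self.baseline_score = evaluate(baseline)
        self.history: List[Experiment] = []

    def run(self, hypothesis: str, candidate: Config, min_gain: float = 0.0) -> Experiment:
        # Head-to-head comparison: score the candidate and compare to the baseline.
        score = self.evaluate(candidate)
        delta = score - self.baseline_score
        promoted = delta > min_gain
        exp = Experiment(hypothesis, candidate, score, delta, promoted)
        self.history.append(exp)
        if promoted:
            # The candidate becomes the new baseline for later experiments.
            self.baseline, self.baseline_score = candidate, score
        return exp

    def leaderboard(self) -> List[Experiment]:
        # Strongest configurations first.
        return sorted(self.history, key=lambda e: e.score, reverse=True)


def toy_evaluate(config: Config) -> float:
    """Placeholder scorer with invented weights; NOT a real benchmark."""
    score = 0.50
    if config.get("retrieval") == "hybrid":
        score += 0.10
    if config.get("rerank"):
        score += 0.05
    score -= 0.002 * abs(int(config.get("top_k", 10)) - 10)
    return round(score, 4)
```

In this sketch an agent would call `lab.run(hypothesis, candidate)` once per proposed configuration. Rejected experiments stay in `history`, so failure cases remain available for inspection even though they never replace the baseline.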
19 May 2026