AI app: Benchmarking Robustness in Agentic RAG Systems for AI Agent Olympics Hackathon

Benchmarking Robustness in Agentic RAG Systems

Streamlit

Created by team BhanguJatt on May 18, 2026

LangChain, HuggingFace Hub, Groq, Llama 3.1, AI/ML API, ChatGPT, Chroma, OpenGPTs, Streamlit

Collaborative Systems, Multimodal Intelligence

As AI agents become increasingly integrated into real-world applications, understanding retrieval reliability and preprocessing sensitivity has become a major challenge in Retrieval-Augmented Generation (RAG) systems. Most traditional evaluations focus only on architecture performance and ignore how preprocessing decisions can affect retrieval robustness and benchmark outcomes.

In this project, we built an interactive observability and benchmarking platform for evaluating robustness in Agentic RAG systems. The platform compares Single-Agent and Multi-Agent RAG architectures on the SQuAD and HotpotQA benchmarks using the Exact Match (EM) and F1 evaluation metrics.

Through systematic experiments, we found that preprocessing strategies such as chunking can flip benchmark winners. Without chunking, the Single-Agent system slightly outperformed the Multi-Agent system on SQuAD. After chunking was introduced, the Multi-Agent architecture became significantly more robust under noisy retrieval conditions.

To make these behaviors observable, we developed an interactive Streamlit dashboard featuring benchmark comparison analytics, retrieval trace visualization, chunking impact analysis, and failure inspection. One of the core components of the platform is the Retrieval Trace Viewer, which lets users inspect step by step how Multi-Agent systems rewrite queries, retrieve semantically richer chunks, and improve answer generation.

We also analyzed common RAG failure modes such as vocabulary mismatch, retrieval pollution, and chunk fragmentation. Our findings show that retrieval robustness depends not only on architecture design but also heavily on preprocessing strategy and retrieval quality.

Technologies used include LangChain, LangGraph, FAISS, HuggingFace Embeddings, Groq LLMs, Streamlit, Plotly, and Python.
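For readers unfamiliar with the two metrics named above, here is a minimal sketch of how Exact Match and token-level F1 are commonly computed for SQuAD-style question answering. It is not the project's own evaluation code: the function names (`normalize_answer`, `exact_match`, `f1_score`) and the normalization steps are assumptions modeled on the widely used SQuAD convention of lowercasing and stripping punctuation and English articles.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several acceptable gold answers, a common choice is to score the prediction against each one and keep the maximum, then average the per-question scores over the benchmark.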

Category tags:

Agent Builder track - The INTERNET OF AGENTS

Explore more applications

AMD2_PKK

A clock-aware, zero-token-first routing agent. It classifies each task with no category hint, answers math, logic and code by generating a program and *executing* it

PKK

RiskOps

RiskOps is an event-triggered supply chain risk simulator with a domain-adaptive Multi-Agent AI System that analyzes catastrophic events across your vendor network in parallel and generates structured mitigation plans. Built for AMD ACT II Hackathon (Track 3).

The Nacxmeers

GarudaLinux

Garuda Linux is an Arch-based Linux distribution known for its striking visual design, performance-focused tweaks (like BTRFS with automatic snapshots and Zen kernel), and a strong focus on gaming.

CoreX

AMD Developer Cloud

Simple Request Router

Uses Gemma 4 to classify complex vs. simple requests, and routes them to a local LLM / cloud provider as needed.

lone wizard

AMD Developer Cloud, AMD ROCm, Gemma, Gemini AI, Assistants API

ConsultIn

Quantivo AI (BOA) generates AI-powered Business Opportunity Analysis reports by combining local market data, sentiment analysis, and SWOT insights to help entrepreneurs validate and grow their business ideas.

Donat Madu

AI/ML API, Anthropic Claude, Claude Code, Codex, Bright Data Datasets, Bright Data Scraping Browser, Bright Data MCP Server

Harman Bhangu
