
AI app: AgentRx: One-Click Reliability Audit for AI Agents for IBM Bob Hackathon


Created by team BiteLance on May 15, 2026

Technologies: IBM watsonx Assistant, Streamlit, Anthropic Claude, IBM

AgentRx is a one-click reliability audit tool for AI agents, built during the IBM Bob Hackathon with IBM Bob as the AI development partner.

The problem: developers ship AI agents to production without knowing whether their outputs are reliable, their behavior is consistent, or their responses comply with organizational rules. They find out when something breaks.

AgentRx addresses this with three automated checks powered by the Thread Suite, a set of nine open-source AI agent reliability tools built by Eugene Dayne Mawuli (BiteLance, Accra, Ghana):

Structure Check (Iron-Thread): Validates that the agent returns well-formed, consistent output using JSON schema validation. Catches malformed responses before they reach a database.

Behavior Check (TestThread): Runs three automated behavioral test cases against the live agent endpoint: basic response, instruction following, and simple arithmetic. Measures pass rate and latency.

Compliance Check (PolicyThread): Evaluates the agent's responses against domain-specific compliance policies for General, Medical, Finance, and Legal use cases. Catches harmful content, specific medical diagnoses, investment advice guarantees, and legal outcome promises.

IBM Bob was used throughout the build to read the Thread Suite production codebases, design the integration architecture, and implement robust retry logic with exponential backoff for handling Render free tier cold starts.

AgentRx returns a Reliability Score from 0 to 100 with specific failures and actionable recommendations for each check.
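The retry logic with exponential backoff mentioned in the description could look roughly like the sketch below. This is not AgentRx's actual code: the function names `backoff_delays` and `call_with_retry`, the default delays, and the 30-second cap are all assumptions chosen for illustration. The idea is that a cold-starting service often fails the first request or two, so each retry waits longer than the last.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0,
                   factor: float = 2.0, cap: float = 30.0) -> List[float]:
    """Seconds to wait before each retry: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(retries)]


def call_with_retry(fn: Callable[[], T], retries: int = 4, base: float = 1.0,
                    factor: float = 2.0, cap: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on failure wait with exponentially growing delays and retry.

    Hypothetical helper: a real client would catch only timeout and
    connection errors rather than every Exception.
    """
    delays = backoff_delays(retries, base, factor, cap)
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
    raise RuntimeError("unreachable")  # loop always returns or raises
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` applies. Adding random jitter to each delay is a common refinement that this sketch leaves out.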
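To make the Behavior Check concrete, here is a minimal sketch of running three test cases (basic response, instruction following, simple arithmetic) and measuring pass rate and latency. It does not use TestThread's real API: the prompts, the pass conditions, the `BehaviorCase` type, and the `agent` callable (standing in for a request to the live endpoint) are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BehaviorCase:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # decides whether a reply is acceptable


# Hypothetical prompts and pass conditions, one per category in the description.
CASES: List[BehaviorCase] = [
    BehaviorCase("basic response", "Say hello.",
                 lambda r: len(r.strip()) > 0),
    BehaviorCase("instruction following", "Reply with exactly the word READY.",
                 lambda r: r.strip() == "READY"),
    BehaviorCase("simple arithmetic", "What is 17 + 25? Answer with the number only.",
                 lambda r: "42" in r),
]


def run_behavior_check(agent: Callable[[str], str],
                       cases: List[BehaviorCase] = CASES) -> Dict:
    """Run each case against `agent`, timing every call."""
    results = []
    for case in cases:
        start = time.perf_counter()
        try:
            passed = case.passes(agent(case.prompt))
        except Exception:
            passed = False  # an error from the agent counts as a failed case
        latency = time.perf_counter() - start
        results.append({"name": case.name, "passed": passed, "latency_s": latency})
    return {
        "pass_rate": sum(r["passed"] for r in results) / len(results),
        "avg_latency_s": sum(r["latency_s"] for r in results) / len(results),
        "results": results,
    }
```

In practice `agent` would wrap an HTTP call to the endpoint under audit; here any function from prompt string to reply string works, which makes the check easy to try against a stub.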

Category tags:

Agent Builder track - The INTERNET OF AGENTS, Developer Tools


Explore more applications

AMD2_PKK

A clock-aware, zero-token-first routing agent. It classifies each task with no category hint, answers math, logic and code by generating a program and *executing* it

PKK

RiskOps

RiskOps is an event-triggered supply chain risk simulator in which a domain-adaptive multi-agent AI system analyzes catastrophic events across your vendor network in parallel and generates structured mitigation plans. Built for the AMD ACT II Hackathon (Track 3).

The Nacxmeers

GarudaLinux

Garuda Linux is an Arch-based Linux distribution known for its striking visual design, performance-focused tweaks (like BTRFS with automatic snapshots and Zen kernel), and a strong focus on gaming.

CoreX

AMD Developer Cloud

Simple Request Router

Uses Gemma 4 to classify complex vs. simple requests, and routes them to a local LLM / cloud provider as needed.

lone wizard

AMD Developer Cloud, AMD ROCm, Gemma, Gemini AI, Assistants API

ConsultIn

Quantivo AI (BOA) generates AI-powered Business Opportunity Analysis reports by combining local market data, sentiment analysis, and SWOT insights to help entrepreneurs validate and grow their business ideas.

Donat Madu

AI/ML API, Anthropic Claude, Claude Code, Codex, Bright Data Datasets, Bright Data Scraping Browser, Bright Data MCP Server

Eugene Mawuli Attigah
AI builder
