AgentRx is a one-click reliability audit tool for AI agents, built during the IBM Bob Hackathon with IBM Bob as the AI development partner.

The problem: developers ship AI agents to production without knowing whether their outputs are reliable, their behavior is consistent, or their responses comply with organizational rules. They find out when something breaks.

AgentRx addresses this with three automated checks powered by the Thread Suite, a set of nine open-source AI agent reliability tools built by Eugene Dayne Mawuli (BiteLance, Accra, Ghana):

Structure Check (Iron-Thread): Uses JSON schema validation to confirm that the agent returns well-formed, consistent output. Catches malformed responses before they reach a database.

Behavior Check (TestThread): Runs three automated behavioral test cases against the live agent endpoint: basic response, instruction following, and simple arithmetic. Measures pass rate and latency.

Compliance Check (PolicyThread): Evaluates the agent's responses against domain-specific compliance policies for General, Medical, Finance, and Legal use cases. Catches harmful content, specific medical diagnoses, guarantees about investment outcomes, and promises of legal outcomes.

IBM Bob was used throughout the build to read the Thread Suite production codebases, design the integration architecture, and implement retry logic with exponential backoff to handle cold starts on the Render free tier.

AgentRx returns a Reliability Score from 0 to 100, along with the specific failures and actionable recommendations for each check.
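To illustrate the retry logic with exponential backoff mentioned above, here is a minimal Python sketch. It is not the AgentRx implementation: the function names (`backoff_delays`, `call_with_retry`, `fetch`), the default delays, and the retry count are all assumptions chosen for illustration, and it uses only the standard library. The idea is that a request to a service waking from a cold start may fail or time out, so each failed attempt waits longer than the last before trying again.

```python
import time
import urllib.error
import urllib.request


def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0):
    """Return the wait in seconds before each retry: base, base*factor, ..., capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(retries)]


def call_with_retry(fn, retries=4, base=1.0, factor=2.0, cap=30.0,
                    retry_on=(urllib.error.URLError, TimeoutError),
                    sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again, up to `retries` extra attempts.

    Note: urllib's HTTPError subclasses URLError, so HTTP error statuses
    are retried too under these (assumed) defaults.
    """
    delays = backoff_delays(retries, base, factor, cap)
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(delays[attempt])


def fetch(url, timeout=10):
    """Hypothetical helper: GET a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

A call such as `call_with_retry(lambda: fetch(agent_url))`, where `agent_url` is a placeholder for the endpoint under audit, would wait 1, 2, 4, and then 8 seconds between attempts under these defaults. The cap keeps a long run of failures from producing unbounded waits.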