AI coding assistants hallucinate confidently. They invent APIs that don't exist, recommend phantom packages, and, most dangerously, deny the existence of real technologies based on stale training data.

We tested IBM Bob across six failure categories. The most critical finding: when asked to help upgrade Next.js 14 to Next.js 16, Bob repeatedly and confidently stated that Next.js 16 does not exist, even though it had been publicly released on npm. This is the most dangerous class of AI coding failure: not making things up, but confidently denying reality.

ROTAN (the Malay word for the rattan cane used to discipline misbehavior) is a runtime verification layer that sits between the AI assistant and the developer. For every AI response, ROTAN runs a verification pipeline:

1. Claim Extraction: parses the response for technology claims.
2. Live Verification: checks each claim against the npm registry, GitHub releases, and official docs in real time.
3. Confidence Scoring: detects when the AI sounds certain but the evidence doesn't support it.
4. Gate Decision: returns PASS, HEDGE, CLARIFY, or BLOCK based on the risk score.
5. Session Trust State: trust degrades with failures and recovers with clean passes, so the system gets appropriately skeptical over time.

IBM Bob was used both as the development tool (writing the ROTAN codebase) and as the system being verified: Bob built its own accountability layer.

In enterprise, confident is not the same as correct. ROTAN turns "trust and hope" into "trust and verify."
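
To make the last two pipeline stages concrete, here is a minimal sketch of how a gate decision and a session trust state could fit together. It is not ROTAN's actual implementation: the names `Claim`, `SessionTrust`, `risk_score`, and `decide`, the 0 to 1 confidence scale, the risk thresholds, and the trust step sizes are all assumptions chosen for illustration. Claim extraction and live verification are assumed to have already run, so each claim arrives with a `verified` flag.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    PASS = "PASS"
    HEDGE = "HEDGE"
    CLARIFY = "CLARIFY"
    BLOCK = "BLOCK"


@dataclass
class Claim:
    """One technology claim extracted from an AI response (hypothetical shape)."""
    text: str
    ai_confidence: float  # 0.0-1.0: how certain the AI sounded (assumed scale)
    verified: bool        # outcome of the live check (registry, releases, docs)


@dataclass
class SessionTrust:
    """Per-session trust in [0, 1]; the step sizes below are illustrative."""
    level: float = 1.0

    def update(self, gate: Gate) -> None:
        if gate is Gate.PASS:
            self.level = min(1.0, self.level + 0.05)  # slow recovery
        elif gate is Gate.BLOCK:
            self.level = max(0.0, self.level - 0.30)  # sharp penalty
        else:  # HEDGE or CLARIFY
            self.level = max(0.0, self.level - 0.10)


def risk_score(claims: list[Claim], trust: SessionTrust) -> float:
    """Risk comes from the most confident claim that failed verification,
    amplified when session trust is already low."""
    worst = max((c.ai_confidence for c in claims if not c.verified), default=0.0)
    return min(1.0, worst * (2.0 - trust.level))


def decide(claims: list[Claim], trust: SessionTrust) -> Gate:
    """Map the risk score to a gate (assumed thresholds) and update trust."""
    risk = risk_score(claims, trust)
    if risk >= 0.75:
        gate = Gate.BLOCK
    elif risk >= 0.50:
        gate = Gate.CLARIFY
    elif risk >= 0.25:
        gate = Gate.HEDGE
    else:
        gate = Gate.PASS
    trust.update(gate)
    return gate


if __name__ == "__main__":
    trust = SessionTrust()
    denial = Claim("Next.js 16 does not exist", ai_confidence=0.95, verified=False)
    print(decide([denial], trust).value, round(trust.level, 2))
```

In this sketch a confident but unverified claim, like the Next.js 16 denial, is blocked outright, and the resulting drop in trust makes later borderline claims in the same session more likely to be hedged or sent back for clarification.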