PHI Compliance Scanner is a repo-level PHI (Protected Health Information) scanner built for the IBM Bob Hackathon. It targets a realistic but fully synthetic EHR demo repo containing three Python microservices (patient-api, billing-service, audit-logger) with deliberately injected HIPAA-style violations, such as logging SSNs, propagating patient identifiers between services, and storing unmasked fields.

Instead of hand-writing all the tooling, I used IBM Bob IDE as the primary development partner. Bob read the PRD and the .bob/rules.md file, which defines the 18 HIPAA identifiers and the scanning rules, then helped plan and implement the scanner/ modules, the PHI flow mapping, the markdown/JSON audit report format, and the pytest test stubs. All major steps (planning the synthetic repo, building the scanner engine, generating tests, and doing the final review) are captured as exported Bob task sessions under bob_sessions/.

The result is a single-command workflow:

    python scanner/phi_scanner.py --repo ./ehr-demo --output ./reports/audit-report.md

The scanner walks the entire repo, detects hundreds of PHI violations and classifies them by service, file, and identifier type, builds a PHI flow graph across services, and generates an audit-ready markdown report plus a structured JSON file. Today the focus is fast, accurate detection and reporting; auto-patching, watsonx.ai-based classification, CI/CD integration, and dashboards are planned as roadmap steps.
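
To make the detect-and-classify step more concrete, here is a minimal sketch of how a regex-based pass over the repo could work. It is not the code in scanner/phi_scanner.py: the PATTERNS table, the function names (scan_text, scan_repo, summarize), and the three example identifier types are all assumptions chosen for illustration, and the real rules live in .bob/rules.md.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical patterns for three identifier types. These are illustrative
# only; the project's actual rules are defined in .bob/rules.md.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_text(text, path="<memory>"):
    """Return one finding per (line, identifier type) that matches."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append({"file": path, "line": lineno, "identifier": kind})
    return findings


def scan_repo(root):
    """Walk every .py file under root and collect findings."""
    findings = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        findings.extend(scan_text(text, str(path)))
    return findings


def summarize(findings):
    """Count findings per identifier type."""
    return Counter(f["identifier"] for f in findings)


if __name__ == "__main__":
    # Synthetic sample resembling an injected violation (dummy SSN).
    sample = 'logger.info("ssn=123-45-6789")\nx = 1\ncontact = "a@b.com"\n'
    for finding in scan_text(sample):
        print(finding)
```

A line-by-line regex pass like this is cheap and easy to audit, but it only sees literal values and cannot follow identifiers across services, which is why the actual scanner also builds a PHI flow graph.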