
Klinik is a voice-native clinical AI platform that eliminates medical documentation overhead. Doctors spend 2–4 hours daily on paperwork: SOAP notes, lab orders, referral letters, billing codes, follow-up scheduling, and patient notifications. Klinik reduces this to a single voice input: the doctor speaks naturally after a consultation.

Eight specialized LangGraph agents then do the work. A transcription agent structures the speech, and a clinical NLP agent extracts diagnoses, vitals, and medications using Llama 3.1 70B. Six agents then run in parallel: generating SOAP notes, ordering labs, writing referral letters, assigning ICD-10/CPT billing codes, scheduling follow-ups, and sending patient SMS notifications. A supervisor agent compiles the final summary, which Dr. Aria, a talking AI avatar, reads aloud via WebRTC. The full pipeline completes in 13.68 seconds.

The AMD MI300X is central to this architecture. Llama 3.1 70B at full BF16 precision requires ~140GB of VRAM for its weights alone (70 billion parameters at 2 bytes each), more than most single GPUs provide. Running quantized models in a clinical context introduces quality degradation that is unacceptable when a hallucinated drug interaction or missed contraindication has patient safety implications. The MI300X's 192GB of HBM3 runs the full model at full precision, with 44GB remaining for KV cache. No quantization. No compromise.

Observed metrics on AMD Developer Cloud (ROCm 7.2): 90% VRAM utilized (~148GB), 206W sustained power draw, 52°C junction temperature, 4.25s average LLM call latency, and 13.68s end-to-end consultation time, a 2.5x improvement over sequential execution.

The full stack includes LangGraph orchestration, a FastAPI backend, a React frontend, Deepgram STT/TTS, a Simli AI avatar, LiveKit WebRTC, a Turso database, a Redis event bus, Twilio SMS, a Caddy reverse proxy, and a complete Prometheus + Grafana monitoring stack with AMD GPU metrics.
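The pipeline shape described above (two sequential stages, a six-way fan-out, then a supervisor fan-in) can be sketched in a few lines. This is a minimal illustration using plain asyncio rather than LangGraph, and it is not Klinik's actual code: every function name, the agent names, and the placeholder delay are assumptions made for the example, and each agent is a stub where a real LLM or service call would go.

```python
import asyncio

# Hypothetical agent names, taken from the six parallel tasks in the description.
TASK_AGENTS = [
    "soap_note",
    "lab_order",
    "referral_letter",
    "billing_codes",
    "follow_up",
    "patient_sms",
]

DELAY = 0.01  # placeholder latency per stage, not a measured value


async def transcribe(audio_ref: str) -> str:
    """Stage 1 (stub): turn recorded speech into structured text."""
    await asyncio.sleep(DELAY)
    return f"transcript of {audio_ref}"


async def extract_clinical_facts(transcript: str) -> dict:
    """Stage 2 (stub): where the LLM would extract diagnoses, vitals, medications."""
    await asyncio.sleep(DELAY)
    return {"transcript": transcript, "diagnoses": [], "vitals": {}, "medications": []}


async def run_task_agent(name: str, facts: dict) -> tuple[str, str]:
    """Stage 3 (stub): one of the six independent downstream agents."""
    await asyncio.sleep(DELAY)
    return name, f"{name}: drafted from {facts['transcript']}"


def supervise(results: dict) -> str:
    """Stage 4 (stub): fan-in, combining the six outputs into one summary."""
    return "\n".join(results[name] for name in TASK_AGENTS)


async def run_consultation(audio_ref: str) -> dict:
    # Stages 1 and 2 are sequential: each consumes the previous stage's output.
    transcript = await transcribe(audio_ref)
    facts = await extract_clinical_facts(transcript)
    # Stage 3 fans out: the six agents depend only on `facts`, so they run concurrently.
    pairs = await asyncio.gather(*(run_task_agent(n, facts) for n in TASK_AGENTS))
    results = dict(pairs)
    # Stage 4 fans in once every task agent has returned.
    return {"results": results, "summary": supervise(results)}
```

Calling `asyncio.run(run_consultation("visit.wav"))` returns the six agent outputs plus the combined summary. With equal stub latencies, the concurrent version waits for roughly three stage delays instead of eight; the real saving depends on how long each agent actually takes.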
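The memory and timing figures above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a round 70 billion parameters, decimal gigabytes, and weights only (no activations or KV cache); the helper name is hypothetical. It also shows that the 44GB figure corresponds to 192GB minus the ~148GB observed in use, and what sequential runtime a 2.5x speedup would imply.

```python
GB = 1e9  # decimal gigabytes, as an assumption for this estimate


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for model weights alone, in GB."""
    return n_params * bytes_per_param / GB


N_PARAMS = 70e9  # assumed round figure for Llama 3.1 70B

weights_bf16 = weight_memory_gb(N_PARAMS, 2)  # BF16 stores 2 bytes per parameter
weights_int8 = weight_memory_gb(N_PARAMS, 1)  # 8-bit quantization, for comparison

hbm_total_gb = 192.0       # MI300X HBM3 capacity from the text
observed_in_use_gb = 148.0  # observed usage from the text
headroom_gb = hbm_total_gb - observed_in_use_gb

end_to_end_s = 13.68
speedup = 2.5
implied_sequential_s = end_to_end_s * speedup  # sequential time implied by the 2.5x claim

print(f"BF16 weights: {weights_bf16:.0f} GB")
print(f"INT8 weights: {weights_int8:.0f} GB")
print(f"Headroom at observed usage: {headroom_gb:.0f} GB")
print(f"Implied sequential runtime: {implied_sequential_s:.1f} s")
```

The 8-bit line is only there to show what quantization would save in memory; the text's argument is that the quality cost is not acceptable in a clinical setting.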
10 May 2026