VoiceDesk is a real-time AI-powered customer support agent that transforms how enterprises handle customer interactions. Instead of typing, customers simply speak, and VoiceDesk listens, understands, and responds intelligently in seconds.

HOW IT WORKS

The user clicks the microphone button in the web interface. Audio is captured at 16 kHz directly in the browser and streamed via WebSocket to a FastAPI backend. The backend pipes the audio stream to the Speechmatics Real-Time API, which returns partial transcripts as the user speaks (visible live in the UI) and final transcripts when the utterance is complete. The final transcript is then sent to Claude Sonnet, which acts as an expert customer support agent and generates a concise, actionable response. The entire exchange (transcription plus AI response) appears in a clean conversation log with latency metrics.

KEY FEATURES

- Real-time partial and final transcription powered by Speechmatics RT WebSocket API
- Live audio waveform visualizer for immediate feedback
- Claude Sonnet AI responses optimized for enterprise customer support tone
- Session analytics: exchange count, average response latency, word count, session duration
- Single-page web application: no installation required for end users
- Production deployment on Vultr cloud infrastructure

USE CASES

VoiceDesk targets enterprise customer support teams, call centers, and any business that wants to deflect repetitive support queries with an autonomous voice agent.

TECHNICAL STACK

- Speechmatics Real-Time API (WebSocket, PCM float32 audio, 16 kHz)
- Anthropic Claude Sonnet for response generation
- Vanilla JS frontend with Web Audio API for waveform rendering
- Deployed on Vultr Ubuntu VM

WHAT MAKES IT DIFFERENT

Most voice AI demos are disconnected pipelines. VoiceDesk is a single cohesive system where speech, transcription, and AI reasoning happen in one continuous real-time flow. The result feels instant, because it is.
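
ILLUSTRATIVE SKETCH

As a rough illustration of the backend flow described under HOW IT WORKS and the session analytics listed under KEY FEATURES, the Python sketch below shows three pieces a backend like this might contain: the configuration message sent first on the Speechmatics socket, a router that separates partial from final transcripts, and a small session-statistics tracker. This is not VoiceDesk's actual code. The names start_recognition_message, route_transcript, and SessionStats are hypothetical, the message and field names follow the public Speechmatics Real-Time API documentation as best understood and should be checked against the current reference, and counting words from the user's final transcripts (rather than from the AI responses) is an assumption.

```python
import json
import time
from dataclasses import dataclass, field


def start_recognition_message(language: str = "en", sample_rate: int = 16000) -> str:
    """Build the JSON config message sent before any audio (assumed shape)."""
    return json.dumps({
        "message": "StartRecognition",
        # Raw PCM float32 at 16 kHz, matching the stack described above.
        "audio_format": {"type": "raw", "encoding": "pcm_f32le", "sample_rate": sample_rate},
        # Partials are what the UI would show while the user is still speaking.
        "transcription_config": {"language": language, "enable_partials": True},
    })


def route_transcript(raw: str):
    """Return ('partial' | 'final', text) for transcript messages, else None."""
    msg = json.loads(raw)
    kinds = {"AddPartialTranscript": "partial", "AddTranscript": "final"}
    kind = kinds.get(msg.get("message"))
    if kind is None:
        return None  # e.g. acknowledgements or other control messages
    text = msg.get("metadata", {}).get("transcript", "").strip()
    return kind, text


@dataclass
class SessionStats:
    """Hypothetical tracker for the session analytics listed above."""
    started_at: float = field(default_factory=time.monotonic)
    latencies_ms: list = field(default_factory=list)
    word_count: int = 0

    def record_exchange(self, transcript: str, latency_ms: float) -> None:
        # One exchange = one final transcript plus one AI response.
        self.latencies_ms.append(latency_ms)
        self.word_count += len(transcript.split())

    def summary(self) -> dict:
        n = len(self.latencies_ms)
        return {
            "exchanges": n,
            "avg_latency_ms": sum(self.latencies_ms) / n if n else 0.0,
            "words": self.word_count,
            "duration_s": time.monotonic() - self.started_at,
        }
```

In a setup like this, only messages routed as 'final' would be forwarded to the language model, while 'partial' messages would be pushed straight to the browser for live display. The latency passed to record_exchange would be measured by the caller, for example from receipt of the final transcript to receipt of the model's response.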