
Real-Time Speech Transcription & Translation

This project implements a real-time speech transcription and translation pipeline, exposed through a lightweight web service built with FastAPI and Uvicorn. The backend provides REST endpoints for lifecycle management (load, unload, reset) and a WebSocket endpoint for continuous audio streaming and live transcription.

Audio is captured from the browser microphone using the Web Audio API, downsampled to 16 kHz mono PCM, and streamed in chunks to the backend over the WebSocket. The server passes each chunk through a transcription engine and then a translation pipeline, returning results in real time.

The architecture supports two interchangeable transcription backends: a cloud path using the Speechmatics Real-Time API, and a fully local path using faster-whisper (large-v3) running on a dedicated NVIDIA RTX 3090 GPU. Translation is handled by facebook/nllb-200-1.3B, with Arabic, Italian, English, and French supported as both source and target languages. The translator is built around a pluggable interface, which makes it straightforward to add language pairs or swap the underlying model.

A minimal Vue 3 single-page application provides the client interface, with live panels showing the accumulated transcription, the current in-progress line, and the raw original text before translation.

The demo runs in local mode only.
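
As an illustration of the pluggable translator interface mentioned above, here is a minimal Python sketch. It is not the project's actual code: the names Translator, EchoTranslator, TranslationPipeline, and the short language keys are hypothetical, and the stand-in backend only tags text rather than calling a model. The FLORES-200 codes in the mapping are the ones NLLB models use for the four supported languages.

```python
from typing import Protocol

# FLORES-200 codes used by NLLB models for the four languages named above.
# The short keys ("ar", "it", ...) are an assumption made for this sketch.
NLLB_CODES = {
    "ar": "arb_Arab",
    "it": "ita_Latn",
    "en": "eng_Latn",
    "fr": "fra_Latn",
}


class Translator(Protocol):
    """Hypothetical interface: any object with this method can be plugged in."""

    def translate(self, text: str, source: str, target: str) -> str: ...


class EchoTranslator:
    """Stand-in backend for testing; tags the text instead of translating it."""

    def translate(self, text: str, source: str, target: str) -> str:
        return f"[{NLLB_CODES[source]}->{NLLB_CODES[target]}] {text}"


class TranslationPipeline:
    """Routes transcript lines through whichever backend is configured."""

    def __init__(self, backend: Translator) -> None:
        self.backend = backend

    def process(self, text: str, source: str, target: str) -> dict:
        if source not in NLLB_CODES or target not in NLLB_CODES:
            raise ValueError(f"unsupported language pair: {source}->{target}")
        # Skip the backend call when source and target are the same language.
        if source == target:
            translated = text
        else:
            translated = self.backend.translate(text, source, target)
        # Keep the original alongside the translation, mirroring the
        # "raw original text" panel in the client.
        return {"original": text, "translated": translated}


pipeline = TranslationPipeline(EchoTranslator())
result = pipeline.process("buongiorno", source="it", target="en")
```

Under this design, swapping the underlying model would mean writing another class with the same translate method (for example, one wrapping facebook/nllb-200-1.3B) and passing it to TranslationPipeline; adding a language would mean adding one entry to the code mapping.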
19 May 2026