
You're on a call with someone speaking Japanese or German, and you're both just... getting through it.

Conduit translates your voice in real time during meetings, plays the translation back in your own cloned voice through a virtual microphone, and captions whatever the other person says. It all runs on your laptop: no server, no subscription, no audio leaving your machine.

Setup takes about 5 minutes. You open a full-screen enrollment UI, read complex sentences out loud, and say "done" after each one. From those recordings the system extracts a Fourier-domain fingerprint of your voice: your pitch, your formant frequencies, and the vocal-tract shape those formants reflect. That fingerprint is what keeps the output sounding like you rather than a generic TTS voice.

During calls, your speech runs through the pipeline in sequence: Whisper transcribes it, NLLB-200 translates it between any of 8 languages (English, Spanish, Japanese, German, French, Italian, Russian, and Hindi), and CosyVoice2 synthesises the translation in your cloned voice. That audio goes into a virtual microphone, which Zoom and Teams see as your mic. The other person hears you, in their language.

Their side works in parallel. System audio is captured, its language is auto-detected, and the speech is translated and shown as captions in an overlay that sits permanently above the taskbar on Windows and above the dock on macOS, without blocking anything.

The voice model gets better over time. When the GPU sits idle, a LoRA adapter trains quietly on your accumulated recordings, updating less than 0.5% of the TTS model's weights. After a few weeks, the match to your voice noticeably tightens. If your local hardware isn't enough, training moves to Google Colab: you get a notebook, run it, and paste the Drive link back.

On an RTX 3050, Conduit uses about 3.6 GB of VRAM in total. Thermal monitoring cuts TTS if the GPU runs hot. Once the models are downloaded, everything works offline.
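The enrollment step ends each sentence with a spoken "done". One way to detect that, assuming each utterance is transcribed first (for instance by Whisper), is to check whether the last word of the transcript is "done" and strip it before the sentence is saved. `split_done` and its punctuation rules are hypothetical; this is a sketch, not Conduit's implementation.

```python
import re

# Hypothetical helper, not Conduit's enrollment code.
# Matches a trailing "done" plus any separators or punctuation around it.
_TRAILING_DONE = re.compile(r"[\s,;]*\bdone\b\W*$", re.IGNORECASE)


def split_done(transcript: str) -> tuple[str, bool]:
    """Return (sentence_text, finished) for one transcribed enrollment utterance.

    `finished` is True when the last spoken word is "done"; the keyword and the
    punctuation around it are removed from the returned sentence.
    """
    # Word tokens with punctuation ignored, so "Done." and "done!" both count
    words = re.findall(r"[\w']+", transcript.lower())
    if words and words[-1] == "done":
        return _TRAILING_DONE.sub("", transcript).strip(), True
    return transcript.strip(), False
```

Only a trailing "done" ends the sentence, so a prompt that merely contains the word ("I am done with this") is kept intact.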
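Pitch is the easiest part of the voice fingerprint to illustrate. The sketch below estimates the fundamental frequency (F0) of one voiced frame by finding the lag with the strongest autocorrelation inside a plausible speaking range. Autocorrelation is the inverse Fourier transform of the power spectrum, so this is closely related to a spectral measurement. It is a textbook method chosen for clarity, not Conduit's extractor; `estimate_f0` and its 60 to 400 Hz search range are assumptions.

```python
import math


def estimate_f0(frame: list[float], sample_rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Estimate the fundamental frequency (Hz) of one voiced audio frame.

    Hypothetical sketch: searches lags corresponding to fmin..fmax (an assumed
    speech range) and returns the one with the largest autocorrelation.
    """
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    if max_lag < min_lag:
        raise ValueError("frame too short for the requested pitch range")
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Raw (unnormalised) autocorrelation: it shrinks as the overlap gets
        # shorter, so the true period tends to beat its multiples (2x, 3x),
        # which makes octave errors less likely.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

At 16 kHz, a 1024-sample frame (64 ms) covers several periods of a typical speaking voice, which is enough for a stable peak. Formants need a different tool, such as LPC analysis, and are not shown here.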
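The Whisper to NLLB-200 hand-off needs a small piece of glue. Whisper reports the detected language as an ISO 639-1 code such as `ja` or `de`, while NLLB-200 expects FLORES-200 codes such as `jpn_Jpan`. A lookup table for the 8 supported languages might look like this; the table and the `nllb_code` helper are illustrative, not taken from Conduit.

```python
# ISO 639-1 code (as reported by Whisper) -> FLORES-200 code (as used by NLLB-200).
# Illustrative table for the 8 languages listed above.
WHISPER_TO_NLLB = {
    "en": "eng_Latn",  # English
    "es": "spa_Latn",  # Spanish
    "ja": "jpn_Jpan",  # Japanese
    "de": "deu_Latn",  # German
    "fr": "fra_Latn",  # French
    "it": "ita_Latn",  # Italian
    "ru": "rus_Cyrl",  # Russian
    "hi": "hin_Deva",  # Hindi
}


def nllb_code(whisper_lang: str) -> str:
    """Map a Whisper language code to its NLLB-200 code, or raise if unsupported."""
    try:
        return WHISPER_TO_NLLB[whisper_lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported language: {whisper_lang!r}") from None
```

If NLLB-200 is run through Hugging Face transformers, the source code is passed to the tokenizer as `src_lang`, and the target code's token ID is passed to `generate` as `forced_bos_token_id`.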
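Keeping the caption overlay clear of the taskbar or dock is mostly geometry. Both operating systems report a usable work area that excludes those bars (`SystemParametersInfo` with `SPI_GETWORKAREA` on Windows, `NSScreen.visibleFrame` on macOS), and the overlay can be pinned to the bottom edge of that area. The sketch assumes the work area has already been read from the OS in top-left-origin coordinates (macOS uses a bottom-left origin, so its frame would need flipping first). `Rect`, `caption_overlay_rect`, the full-width layout, and the 120-pixel height and 8-pixel margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int       # left edge, pixels
    y: int       # top edge, pixels (top-left origin, y grows downward)
    width: int
    height: int


def caption_overlay_rect(work_area: Rect, height: int = 120, margin: int = 8) -> Rect:
    """Place a caption strip along the bottom of the work area (hypothetical layout).

    The work area already excludes the taskbar/dock, so a strip that ends
    `margin` pixels above its bottom edge cannot overlap them.
    """
    h = min(height, work_area.height - 2 * margin)
    if h <= 0 or work_area.width <= 2 * margin:
        raise ValueError("work area too small for the overlay")
    return Rect(
        x=work_area.x + margin,
        y=work_area.y + work_area.height - margin - h,
        width=work_area.width - 2 * margin,
        height=h,
    )
```

For a 1920 × 1080 screen with a 48-pixel taskbar at the bottom, the work area is `Rect(0, 0, 1920, 1032)` and the overlay lands at `Rect(8, 904, 1904, 120)`, ending 8 pixels above the taskbar.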
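The "less than 0.5% of the weights" figure follows from how LoRA works. Instead of updating a frozen d_out × d_in weight matrix W, it trains two low-rank factors, B (d_out × r) and A (r × d_in), so each adapted matrix adds only r(d_out + d_in) trainable parameters. The helpers below do that arithmetic; the function names are hypothetical, and the example shapes, rank, and model size are made-up numbers, not CosyVoice2's real configuration.

```python
def lora_trainable_params(shapes: list[tuple[int, int]], rank: int) -> int:
    """Parameters LoRA trains for the given frozen (d_out, d_in) weight matrices.

    Each frozen W (d_out x d_in) gets low-rank factors B (d_out x r) and
    A (r x d_in); the adapted layer computes W x + (alpha / r) * B (A x),
    so only r * (d_out + d_in) new values per matrix are trained.
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return sum(rank * (d_out + d_in) for d_out, d_in in shapes)


def trainable_fraction(shapes: list[tuple[int, int]], rank: int, total_params: int) -> float:
    """LoRA parameters as a fraction of the base model's parameter count."""
    return lora_trainable_params(shapes, rank) / total_params
```

For example, adapting 48 hypothetical 1024 × 1024 projections at rank 8 gives 48 × 8 × 2048 = 786,432 trainable parameters, about 0.16% of a hypothetical 500-million-parameter model. The real fraction depends on the TTS model's size and which layers the adapter targets.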
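For the Colab path, the app has to turn the pasted Drive link into something it can download. Google Drive share links usually carry the file ID either in the path (`/file/d/<ID>/view`) or in an `id=` query parameter. The standard-library sketch below extracts it; `drive_file_id` is a hypothetical helper, and the download step itself is not shown.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_file_id(link: str) -> str:
    """Extract the file ID from a pasted Google Drive share link (hypothetical helper)."""
    parsed = urlparse(link.strip())
    if parsed.hostname != "drive.google.com":
        raise ValueError(f"not a Google Drive link: {link!r}")
    # Form 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/d/([\w-]+)", parsed.path)
    if match:
        return match.group(1)
    # Form 2: https://drive.google.com/open?id=<ID> (also uc?id=<ID>)
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"no file ID found in: {link!r}")
```

Validating the host up front means a mistyped or unrelated URL fails immediately with a clear message instead of surfacing later as a failed download.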
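Two background rules are described above: train the adapter only while the GPU is idle, and cut TTS when the GPU runs hot. A minimal sketch, assuming utilisation and temperature are sampled periodically from the driver (NVIDIA's NVML exposes both), could combine the two and add hysteresis so TTS does not flap on and off. The class name and every threshold (under 10% utilisation for 30 consecutive samples, TTS off at 85 °C and back on at 75 °C) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GpuGovernor:
    """Decides when background LoRA training may run and whether TTS stays on.

    Every threshold here is an illustrative assumption, not Conduit's setting.
    """
    idle_util_pct: float = 10.0     # a sample below this utilisation counts as idle
    idle_samples_needed: int = 30   # consecutive idle samples before training may start
    hot_temp_c: float = 85.0        # cut TTS at or above this temperature
    cool_temp_c: float = 75.0       # restore TTS only at or below this temperature
    hot: bool = field(default=False, init=False)
    _idle_streak: int = field(default=0, init=False)

    @property
    def tts_enabled(self) -> bool:
        return not self.hot

    def update(self, util_pct: float, temp_c: float) -> bool:
        """Feed one (utilisation %, temperature °C) sample; return True if training may run."""
        # Idle detection: any busy sample resets the streak
        self._idle_streak = self._idle_streak + 1 if util_pct < self.idle_util_pct else 0

        # Hysteresis: separate trip and reset points stop TTS toggling rapidly
        if not self.hot and temp_c >= self.hot_temp_c:
            self.hot = True
        elif self.hot and temp_c <= self.cool_temp_c:
            self.hot = False

        # Assumption: training also waits while the GPU is running hot
        return self._idle_streak >= self.idle_samples_needed and not self.hot
```

Requiring a streak of idle samples, rather than a single one, keeps a brief pause in a call from kicking off a training run.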
17 May 2026