Miwa (美話, “Beautiful Conversation”) is a real-time communication overlay for Japanese-speaking Discord voice communities. It targets conversation breakdown during live gameplay: today, players alt-tab, copy text, and paste it into an external translator, which is too slow for real-time coordination. Miwa keeps them in the game by rendering live translations in an always-on-top, transparent overlay.

The system captures each speaker's Discord audio stream separately and transcribes it with openai-whisper. For every utterance it first sends a fast translation packet, then follows it with a style-refined translation from Llama 3.3 70B, served with vLLM on AMD MI300X. This two-pass design makes the overlay feel instant while still delivering high-quality translation. Miwa also generates romaji as a pronunciation aid and offers contextual reply suggestions. These come from an agentic pipeline of three agents (Analyst, Strategist, Writer) that retrieves per-speaker memory from Qdrant.

In-call controls include per-member pipeline toggles, phrasebook shortcuts, quick reactions, and three reply delivery modes: Bot Speaks, Bot Sends, and I’ll Speak. For spoken replies, text-to-speech (TTS) audio is pre-synthesized and cached to cut response delay.

The stack uses TypeScript, React, and Tauri for the UI; Node.js for Discord integration; Python and FastAPI for inference orchestration; and AMD cloud GPU infrastructure to run full-model inference at low latency. Built solo in seven days, Miwa is an end-to-end agentic voice UX system that demonstrates measurable latency engineering, infrastructure tradeoffs, and production-oriented interaction design for multilingual gaming communities.
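The two-pass flow means the overlay has to reconcile two packets per utterance. Below is a minimal TypeScript sketch of that reconciliation. It assumes a hypothetical packet shape (`utteranceId`, `speakerId`, `pass`, `text`, optional `romaji`) because the actual wire format is not described. The fast packet creates an overlay line, the refined packet overwrites it in place, and a fast packet that arrives late never replaces refined text.

```typescript
// Hypothetical packet shape: the source does not describe Miwa's wire format.
type TranslationPacket = {
  utteranceId: string; // assumed: one ID per transcribed segment
  speakerId: string; // Discord user ID of the speaker
  pass: "fast" | "refined";
  text: string; // translated text
  romaji?: string; // optional pronunciation aid
};

type OverlayLine = {
  speakerId: string;
  text: string;
  romaji?: string;
  refined: boolean;
};

// Holds one overlay line per utterance and merges both passes into it.
class OverlayState {
  private readonly lines = new Map<string, OverlayLine>();

  apply(packet: TranslationPacket): void {
    const existing = this.lines.get(packet.utteranceId);
    // A fast packet arriving after the refined one is stale: ignore it.
    if (packet.pass === "fast" && existing?.refined) return;
    this.lines.set(packet.utteranceId, {
      speakerId: packet.speakerId,
      text: packet.text,
      // Keep earlier romaji if the newer packet does not carry any.
      romaji: packet.romaji ?? existing?.romaji,
      refined: packet.pass === "refined",
    });
  }

  get(utteranceId: string): OverlayLine | undefined {
    return this.lines.get(utteranceId);
  }

  // Lines in the order their utterances first appeared.
  ordered(): OverlayLine[] {
    return Array.from(this.lines.values());
  }
}
```

Keying lines by utterance ID, rather than appending each packet, keeps the overlay from showing two versions of the same sentence. A JavaScript `Map` keeps a key's original position when its value is overwritten, so `ordered()` stays in first-arrival order even as refined text replaces fast text.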
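TTS pre-synthesis and caching can be sketched as a promise cache. Synthesis starts as soon as a reply suggestion exists, and choosing Bot Speaks then awaits the job that is already running or finished. This sketch assumes a hypothetical `Synthesize` function, since the TTS engine is not named. It keys entries by voice and text, evicts least-recently-used entries past a cap, and drops failed jobs so they can be retried.

```typescript
// Hypothetical TTS call: resolves to encoded audio for `text` in `voice`.
// The real engine and audio format are not specified; this is an assumption.
type Synthesize = (text: string, voice: string) => Promise<Uint8Array>;

// Caches in-flight and finished synthesis jobs keyed by voice + text,
// evicting the least recently used entry once `maxEntries` is exceeded.
class TtsCache {
  private readonly cache = new Map<string, Promise<Uint8Array>>();
  private readonly synthesize: Synthesize;
  private readonly maxEntries: number;

  constructor(synthesize: Synthesize, maxEntries = 64) {
    this.synthesize = synthesize;
    this.maxEntries = maxEntries;
  }

  get(text: string, voice: string): Promise<Uint8Array> {
    const key = `${voice}\u0000${text}`;
    const hit = this.cache.get(key);
    if (hit !== undefined) {
      // Re-insert so Map iteration order tracks recency of use.
      this.cache.delete(key);
      this.cache.set(key, hit);
      return hit;
    }
    const pending: Promise<Uint8Array> = this.synthesize(text, voice).catch(
      (err: unknown) => {
        // Forget failed jobs (unless already replaced) so a later call retries.
        if (this.cache.get(key) === pending) this.cache.delete(key);
        throw err;
      },
    );
    this.cache.set(key, pending);
    // The first key in iteration order is the least recently used one.
    while (this.cache.size > this.maxEntries) {
      const oldest = this.cache.keys().next().value;
      if (oldest === undefined) break;
      this.cache.delete(oldest);
    }
    return pending;
  }

  // Fire-and-forget warm-up, e.g. when reply suggestions arrive.
  // Failures are swallowed here; a later get() call retries them.
  prewarm(text: string, voice: string): void {
    this.get(text, voice).catch(() => undefined);
  }
}
```

Caching the promise rather than only the finished audio means a reply picked mid-synthesis waits for the remaining work instead of issuing a second request.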