
Hermes is an Android accessibility and translation assistant designed for real-time communication across text, camera, and voice workflows. The app is built in Kotlin with native Android UI components, CameraX for live camera preview, ML Kit Text Recognition for on-device OCR, Android SpeechRecognizer for fast speech-to-text, and ExecuTorch/QNN integration for experimenting with local AI inference on Qualcomm-powered devices using Whisper, Qwen2.5-0.5B, and Qwen3-7B.

Camera mode lets a user point the phone at signs, notes, labels, forms, menus, or other printed text. Hermes recognizes the text on-device, identifies the source language, and translates it into English. Voice mode provides quick speech transcription, so spoken input becomes readable text during a live interaction.

The project is especially useful in settings such as travel, hospitals, classrooms, public offices, accessibility support, and multilingual conversations, where users need quick understanding without switching between multiple apps. Hermes also includes groundwork for local model execution using Whisper-style ASR assets, Qwen translation models, Android native libraries, and Qualcomm acceleration paths, making it a practical prototype for privacy-aware, low-latency mobile AI translation.
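Camera mode, as described above, is essentially a three-stage pipeline: OCR, language identification, then translation into English. The Kotlin sketch below shows one way those stages could be wired together. It is a minimal sketch under stated assumptions. The interface names (`OcrEngine`, `LanguageDetector`, `TranslationEngine`), the use of the `"und"` (undetermined) language tag, and the rule of skipping translation for text that is already English are all hypothetical and are not taken from the Hermes source. In the app, `OcrEngine` would presumably wrap ML Kit Text Recognition and `TranslationEngine` could wrap a local Qwen model, but the description does not say how these stages are actually connected.

```kotlin
// Hypothetical stage interfaces; Hermes's real class names are not given in the description.
interface OcrEngine {
    // Returns the raw text recognized in a camera frame (e.g. backed by ML Kit OCR).
    fun recognize(frame: ByteArray): String
}

interface LanguageDetector {
    // Returns a language tag such as "es" or "en", or "und" if the language is undetermined.
    fun identify(text: String): String
}

interface TranslationEngine {
    // Translates text from sourceLang to targetLang (e.g. backed by a local model).
    fun translate(text: String, sourceLang: String, targetLang: String): String
}

// Result of processing one frame; english is null when no translation could be produced.
data class CameraTranslation(
    val recognizedText: String,
    val sourceLang: String,
    val english: String?,
)

class CameraTranslationPipeline(
    private val ocr: OcrEngine,
    private val detector: LanguageDetector,
    private val translator: TranslationEngine,
) {
    // Runs OCR -> language identification -> translation for a single frame.
    // Returns null when the frame contains no legible text.
    fun process(frame: ByteArray): CameraTranslation? {
        val text = ocr.recognize(frame).trim()
        if (text.isEmpty()) return null

        val lang = detector.identify(text)
        val english = when (lang) {
            "en" -> text    // already English: no translation needed
            "und" -> null   // language could not be determined: skip translation
            else -> translator.translate(text, lang, "en")
        }
        return CameraTranslation(text, lang, english)
    }
}
```

Keeping each stage behind an interface lets the same pipeline run against on-device OCR today and swap in an ExecuTorch-backed model later, and it keeps the flow testable with simple fakes instead of a real camera.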
Category tags: