
VoiceLens is an AI-powered speech therapy and confidence coaching tool built for the IBM Bob Hackathon. It helps people identify and overcome speech challenges, from filler word overuse to pacing issues, that hold them back professionally and socially.

The app captures your voice directly in the browser, where the Web Speech API provides a live transcript as you speak. The audio is then sent to a Python Flask backend, where OpenAI Whisper produces a more accurate transcription and Librosa extracts audio features including average pitch, energy variation, speaking pace, and silence ratio. A scoring engine calculates four key metrics (Fluency, Clarity, Pace, and Confidence), each shown as a visual score from 0 to 100. The system detects filler words (um, uh, like, you know), highlights them in the transcript, and identifies speech patterns with severity ratings. A Hugging Face sentiment model analyses the emotional tone of the speech. Based on the severity ratings, the app indicates whether professional speech therapy is advisable and generates three personalised practice exercises tailored to each user's specific weaknesses.

IBM Bob was used throughout development to scaffold code, generate documentation, and accelerate the build, turning this idea into a working prototype in hours instead of days.
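
The filler word detection and 0 to 100 scoring described above could be sketched roughly as follows. This is a minimal illustration, not the VoiceLens implementation: the function names (`count_fillers`, `highlight_fillers`, `fluency_score`), the bracket-style highlighting, and the penalty of 5 points per percent of filler words are all assumptions made for the example. Only the filler list itself comes from the description.

```python
import re

# Filler list taken from the description; everything else here is assumed.
# Note: "like" also has legitimate uses, so a real system would need context.
FILLERS = ("you know", "um", "uh", "like")

# Case-insensitive, whole-word match for any filler in the list.
FILLER_RE = re.compile(
    r"\b(" + "|".join(re.escape(f) for f in FILLERS) + r")\b",
    re.IGNORECASE,
)


def count_fillers(transcript):
    """Return a dict mapping each filler to its number of occurrences."""
    counts = {f: 0 for f in FILLERS}
    for match in FILLER_RE.finditer(transcript):
        counts[match.group(1).lower()] += 1
    return counts


def highlight_fillers(transcript):
    """Wrap each detected filler in square brackets (hypothetical markup)."""
    return FILLER_RE.sub(lambda m: "[" + m.group(1) + "]", transcript)


def fluency_score(transcript, penalty_per_percent=5.0):
    """Hypothetical fluency score from 0 to 100.

    Starts at 100 and subtracts `penalty_per_percent` points for every
    percent of words that are fillers, then clamps to the 0..100 range.
    """
    words = re.findall(r"[A-Za-z']+", transcript)
    if not words:
        return 0
    total_fillers = sum(count_fillers(transcript).values())
    filler_percent = 100.0 * total_fillers / len(words)
    score = 100.0 - penalty_per_percent * filler_percent
    return max(0, min(100, round(score)))


if __name__ == "__main__":
    sample = "Um, I think the demo went, you know, pretty well."
    print(count_fillers(sample))
    print(highlight_fillers(sample))
    print(fluency_score(sample))
```

In a fuller pipeline, a score like this would presumably be combined with the Librosa features (pace, silence ratio, pitch) rather than computed from the transcript alone; the text-only version is used here to keep the sketch dependency-free.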
17 May 2026