Bob is a high-performance, local-first voice assistant that bridges human speech and operating system automation. Built with Tauri, Rust, and React, Bob delivers sub-second response times without the overhead of heavy native frameworks.

At its core, Bob has a dual-engine architecture:

Low-Latency Speech Recognition: Powered by Deepgram's real-time streaming API, Bob listens for the wake word ('Bob' or 'Bub') and continuously transcribes audio, displaying live interim results as you speak.

Hybrid Command Parsing: Commands are analyzed with Gemini AI for complex, conversational intents, with a local NLP parser as a fallback. The local parser strips conversational filler ('can you please', 'so', 'but') before matching, so simple commands still execute reliably offline.

Bob's capabilities fall into three major categories:

System & OS Automation: Launch applications (Notepad, VSCode, Chrome), navigate system folders (Documents, Downloads), manage active windows, or fetch system telemetry (disk space, battery life).

Native Browser Control: Browse completely hands-free. Tell Bob to open specific URLs, scroll up or down, or simulate mouse clicks directly through low-level Windows APIs.

Continuous Dictation & OS Shortcuts: Say 'transcribe' or simply speak naturally. When no command matches, Bob falls back to dictation and types the spoken text character by character into whichever field or document has cursor focus. You can also trigger system hotkeys, take screenshots (via the Snipping Tool), and run copy/paste actions by voice alone.
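To make the local fallback path more concrete, here is a minimal Rust sketch of how wake-word detection, filler stripping, and the dictation fallback could fit together. It is not Bob's actual source: the `Command` enum, the function names, and the tiny command grammar are all hypothetical, and only the wake words and filler phrases come from the description above. The sketch also assumes that dictated text should keep its filler words, so the fallback types the unstripped transcript.

```rust
// Hypothetical sketch of a local command parser; not Bob's real implementation.

/// Wake words named in the description.
const WAKE_WORDS: [&str; 2] = ["bob", "bub"];
/// Filler phrases named in the description (a real list would be longer).
const FILLER_PHRASES: [&str; 3] = ["can you please", "so", "but"];

#[derive(Debug, PartialEq)]
pub enum ScrollDirection {
    Up,
    Down,
}

/// Illustrative command set covering a few of the listed capabilities.
#[derive(Debug, PartialEq)]
pub enum Command {
    Launch(String),
    Scroll(ScrollDirection),
    Dictate(String),
}

/// Lowercase the transcript and split it into words, trimming punctuation.
fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Return the words spoken after the first wake word, or None if there is none.
pub fn after_wake_word(transcript: &str) -> Option<Vec<String>> {
    let tokens = tokenize(transcript);
    let pos = tokens.iter().position(|t| WAKE_WORDS.contains(&t.as_str()))?;
    Some(tokens[pos + 1..].to_vec())
}

/// Drop every occurrence of a filler phrase from the token stream.
pub fn strip_fillers(tokens: &[String]) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    'scan: while i < tokens.len() {
        for phrase in FILLER_PHRASES.iter() {
            let words: Vec<&str> = phrase.split(' ').collect();
            let remaining = &tokens[i..];
            if remaining.len() >= words.len()
                && remaining.iter().zip(&words).all(|(t, w)| t.as_str() == *w)
            {
                i += words.len(); // skip the whole filler phrase
                continue 'scan;
            }
        }
        out.push(tokens[i].clone());
        i += 1;
    }
    out
}

/// Parse a transcript into a command; anything unrecognized becomes dictation.
pub fn parse_command(transcript: &str) -> Option<Command> {
    let spoken = after_wake_word(transcript)?;
    if spoken.is_empty() {
        return None; // wake word only, nothing to act on yet
    }
    let stripped = strip_fillers(&spoken);
    let view: Vec<&str> = stripped.iter().map(String::as_str).collect();
    match view.as_slice() {
        ["open" | "launch", app @ ..] if !app.is_empty() => {
            Some(Command::Launch(app.join(" ")))
        }
        ["scroll", "up"] => Some(Command::Scroll(ScrollDirection::Up)),
        ["scroll", "down"] => Some(Command::Scroll(ScrollDirection::Down)),
        ["transcribe", ..] => {
            // Assumption: dictation keeps fillers, so slice the unstripped words.
            let start = spoken
                .iter()
                .position(|w| w.as_str() == "transcribe")
                .map_or(0, |p| p + 1);
            Some(Command::Dictate(spoken[start..].join(" ")))
        }
        // Fallback: treat natural speech as text to type.
        _ => Some(Command::Dictate(spoken.join(" "))),
    }
}
```

With this sketch, `parse_command("Hey Bob, can you please open Notepad")` yields `Command::Launch("notepad")`, while a transcript with no wake word yields `None`. Matching on whole tokens rather than substrings keeps a filler like 'so' from being cut out of words such as 'resolve'.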