SnapOn is an Android-based, offline-first multimodal AI assistant that understands what the user says and what the user sees. By combining speech, vision, and on-device reasoning, SnapOn provides fast, privacy-preserving assistance without any cloud dependency. Rather than a general-purpose chatbot, SnapOn is designed for real-world situations in which the user asks about whatever the camera currently sees.

The interaction is natural and hands-free. Hold the mic button and speak your question or say "remember this." SnapOn then captures the best camera frame, transcribes your voice using Whisper, and generates a grounded answer using SmolVLM-500M-Instruct running on the Snapdragon Hexagon NPU via ExecuTorch.

What makes SnapOn unique is its personal memory layer. Say "remember this is my medication Metformin" and SnapOn saves a visual fingerprint of the frame, computed as a CLIP embedding, alongside your exact words. The next time you point the camera at the same object or person, SnapOn recognizes it passively and surfaces your saved context automatically, with no button press needed.

Use cases include identifying people and objects in view, summarizing documents and text in the scene, recognizing products, signs, and labels, answering spoken questions about the current scene, and saving personal context for future reference.

The stack includes SmolVLM-500M-Instruct, OpenAI CLIP ViT-B/32, Whisper-tiny, FAISS, SQLite, CameraX, AudioRecord, and Android TTS. Models are compiled to target SM8750 via ExecuTorch and the Qualcomm QNN backend. SnapOn was built for the ExecuTorch Hackathon with a strong emphasis on NPU utilization, real-world usability, and complete privacy.
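
The memory layer described above can be illustrated with a minimal sketch. It is not SnapOn's actual code: the names MemoryStore, remember, and recall and the 0.85 similarity threshold are assumptions made for illustration, and toy 3-dimensional vectors stand in for the 512-dimensional embeddings that CLIP ViT-B/32 produces. In the stack listed above, FAISS would presumably perform the nearest-neighbor search and SQLite would hold the saved notes; here a plain Python list does both.

```python
import math
from dataclasses import dataclass, field


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


@dataclass
class MemoryStore:
    """Hypothetical in-memory stand-in for a FAISS index plus a SQLite notes table."""
    threshold: float = 0.85  # assumed cutoff; a real value would need tuning
    entries: list = field(default_factory=list)

    def remember(self, embedding, note):
        # Save the visual fingerprint together with the user's exact words.
        self.entries.append((normalize(embedding), note))

    def recall(self, embedding):
        # Return the note of the most similar saved fingerprint,
        # or None when nothing is similar enough.
        query = normalize(embedding)
        best_score, best_note = -1.0, None
        for stored, note in self.entries:
            score = sum(a * b for a, b in zip(query, stored))
            if score > best_score:
                best_score, best_note = score, note
        return best_note if best_score >= self.threshold else None


store = MemoryStore()
store.remember([1.0, 0.0, 0.0], "this is my medication Metformin")
# A later frame of the same object yields a nearby embedding.
print(store.recall([0.9, 0.1, 0.0]))
# An unrelated frame falls below the threshold, so nothing is surfaced.
print(store.recall([0.0, 0.0, 1.0]))
```

The threshold is what makes passive recognition safe to run on every frame: without it, the nearest saved memory would be surfaced even for scenes that match nothing.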