Browse applications built on OpenAI Whisper technology. Explore PoC and MVP applications created by our community and discover innovative use cases for OpenAI Whisper technology.
A small Python web service and Vue app that enables real-time translation and transcription of streamed audio.
Dasalt 360 is a high-volume IT company that buys laptops from the USA and resells them from Jimeta, Adamawa State, to other locations. I built a multimodal AI agent to help with procurement planning, logistics, sales, and operational challenges.
Meridian turns any video into a queryable knowledge base. Upload a video, ask a question in plain English, and get a precise timestamped answer drawn from speech, on-screen text, and what was visually happening at that exact moment.
MeetIQ is an AI-powered meeting intelligence platform that converts meeting audio into searchable transcripts, AI-generated summaries, action items, speaker recognition insights, and chatbot-assisted knowledge retrieval.
AI-powered recruitment platform that automates hiring from CV upload to interview scheduling. Recruiters can create jobs, score candidates with AI, generate interview questions, manage interviews, and streamline the entire hiring pipeline in one place.
Iroko AI deploys 5 specialized AI agents that monitor MTN Nigeria's enterprise documents 24/7, detect SLA breaches, compliance risks, and network anomalies in real-time — streaming every reasoning step live so operators see exactly how decisions are made.
Ken hears every word. You understand every one. Ken is a free, self-hostable, open-source AI co-listener that gives real-time, explainable, personalized support during immigration, legal, and medical conversations — built on AMD ROCm.
A real-time shared commitment and memory layer for meetings
CogniCore transforms meetings into production-ready workflows using AI. It captures discussions, extracts decisions and tasks, then orchestrates development, governance, and execution automatically in one unified platform.
A human-in-the-loop annotation platform for African languages on AMD MI300X. Native speakers of Sudanese Arabic, Fur, Zaghawa & Dinka correct AI outputs via text or voice to build sovereign African language datasets for RLHF training.
A vision assistant that learns its user. Live Sight runs Qwen3-VL-8B on AMD MI300X with on-device LoRA fine-tuning, voice corrections in Roman Urdu and English, and episodic recall over what you've seen. Built for blind and low-vision users.
Skribe is a voice-powered AI medical intake agent that interviews patients before their doctor visit, asks clinically grounded follow-up questions using NIH data, and generates a shareable physician-ready PDF report.
AgroFamiliApp: AI conversational agent (voice/text) for Brazilian family farmers, powered by AMD MI300X + Llama 3.1. Access rural credit, weather, markets, and agroecological technical assistance via WhatsApp, Telegram, or WebApp.
HexySAR is an AI-powered autonomous hexapod for cave search-and-rescue. It explores hazardous terrain, detects survivors through vision and audio, maps safer paths, and sends rescue intelligence before human teams enter.
A modular autonomous AI agent inspired by the human brain. It features specialized LLMs for communication, coding, vision, and memory, with a Sims-inspired autonomy system of evolving traits, desires, dreams, and moods — running on AMD MI300X via ROCm.
CrowdSense is a real-time multimodal safety system that watches crowd video and audio simultaneously, infers panic before it becomes a stampede, and tells security exactly what to do.
Helps download and summarize large numbers of Instagram posts in one go.
A fully-local multimodal video analysis pipeline that transforms raw video into structured entities, events, and timelines using YOLO26, Whisper.cpp, Depth Anything V2, and Qwen3.5-VL — all running on consumer AMD hardware. No cloud, no API keys.
Drop a 3-hour video. Get a timestamped intelligence dossier with speaker claims, topic maps, and highlight clips. Powered by Qwen3-VL-32B-Thinking + VibeVoice ASR on AMD MI300X at 0.15× real-time.
Edge-AI clinical scribe for Indian healthcare: Listen → Translate → Structure → Edit → Save
Multimodal AI highlight clipper that turns long videos into editable short vertical clips with AI-selected moments, subtitles, HRE effects, and a final human review editor powered by AMD GPU Cloud.
An industrial safety command system that turns cameras, sensors, voice reports, operator notes, and other real-time data sources into traceable incident insights, risk zones, and response actions for faster, safer decisions.
Tammy the Tattle Turtle is an AI-powered emotional support robot for elementary students. Built on Reachy Mini (simulated), Google Gemini, and Hugging Face, Tammy listens to kids' feelings, triages urgency (GREEN/YELLOW/RED)
CareSight AI automates healthcare intake and screening using computer vision, OCR, speech recognition, and LLM reasoning to generate structured clinical summaries for nurses and doctors.
Simulation-first robotic system performing structured physical tasks such as pick-and-place, sorting, and simple assembly. Designed for repeatable execution under varied conditions, environmental interaction, and measurable performance metrics.
ChartSeek uses CLIP visual AI + Whisper to search trading videos by describing chart patterns, candlesticks, or indicators—not just spoken words. Ask 'double bottom pattern' and instantly find that moment. 100% local, open-source, zero API costs.
Qubic Liquidation Guardian is a real-time liquidation risk monitoring and automation system for Qubic’s Nostromo protocols. It scores borrower risk, predicts liquidation events, and triggers EasyConnect alerts to protect users and stabilize the ecosystem.
Three agents powered by Granite 8B and orchestrated by IBM watsonx Orchestrate, solving real-world healthcare morale failure.
BridgeYield is a financial wellness platform that uses an AI-powered multi-agent architecture. It features an AI Financial Advisor (powered by the Google Gemini API), an Emotion Companion, and a voice-first interface using STT and TTS technology.
Our app automatically translates any video into your desired language and adds accurate subtitles, making global content accessible to everyone. Powered by GPT-5 for high-quality translation and Whisper AI for precise speech-to-text.