OpenAI Whisper

The Whisper models are trained for speech recognition and translation tasks: they can transcribe speech audio into text in the language in which it is spoken (automatic speech recognition, ASR) and translate it into English (speech translation). Whisper was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is an encoder-decoder Transformer model. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
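To make the chunking step concrete, here is a minimal sketch of how raw audio can be cut into fixed 30-second windows before the log-Mel spectrogram is computed. The constants (16 kHz sample rate, 160-sample hop, and therefore 3,000 spectrogram frames per window) match those used in the openai/whisper repository; the `split_into_chunks` helper itself is hypothetical and simplified, since the library's own `transcribe()` advances its window based on decoded timestamps rather than in fixed, non-overlapping steps.

```python
SAMPLE_RATE = 16_000   # Hz; Whisper works on 16 kHz mono audio
CHUNK_SECONDS = 30     # length of one encoder input window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window
HOP_LENGTH = 160       # 10 ms between successive spectrogram frames
N_FRAMES = N_SAMPLES // HOP_LENGTH       # 3,000 log-Mel frames per window


def split_into_chunks(samples: list[float]) -> list[list[float]]:
    """Cut mono 16 kHz audio into 30 s windows, zero-padding the last one.

    Hypothetical helper for illustration; it is not part of the whisper package,
    whose transcribe() loop moves its window using predicted timestamps.
    """
    chunks = []
    for start in range(0, len(samples), N_SAMPLES):
        chunk = samples[start:start + N_SAMPLES]
        # Pad a short final window with silence so every window has the same length.
        chunk += [0.0] * (N_SAMPLES - len(chunk))
        chunks.append(chunk)
    return chunks
```

For example, 70 seconds of audio yields three windows: two full ones and a third holding 10 seconds of audio followed by 20 seconds of zero padding.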

General
Release date: September 2022
Author: OpenAI
Repository: https://github.com/openai/whisper
Type: General-purpose speech recognition model

Start building with Whisper

We have collected the best Whisper libraries and resources to help you get started building with Whisper today. To see what others are building with Whisper, check out the community-built Whisper use cases and applications.

Boilerplates

Kickstart your development with a GPT-3-based boilerplate. Boilerplates are a great way to get a head start when building your next project with GPT-3.


Libraries

Whisper API libraries and connectors.


OpenAI Whisper AI technology Hackathon projects

Discover innovative solutions crafted with OpenAI Whisper AI technology, developed by our community members during our engaging hackathons.

Dasalt 360 Enterprise AI

Dasalt 360 Enterprise AI is a multimodal, multi-agent enterprise system built for Dasalt 360 Ltd and engineered specifically to navigate the complexities of the West African IT hardware market. Operating from the company's headquarters in Jimeta, Adamawa State, Nigeria, the system serves as an autonomous "System of Record" that bridges the gap between global hardware procurement in the United States, national wholesale hubs in Lagos, and local retail distribution in Yola North.

The architecture is built on Vultr's Dedicated Cloud Compute infrastructure, providing secure, low-latency orchestration of business logic. The "Intelligence Layer" is powered by Vultr Serverless Inference, using Llama 3.1 70B for financial reasoning and Llama 3.2 Vision for optical identification of hardware assets.

A primary innovation of the project is its handling of the extreme volatility of the Nigerian Naira (₦). The AI acts as an autonomous Chief Financial Officer (CFO), performing real-time landed cost calculations based on a fixed exchange rate of 1,400 NGN/USD while also accounting for localized logistics overheads, specifically the ₦7,000 per-unit secure transit fee from Lagos to Yola.

Beyond text-based automation, the system demonstrates the "Future of Work" through a multimodal interface. It incorporates a bespoke voice-to-voice engine and vision-to-text capabilities, allowing CEO Christopher Krim and his team to manage inventory hands-free in the warehouse or from photographs of incoming shipments. The system is instructed to write every response in polished natural language, producing executive briefs free of technical jargon. By centralizing planning, coordination, and execution on Vultr, Dasalt 360 Ltd illustrates a model of autonomous enterprise operations aimed at keeping the business sustainable and profitable in a challenging economic environment.
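The landed cost rule described above can be sketched as a small function. This is an illustrative assumption, not the project's code: `landed_cost_ngn` is a hypothetical helper that converts the USD unit price at the fixed 1,400 NGN/USD rate and adds the ₦7,000 per-unit Lagos-to-Yola transit fee. It ignores duties and international freight because the description does not specify them, and it uses `Decimal` to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Figures taken from the project description.
FX_RATE_NGN_PER_USD = Decimal("1400")        # fixed NGN/USD exchange rate
TRANSIT_FEE_NGN_PER_UNIT = Decimal("7000")   # Lagos -> Yola secure transit, per unit


def landed_cost_ngn(unit_price_usd: Decimal, quantity: int = 1) -> Decimal:
    """Return the total landed cost in Naira for `quantity` units.

    Hypothetical helper: converts the USD purchase price at the fixed rate and
    adds the per-unit transit fee. Other costs (duties, international freight)
    are deliberately left out because the description does not specify them.
    """
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    per_unit = unit_price_usd * FX_RATE_NGN_PER_USD + TRANSIT_FEE_NGN_PER_UNIT
    return per_unit * quantity
```

For example, a single unit bought at $250 lands at 250 × 1,400 + 7,000 = ₦357,000.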

Meridian

Video is one of the most information-dense formats there is, yet almost none of that information remains accessible once the recording ends. You can scrub through a timeline, and you can search a transcript if one exists, but if the answer to your question was written on a whiteboard, shown on a slide, or conveyed through a gesture without being spoken aloud, you simply cannot find it. Meridian was built to fix that.

When you upload a video, Meridian processes it through three independent channels in parallel. It transcribes everything that was spoken, with word-level timestamps. It reads every frame for on-screen text, picking up slides, diagrams, formulas, and annotations. And it generates a natural-language description of the visual scene in every frame, capturing what the speaker is doing, pointing at, or drawing.

When you ask a question, Meridian searches all three knowledge stores at once and uses Gemini 2.5 Pro as an AI reasoning agent to identify the single moment in the video that best answers it. The video then seeks to that timestamp automatically. You also see which source produced the strongest signal for the answer (the spoken word, the on-screen text, or the visual scene), so you can trust the result.

The target audience is anyone who treats recorded video as a knowledge source: engineering teams reviewing architecture discussions, legal and compliance teams indexing regulatory training libraries, researchers cataloguing interview recordings, and enterprise teams making internal video knowledge bases searchable. Meridian aims to make the full content of any video as accessible as a well-indexed document.
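Meridian's description does not say how the three stores are indexed or how their scores are compared. The sketch below shows the general shape under simple assumptions: each store is a hypothetical list of timestamped `Entry` records, and relevance is plain keyword overlap, standing in for whatever retrieval and Gemini 2.5 Pro reasoning step the project actually uses. It returns the winning timestamp together with the source that produced it, mirroring the "which source fired" feature.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    timestamp: float  # seconds from the start of the video
    text: str         # transcript span, on-screen text, or frame description


def tokenize(text: str) -> set[str]:
    # Lowercase and keep alphanumeric runs as terms.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_moment(
    query: str, stores: dict[str, list[Entry]]
) -> tuple[float, Optional[float], Optional[str]]:
    """Return (score, timestamp, source) for the best-matching entry in any store.

    The score is the fraction of query terms found in an entry: a simple
    stand-in (assumption) for the real retrieval and reasoning step.
    """
    query_terms = tokenize(query)
    best: tuple[float, Optional[float], Optional[str]] = (0.0, None, None)
    if not query_terms:
        return best
    for source, entries in stores.items():
        for entry in entries:
            overlap = len(query_terms & tokenize(entry.text))
            score = overlap / len(query_terms)
            if score > best[0]:  # keep the first entry on ties
                best = (score, entry.timestamp, source)
    return best
```

A question like "what eviction policy is used" would then match a slide reading "Eviction policy: LRU with TTL" in the on-screen-text store even if nobody said those words aloud.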

MeetIQ

MeetIQ is an AI-powered meeting intelligence platform developed by Team Semicolon for the IBM BOB Hackathon. It is designed to turn ordinary meetings into structured, searchable, and actionable insights.

Users can upload meeting audio in formats such as MP3, WAV, M4A, and OGG. MeetIQ transcribes the audio into searchable text using speech-to-text models, and the transcript is then analyzed to extract summaries, discussion topics, sentiment, and action items.

A key feature of MeetIQ is its AI chatbot assistant. Instead of revisiting lengthy recordings or transcripts by hand, users can ask questions about the meeting and receive context-aware answers generated with Retrieval-Augmented Generation (RAG).

The platform also includes speaker recognition and voice-to-person mapping. Users can associate voices with names, allowing MeetIQ to identify recurring speakers in future meetings; unknown speakers can be mapped manually and stored for later recognition.

MeetIQ exports meeting insights as PDF, TXT, or JSON, and supports user authentication and persistent cloud-based sessions for easier access and collaboration. By automating transcription, summarization, task extraction, and retrieval, MeetIQ helps teams save time, work more productively, and preserve valuable meeting knowledge.
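The description does not explain how voices are matched to names. One common approach, assumed here, is to compare a speaker embedding (a fixed-length vector produced by some speaker-verification model, not shown) against stored reference embeddings by cosine similarity and accept the closest match above a threshold. `SpeakerRegistry` and its 0.8 default threshold are hypothetical.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical voice-to-name map; unmatched voices come back as None."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # minimum similarity to accept a match (assumed value)
        self.known: dict[str, list[float]] = {}

    def register(self, name: str, embedding: list[float]) -> None:
        # Store (or overwrite) the reference embedding for a named speaker.
        self.known[name] = list(embedding)

    def identify(self, embedding: list[float]) -> Optional[str]:
        # Return the most similar known speaker at or above the threshold.
        best_name: Optional[str] = None
        best_score = self.threshold
        for name, reference in self.known.items():
            score = cosine_similarity(embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        return best_name
```

Returning `None` corresponds to the "unknown speaker" case: a user could resolve it by calling `register` with the correct name, after which that voice is recognized in later meetings.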