OpenAI Whisper AI technology Top Builders

Explore the top contributors showcasing the highest number of OpenAI Whisper AI technology app submissions within our community.

OpenAI Whisper

The Whisper models are trained for speech recognition and translation tasks. They can transcribe speech audio into text in the language it is spoken (ASR) as well as translate it into English (speech translation). Whisper was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is an encoder-decoder model: input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
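The fixed 30-second window described above can be sketched in a few lines. This is a minimal, illustrative version of the input-preparation step (the constants match Whisper's published setup: 16 kHz audio, 30-second chunks); the function name mirrors the `pad_or_trim` helper in the open-source `whisper` package but is reimplemented here in plain Python.

```python
# Whisper-style input preparation: every chunk fed to the encoder is
# exactly 30 seconds of 16 kHz audio (480,000 samples), padded with
# silence or trimmed as needed, before the log-Mel spectrogram step.
SAMPLE_RATE = 16_000                          # Whisper resamples all audio to 16 kHz
CHUNK_SECONDS = 30                            # fixed window the encoder expects
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples

def pad_or_trim(samples: list) -> list:
    """Pad with silence or trim so the chunk is exactly 30 seconds long."""
    if len(samples) >= CHUNK_SAMPLES:
        return samples[:CHUNK_SAMPLES]
    return samples + [0.0] * (CHUNK_SAMPLES - len(samples))

one_second = [0.1] * SAMPLE_RATE              # 1 second of audio
chunk = pad_or_trim(one_second)               # padded out to 30 seconds
```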

General

Release date: September 2022
Author: OpenAI
Repository: https://github.com/openai/whisper
Type: general-purpose speech recognition model

Start building with Whisper

We have collected the best Whisper libraries and resources to help you start building with Whisper today. To see what others are building with Whisper, check out the community-built Whisper Use Cases and Applications.

Tutorials

Boilerplates

Kickstart your development with a Whisper-based boilerplate. Boilerplates are a great way to get a head start on your next project with Whisper.


Libraries

Whisper API libraries and connectors.
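As a hedged sketch of what using one of these connectors looks like, here is a typical call to the hosted Whisper endpoint via the official `openai` Python client (the hosted model is named `whisper-1`). The helper function below is illustrative, not part of any library; the actual network call is shown commented out since it requires an API key and an audio file.

```python
# Illustrative helper that assembles keyword arguments for the hosted
# Whisper audio endpoint. The "transcriptions" endpoint keeps the source
# language; the "translations" endpoint always outputs English, so a
# language hint only applies when transcribing.
from typing import Optional

def build_transcription_request(path: str,
                                language: Optional[str] = None,
                                translate: bool = False) -> dict:
    params = {"model": "whisper-1", "file": path}
    if language and not translate:
        params["language"] = language   # ISO-639-1 hint, e.g. "de"
    return params

params = build_transcription_request("meeting.mp3", language="de")

# Actual call, for illustration (requires OPENAI_API_KEY to be set):
# from openai import OpenAI
# client = OpenAI()
# with open(params["file"], "rb") as f:
#     text = client.audio.transcriptions.create(
#         model=params["model"], file=f, language=params.get("language")
#     ).text
```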


OpenAI Whisper AI technology Hackathon projects

Discover innovative solutions crafted with OpenAI Whisper AI technology, developed by our community members during our engaging hackathons.

LinguaPolis - Bridging Languages and Uniting Teams


Today we have workforces spread across multiple countries, cultures, and languages. Online meetings have become a key part of how businesses operate. But there's a significant challenge: not everyone speaks the same language or communicates effectively in a common one like English. Accents, dialects, or simply varying levels of proficiency often make it difficult for teams to fully understand each other. This problem results in inefficiencies: important points are missed, and post-meeting emails or clarifications are needed, which undermines the very purpose of the meeting. Ultimately, this wastes time, reduces productivity, and slows down decision-making.

Consider a typical online meeting scenario. For many, English may not be their first language. Some struggle to express their ideas clearly, while others have difficulty understanding certain accents or pronunciations. Often, by the end of the meeting, there's a lack of clarity, and participants need to follow up with additional communications, re-explaining points or providing more instructions.

Now imagine a scenario where each participant could simply speak in their native language, and everyone else could understand them perfectly, in real time, without the need for human translators or extensive subtitles.

This is where our idea comes in: an AI-powered online meeting platform that provides real-time dubbing and multilingual transcription. Each participant can speak comfortably in their native language, and the platform instantly dubs their speech into the local languages of all other participants. Additionally, real-time text transcripts of the conversation are generated in each participant's language, providing both auditory and visual clarity. At the end of the meeting, the platform even summarizes the discussion, producing meeting minutes in every participant's native language, ensuring that everyone walks away with a clear understanding of what was discussed and the decisions that were made.
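The dubbing flow described above can be sketched as a simple per-chunk pipeline: Whisper-style transcription, then translation, then speech synthesis, once per listener language. All function names here are hypothetical stand-ins, not the actual LinguaPolis implementation; the point is only the shape of the composition.

```python
# Hypothetical pipeline shape for real-time dubbing: for each chunk of a
# speaker's audio, transcribe once, then translate and synthesize per
# listener language. The callables are stubs standing in for Whisper ASR,
# a translation model, and a TTS engine.
from typing import Callable

def dub_for_listeners(
    audio_chunk: bytes,
    transcribe: Callable[[bytes], str],       # e.g. Whisper ASR
    translate: Callable[[str, str], str],     # (text, target_lang) -> text
    synthesize: Callable[[str, str], bytes],  # (text, target_lang) -> audio
    listener_langs: list,
) -> dict:
    """Return, per listener language, the translated transcript and the
    dubbed audio for one chunk of a speaker's speech."""
    source_text = transcribe(audio_chunk)     # transcribe once per chunk
    out = {}
    for lang in listener_langs:
        text = translate(source_text, lang)
        out[lang] = (text, synthesize(text, lang))
    return out
```

The per-participant transcripts and end-of-meeting minutes would be built from the same `translate` outputs accumulated over all chunks.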