OpenAI Whisper AI technology Top Builders

Explore the top contributors with the highest number of OpenAI Whisper app submissions in our community.

OpenAI Whisper

The Whisper models are trained for speech recognition and translation tasks, capable of transcribing speech audio into text in the language in which it is spoken (ASR) as well as translating it into English (speech translation). Whisper was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is an encoder-decoder Transformer: input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed to an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
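As a quick illustration, here is a minimal transcription sketch using the open-source openai-whisper Python package (installed with pip install openai-whisper); the model size and audio file name are placeholders:

```python
import whisper

# Load one of the pretrained checkpoints (e.g. tiny, base, small, medium, large).
model = whisper.load_model("base")

# Transcribe the audio in the language it was spoken (ASR).
result = model.transcribe("audio.mp3")
print(result["text"])

# Or translate the speech into English instead (speech translation).
translated = model.transcribe("audio.mp3", task="translate")
print(translated["text"])
```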

General
Release date: September 2022
Author: OpenAI
Repository: https://github.com/openai/whisper
Type: general-purpose speech recognition model

Start building with Whisper

We have collected the best Whisper libraries and resources to help you get started building with Whisper today. To see what others are building, check out the community-built Whisper Use Cases and Applications.

Tutorials

Boilerplates

Kickstart your development with a Whisper-based boilerplate. Boilerplates are a great way to get a head start when building your next project with Whisper.


Libraries

Whisper API libraries and connectors.
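For example, the official openai Python SDK exposes Whisper through OpenAI's hosted Audio API. A minimal sketch, assuming the OPENAI_API_KEY environment variable is set and a local audio file exists:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send an audio file to the hosted Whisper model for transcription.
with open("speech.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```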


OpenAI Whisper AI technology Hackathon projects

Discover innovative solutions crafted with OpenAI Whisper AI technology, developed by our community members during our engaging hackathons.

Cheap and accurate video translation for masses

VideoLangua: Bridging the Language Divide through AI-Powered Video Translation

English dominates the world of online content, accounting for more than 60% of digital content. This bias is also reflected in large language model training, where English remains the primary language. Yet 75% of the world's population does not speak English, creating a significant information divide. For non-English speakers, the cost of translation is prohibitive: professional video subtitling, for instance, can range from $0.18 to $0.25 per word, according to Marshub. VideoLangua addresses this challenge by offering cheap and accurate AI translation services that bridge language barriers, making information more accessible to non-English speakers.

Our solution leverages a suite of large language models and specialized agents for efficient, culturally sensitive, and accurate video translation:

1. Speech-to-Text Conversion: We use Whisper for accurate transcription, including timestamps (see the sketch below).
2. Translation: We call Llama 3.1-70B-Chinese and Llama 3.1 70B to translate the text. To ensure translation accuracy, Llama 3.2 3B, which supports eight languages, reviews the translation.
3. Subtitle Generation: The translated text is synced back to the video's timeline as subtitles.
4. Optional Dubbing: For users who need a voiceover in their native language, we offer an optional dubbing step, ideal for video content files.

With the global translation service market valued at $40.95 billion in 2023 and projected to reach $41.78 billion in 2024, VideoLangua is well positioned to capture a share by offering a high-quality, affordable alternative to traditional video translation services. Unlike typical text-based translation tools, VideoLangua focuses on video and audio translation, combining affordability with quality. Our solution uses Llama models fine-tuned for different languages, allowing users to easily access translation and dubbing features in multiple languages.
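Step 1 of this pipeline maps directly onto Whisper's segment-level timestamps. Here is a minimal sketch, not the team's actual code, that transcribes a video's audio track and prints SRT-style subtitle entries (the file name is a placeholder; the translation and dubbing stages are out of scope here):

```python
import whisper

def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("base")
# Whisper loads audio via ffmpeg, so a video file works directly.
result = model.transcribe("lecture.mp4")

# Each segment carries start/end times, ready to sync back to the video timeline.
for i, seg in enumerate(result["segments"], start=1):
    print(i)
    print(f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}")
    print(seg["text"].strip())
    print()
```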

LinguaPolis - Bridging Languages and Uniting Teams

Today's workforces are spread across multiple countries, cultures, and languages, and online meetings have become a key part of how businesses operate. But there's a significant challenge: not everyone speaks the same language or communicates effectively in a common one like English. Accents, dialects, or simply varying levels of proficiency often make it difficult for teams to fully understand each other. The result is inefficiency: important points are missed, and post-meeting emails or clarifications are needed, which undermines the very purpose of the meeting. Ultimately, this wastes time, reduces productivity, and slows down decision-making.

Consider a typical online meeting. For many participants, English may not be their first language. Some struggle to express their ideas clearly, while others have difficulty understanding certain accents or pronunciations. Often, by the end of the meeting, there is a lack of clarity, and participants need to follow up with additional communications, re-explaining points or providing more instructions.

Now imagine a scenario where each participant could simply speak in their native language and everyone else could understand them perfectly, in real time, without human translators or extensive subtitles. This is where our idea comes in: an AI-powered online meeting platform that provides real-time dubbing and multilingual transcription. Each participant can speak comfortably in their native language, and the platform instantly dubs their speech into the local languages of all other participants. Additionally, real-time text transcripts of the conversation are generated in each participant's language, providing both auditory and visual clarity. At the end of the meeting, the platform even summarizes the discussion, producing meeting minutes in every participant's native language, ensuring that everyone walks away with a clear understanding of what was discussed and the decisions that were made.
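Whisper itself covers the first stages of that flow: transcribing an utterance in the speaker's language and translating the speech into English. A minimal per-utterance sketch, assuming the open-source whisper package and a recorded audio clip (dubbing and non-English target languages would need separate translation and TTS services):

```python
import whisper

model = whisper.load_model("base")

def relay_utterance(audio_path: str) -> dict[str, str]:
    """Transcribe one utterance and return both the original-language text
    and Whisper's built-in English rendering."""
    original = model.transcribe(audio_path)                   # transcript in the spoken language
    english = model.transcribe(audio_path, task="translate")  # to-English speech translation
    return {"original": original["text"], "english": english["text"]}

print(relay_utterance("utterance.wav"))
```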