OpenAI Whisper

The Whisper models are trained for speech recognition and speech translation: they can transcribe speech audio into text in the language in which it is spoken (automatic speech recognition, ASR) or translate it into English (speech translation). Whisper was trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is an encoder-decoder Transformer model. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
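The chunking and special-token steps described above can be sketched in plain Python. This is a simplified illustration, not the reference implementation: the 16 kHz sample rate and the token spellings follow the open-source Whisper repository, the helper names `split_into_chunks` and `build_task_prompt` are hypothetical, and a real pipeline would also compute the log-Mel spectrogram and may advance its window differently.

```python
SAMPLE_RATE = 16_000   # assumption: audio is resampled to 16 kHz before processing
CHUNK_SECONDS = 30     # from the text: audio is processed in 30-second chunks
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Split a sequence of samples into fixed-size chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = list(samples[start:start + chunk_samples])
        # Pad the final, shorter chunk with silence so every chunk has equal length.
        chunk.extend([0.0] * (chunk_samples - len(chunk)))
        chunks.append(chunk)
    return chunks


def build_task_prompt(language, task="transcribe", timestamps=False):
    """Return the special-token prefix that tells the decoder which task to perform."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task!r}")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens
```

For example, `build_task_prompt("fr", task="translate")` yields the prefix for translating French speech into English text, while the same audio with `task="transcribe"` would be transcribed in French. One model serves both tasks; only the token prefix changes.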

General
Release date	September 2022
Author	OpenAI
Repository	https://github.com/openai/whisper
Type	general-purpose speech recognition model

Start building with Whisper

We have collected the best Whisper libraries and resources to help you start building with Whisper today. To see what others are building with Whisper, check out the community-built Whisper Use Cases and Applications.

Tutorials

👉 Discover more Whisper Tutorials on lablab.ai

Boilerplates

Kickstart your development with a GPT-3 based boilerplate. Boilerplates are a great way to get a head start when building your next project with GPT-3.

Python Whisper Boilerplate: Whisper GPT-3 email generator
Streamlit Whisper, GPT-3 Boilerplate: Whisper GPT-3 sentiment analysis
React app to upload audio file boilerplate: Uploading an audio file to an API endpoint
Whisper Flask API Boilerplate: Whisper API with Flask
Whisper Flask API with GPT-3 Boilerplate: Whisper API with Flask and GPT-3
Whisper Streamlit Boilerplate: Automatic Speech Recognition using OpenAI's Whisper

Libraries

Whisper API libraries and connectors.

Whisper API: Whisper API reference
OpenAI Node.js API library for Whisper: The OpenAI Node.js library provides convenient access to the OpenAI API from Node.js applications
OpenAI Python Library for Whisper: The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language
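
As a minimal sketch of how the Python library listed above might be used, the helper below sends an audio file to the hosted Whisper endpoint. It assumes the `openai` package (version 1 or later) is installed and that `OPENAI_API_KEY` is set in the environment. The function name `transcribe_file` and the file name `meeting.mp3` are hypothetical. The client is passed in as an argument so the helper can be exercised without network access.

```python
def transcribe_file(client, path, model="whisper-1", translate=False):
    """Send an audio file to the OpenAI audio API and return the resulting text.

    `client` is expected to be an `openai.OpenAI()` instance (assumed setup).
    With translate=True the translations endpoint is used, which returns
    English text regardless of the spoken language.
    """
    endpoint = client.audio.translations if translate else client.audio.transcriptions
    with open(path, "rb") as audio_file:
        result = endpoint.create(model=model, file=audio_file)
    return result.text


# Example usage (requires the `openai` package and an API key):
#   from openai import OpenAI
#   client = OpenAI()  # reads OPENAI_API_KEY from the environment
#   print(transcribe_file(client, "meeting.mp3"))
```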

OpenAI Whisper AI technology Hackathon projects

Discover innovative solutions crafted with OpenAI Whisper AI technology, developed by our community members during our engaging hackathons.

Generate-subtitle

This project uses the ffmpeg-python and faster-whisper libraries to build an application that extracts the audio from an input video, transcribes it, generates an English subtitle file from the transcription, translates the subtitles into Yoruba using the Google Translate API, and adds them to a copy of the input video:
- Extracts the audio track from an input video
- Transcribes the audio track using the Whisper model
- Generates a subtitle file in SRT format
- Translates the subtitles into a target language (Yoruba by default)
- Adds the translated subtitles to the input video as soft or hard subtitles
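
The SRT generation step can be illustrated with a short, self-contained sketch. It is not the project's actual code: the helper names `format_timestamp` and `segments_to_srt` are hypothetical, and the input is assumed to be a list of `(start_seconds, end_seconds, text)` tuples such as a transcription step might produce.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: a 1-based counter, a time range, then the subtitle text.
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Translating the subtitles then only requires replacing the text of each tuple before rendering, since the timings stay the same.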

ÀṢÀ - Heritage Through Art

ÀṢÀ addresses the challenge of preserving and promoting local languages, which are often overshadowed by global languages. Many locals face difficulties expressing themselves digitally, while tourists miss out on cultural immersion. ÀṢÀ generates images from local-language phrases or full sentences, enhancing visual communication. Users can create custom images, learn about their cultural significance, and gain insights from an image. Imagine you're a tourist in Benin, visiting the Ouidah Museum of History. You come across intriguing artworks and relics and want to learn more than the tour guide provides. With ÀṢÀ, you can easily delve deeper into any artifact or image, expanding your understanding and enriching your experience. Its AI-driven design aims for accuracy and cultural sensitivity. For locals, it's a tool for language preservation and expression. Tourists benefit from immersive cultural experiences and meaningful interactions. ÀṢÀ bridges language barriers, promotes understanding, and celebrates linguistic diversity, enriching both locals' and tourists' experiences.

Scrypt Sync

Introducing Scrypt Sync – an innovative tool designed to bridge language barriers and enhance the accessibility of video content in real-time. Scrypt Sync empowers users to upload a video in any language and effortlessly generate accurate subtitles in another language. Our advanced translation algorithms ensure that subtitles are not only precise but also contextually relevant, preserving the essence and nuances of the original dialogue. With Scrypt Sync, viewers can enjoy a seamless and immersive experience as subtitles are displayed alongside the video, allowing for real-time translation that keeps pace with the action on screen. This feature is particularly beneficial for language learners, film enthusiasts exploring foreign cinema, and individuals requiring subtitles for better comprehension and accessibility. Scrypt Sync goes beyond mere translation by providing a user-friendly interface that supports various video formats and languages, making it versatile and adaptable to different needs. Whether you're a student, a traveler, a professional, or simply someone who loves global content, Scrypt Sync ensures you never miss a word or a moment, regardless of the language. Unlock the world of multilingual videos with Scrypt Sync, and experience the future of real-time translation technology.

Harmony

Dive into the realm of cutting-edge Python scripting with our groundbreaking project, a marvel of modern technology and linguistic prowess. Our script ingeniously integrates with YouTube, harnessing its vast repository of videos, or effortlessly processes local video files. Leveraging the Whisper model, it deftly transcribes the audio content, converting spoken words into text with remarkable accuracy and speed. But that's just the beginning of our journey. The true magic unfolds as our script seamlessly bridges language barriers, employing a sophisticated array of translation APIs including M2M100, Google Translate, and the visionary GPT4. This convergence of machine learning and natural language processing heralds a new era of communication, where boundaries dissolve and understanding transcends linguistic limitations. The culmination of this technological symphony is a masterpiece of multimedia artistry: a video adorned with dual subtitles, weaving together the original transcript and its translated counterpart. Witness the fusion of innovation and creativity as our script breathes new life into content, enabling viewers around the globe to experience and engage with media in their native language.

VOCALYTICS- Intelligence Speech Transformation

VOCALYTICS: Intelligence Speech Analytics is a technology for audio and speech processing. With a focus on innovation, its suite of offerings encompasses Intelligence Audio and Speech Transformation, delivering advanced algorithms and techniques for enhanced audio analysis and interpretation. Using AI-powered tools, VOCALYTICS visualizes audio signals, enabling in-depth exploration, pattern identification, and feature extraction. Our objective is to unlock valuable insights from audio data, including audio classification, sentiment analysis, speaker diarization, and acoustic feature extraction, for applications such as customer feedback analysis, voice-driven automation, and business intelligence.

Semana

Our project aims to bridge language barriers by integrating the Sema API into our app, enabling automatic generation of subtitles for video and audio content in over 200 languages. This approach lets users around the world consume and understand multimedia content in their native languages, fostering inclusivity and accessibility. By leveraging AI and machine learning capabilities, we aim to change the way people interact with digital media, so that language diversity is no longer a hindrance but a bridge connecting individuals and communities worldwide. This initiative aligns with our vision of creating a more interconnected and inclusive digital landscape for all.