The Whisper models are trained for speech recognition and speech translation: they can transcribe speech audio into text in the language in which it is spoken (ASR) or translate it into English. Whisper has been trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is an encoder-decoder model. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed into an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation.
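The chunking step described above can be sketched in a few lines of plain Python. This is only an illustration, not Whisper's own code: the 16 kHz sample rate is an assumption taken from the reference implementation, the function name `split_into_chunks` is hypothetical, and the real library's windowing may advance differently (for example, based on predicted timestamps).

```python
SAMPLE_RATE = 16_000   # assumption: audio resampled to 16 kHz
CHUNK_SECONDS = 30     # from the text: 30-second chunks
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Split audio samples into fixed-length chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = list(samples[start:start + chunk_samples])
        # Pad a short final chunk with silence so every chunk has equal length.
        chunk.extend([0.0] * (chunk_samples - len(chunk)))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    audio = [0.0] * (70 * SAMPLE_RATE)   # 70 seconds of silence as stand-in audio
    chunks = split_into_chunks(audio)
    print(len(chunks), len(chunks[-1]))  # 3 chunks, each 480000 samples long
```

Each fixed-length chunk would then be converted to a log-Mel spectrogram before being passed to the encoder.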
Discover Whisper to help you get started
Whisper is a general-purpose speech recognition model
Blog: OpenAI's blog about Whisper
Paper: Whisper paper
Model Card: Whisper model card
Example: OpenAI's Colab example of using Whisper
Boilerplates to help you get started
Whisper gpt3 email generator | Python | Streamlit Whisper, GPT-3 Boilerplate
Whisper gpt3 sentiment analysis | Python | React app to upload audio file boilerplate
Whisper API with Flask | Python | Whisper Flask API with GPT-3 Boilerplate
Whisper API with Flask and GPT-3 | Python | Whisper Streamlit Boilerplate
Automatic Speech Recognition using OpenAI's Whisper | Python
Explore the coding tutorials and how-to guides available on our website to help you get started and learn to build with Whisper artificial intelligence technology
Stable Diffusion text-to-image generation can be set up in Google Colab quickly. This tutorial showcases a minimal approach to using the Stable Diffusion API in Google Colab.
In this tutorial we will update our previously created Flask Whisper API to use GPT-3 to generate summary text.
How to use OpenAI's Whisper to transcribe and diarize audio files.
In this tutorial you will learn how to use OpenAI's Whisper to create useful applications that leverage speech recognition.
In this tutorial you will learn how to use OpenAI's Whisper to transcribe a YouTube video.
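For the transcribe-and-diarize tutorial listed above, one common way to combine the two outputs is to label each transcript segment with the speaker whose turn overlaps it most. The sketch below is an assumption about how that could look, not the tutorial's actual method: the segment dictionaries mirror the `start`, `end`, and `text` fields that Whisper's transcription results contain, while the `(start, end, speaker)` turn tuples, the speaker names, and the function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the overlap between two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # Segments that overlap no turn keep speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    # Hypothetical inputs: transcript segments and diarization turns.
    segments = [
        {"start": 0.0, "end": 4.0, "text": "Hello"},
        {"start": 4.0, "end": 9.0, "text": "Hi there"},
    ]
    turns = [(0.0, 4.5, "SPEAKER_00"), (4.5, 10.0, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(seg["speaker"], seg["text"])
```

Maximum overlap is a simple heuristic; it does not split a segment in which two people speak.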
Solutions built with Whisper that have been created during our hackathons by the members of our community
Luminous Decibels: give a picture to your words. An easy way to generate a video for what you want to say, and a simple way for someone who only knows how to fill in online forms to create an interesting video.
Utilizing OpenAI's Whisper model and a CNN-Based Speech Emotion Recognition (SER) model to determine whether to call the authorities based on sentiment.
Butter is an AI-based integrated chatbot that uses specialized speech-to-text conversion to accurately output messages from live voice recordings for individuals who stutter, and to answer personal questions regarding stuttering. Butter implements the state-of-the-art Whisper API created by LabLab AI to intuitively translate speech into written form and omit any unintended interruptions in the flow of speech. Our goal is to empower users with speech impediments and improve their access to communication.
Our project is a solution for videoconferencing platforms that counters threats to a healthy communicative environment, using Whisper as the main feature of the bot. We started with a modest Discord bot, but we believe this idea can scale and expand to many other areas.
Product Name: InvestogAId
Problem: With the rising cost of living and soaring inflation across the world, cash deposits are increasingly becoming worthless. Inflation is eating away at everyone's wealth, and people need to invest their money in something that will grow in value. However, the stock market is a very volatile place, and it is difficult for people to make informed decisions about where to invest their money.
Solution: Using OpenAI Whisper and GPT-3, we are creating an automated transcription tool that will watch your favourite video about a stock trading strategy and implement it for you. This will allow you to make informed decisions about where to invest your money.
Translation is necessary for spreading new information, knowledge, and ideas across the world, and for effective communication between different cultures. In the process of spreading new information, translation can change history. So we have used our expertise as computer engineers with different specialties to encourage more global communication among people of different cultural backgrounds, using the pyttsx3, whisper, torch, os, streamlit, NumPy, Sounddevice, Scipy.io.wavfile and Wavio libraries to build our AI model and handle all the requirements of our project. We have also used IoT hardware, a Raspberry Pi, as the main handler for the project: it receives the voice from the user, passes it on for processing, and then outputs the translated voice through the speaker.
What I built
1 - YouTube-Sum: This tool gives you a short summary of any YouTube video in any language, so that you do not waste time watching the whole video and can get the knowledge from it in a matter of minutes. The summary is awesome and easy to understand.
2 - TrendSum: This tool gives you a very short summary of the top trending news on any topic you type into the search box, such as hacking, football matches, machine learning, or politics. The summary covers all the news on that topic.
We provide personalized content so that users read facts, information, or knowledge that match their interests and absorb it in minutes, using ML models and personalized recommendations integrated into the Android application.
People with hearing disabilities do not have the same autonomy as others. They are not able to interact to the same extent as those around them, and have limited freedom. WordSense is a hardware product that assists people with hearing disabilities in navigating daily life with tactile sensory feedback, more specifically, Haptic Touch. For a person with hearing disabilities, WordSense solves the problems of not being able to passively interpret nearby conversations, having to face the speaker to read lip movement or sign language, not being able to multitask, and having tunnel vision due to the lack of sound as an indicator. WordSense eases the daily lives of people with hearing disabilities and gives them the power of autonomy.
Whether you're a student, a programmer, or someone who simply needs to make a summary or piece of code, Summy can help you!
1. Select a mode: text or code
2. Start recording
3. Stop recording
4. You will get a response depending on the mode you selected:
- text: A summarization of the recording
- code: A code snippet based on the recording
It can help you in:
- meetings
- documentation
- study notes
- coding tool
Hey everyone, this is a video of our OpenAI hackathon demo. This project uses the Whisper, GPT-3, and Codex APIs. The goal of the project was to transcribe audio using Whisper, then return that text as a Python script, and lastly, use Codex to translate that Python script into another programming language.
Join one of our AI hackathons to build modern artificial intelligence together with talented members of our lablab community
Greetings and welcome to the AI21 Labs Hackathon! Our collaboration with AI21 Labs gives Lablab.ai community members access to state-of-the-art language models that can process any language comprehension or generation task. Join us in building scalable & efficient applications! Be a part of the innovation!
🗓️ This will be a 7-day virtual hackathon from 16-23 December 💻 Build an AI application with the latest large language model-powered technology by Cohere 💡 Get the chance to work with the best AI professionals in the industry and learn from them ✔️Entry level = 0. You’ve just started with AI? Are you an experienced Data Scientist? Or maybe you are a Designer or Business Developer? Join us! We need your domain knowledge! 🐱💻 Register now and let's get started! It’s free!
🗓️ This will be 7 days of hacking and fun from 9-16 December 💻 Build with the latest AI tools from OpenAI to create innovative new apps 💡 Work with top AI professionals and learn from them ⚒️ Create your AI app by combining GPT-3, Codex, DALL-E 2, and Whisper 🐱💻 Register now and let's get started!