OpenAI Whisper tutorial: How to use Whisper to transcribe a YouTube video

Thursday, October 06, 2022 by ezzcodeezzlife

Unraveling Whisper: OpenAI's Premier Speech Recognition System

Whisper is OpenAI's state-of-the-art speech recognition system, trained on 680,000 hours of multilingual and multitask data collected from the web. This large and diverse dataset makes it more robust to accents, background noise, and technical language. It can also transcribe speech in many languages and translate it into English. Unlike DALL-E 2 and GPT-3, Whisper is a free and open-source model: OpenAI provides access to the models and code, so you can build useful speech recognition applications on top of them.

Mastering YouTube Video Transcription with Whisper

In this tutorial, you'll learn how to use Whisper to transcribe a YouTube video. We'll use the Python package "Pytube" to download the video's audio as an MP4 file. Visit Pytube's repository for more information.

First, we need to install the Pytube library. You can do this by running the following command in a Jupyter notebook cell (drop the leading ! to run it in a terminal):

!pip install --upgrade pytube

For this tutorial, I'll be using the "Python in 100 Seconds" video.

Next, we need to import Pytube, provide the link to the YouTube video, and convert the audio to MP4:

# Importing the Pytube library
import pytube

# Reading the YouTube link
video = "https://www.youtube.com/watch?v=x7X9w_GIm1s"
data = pytube.YouTube(video)

# Converting and downloading as 'MP4' file
audio = data.streams.get_audio_only()
audio.download()
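By default the file name comes from the video title, which often contains spaces. If you'd rather control the name, here is a minimal sketch that relies on the `output_path` and `filename` arguments of Pytube's `download()`, which returns the path of the saved file. The names `slugify_title` and `download_audio` are hypothetical helpers introduced for illustration, not part of Pytube.

```python
import re


def slugify_title(title: str, extension: str = "mp4") -> str:
    """Turn a video title into a lowercase, dash-separated file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Fall back to a generic name if the title has no usable characters
    return f"{slug or 'audio'}.{extension}"


def download_audio(url: str, output_dir: str = ".") -> str:
    """Download the audio-only stream and return the path of the saved file."""
    import pytube  # imported lazily so slugify_title works without pytube

    data = pytube.YouTube(url)
    audio = data.streams.get_audio_only()
    # download() accepts output_path and filename and returns the saved path
    return audio.download(output_path=output_dir, filename=slugify_title(data.title))
```

Calling `download_audio(video)` would then save the audio under a name like python-in-100-seconds.mp4. The rest of this tutorial keeps the default download shown above.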

The output is a file in your current directory named after the video title. In our case, the file is named Python in 100 Seconds.mp4.

The next step is to convert the audio into text. We can do this in three lines of code using Whisper: first we install and import Whisper, then we load the model, and finally we transcribe the audio file.

Installing the Whisper library

!pip install git+https://github.com/openai/whisper.git -q

import whisper

Load the model. We'll use the "base" model for this tutorial. You can find more information about the models here. Each one of them has tradeoffs between accuracy and speed (compute needed).
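To make that tradeoff a little more concrete, here is a rough sketch that picks the largest model fitting a given amount of GPU memory. The memory figures are the approximate values listed in the Whisper README at the time of writing and should be treated as assumptions; `pick_model` is a hypothetical helper, not part of Whisper.

```python
# Approximate VRAM needs in GB, ordered from smallest to largest model.
# These numbers are assumptions taken from the Whisper README.
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model name whose approximate VRAM need fits."""
    choice = "tiny"  # smallest model is the fallback
    for name, needed_gb in APPROX_VRAM_GB:
        if needed_gb <= available_vram_gb:
            choice = name
    return choice
```

For example, `whisper.load_model(pick_model(4))` would load the "small" model. This tutorial sticks with "base".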

model = whisper.load_model("base")
text = model.transcribe("Python in 100 Seconds.mp4")
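As mentioned in the introduction, Whisper can also translate speech into English. The following is a minimal sketch of how that might look: `task` and `language` are options accepted by Whisper's `transcribe`, while `build_transcribe_options` and `transcribe_file` are hypothetical helper names introduced for illustration. Whisper is imported lazily so the option builder can be tried on its own.

```python
from typing import Optional


def build_transcribe_options(translate: bool = False,
                             language: Optional[str] = None) -> dict:
    """Build keyword arguments for model.transcribe()."""
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # e.g. "de" for German; if omitted, Whisper detects the language
        options["language"] = language
    return options


def transcribe_file(path: str, model_name: str = "base", **kwargs) -> dict:
    """Load a Whisper model and transcribe (or translate) an audio file."""
    import whisper  # requires the Whisper package installed above

    model = whisper.load_model(model_name)
    return model.transcribe(path, **build_transcribe_options(**kwargs))
```

For a German video, `transcribe_file("some_video.mp4", translate=True, language="de")` would return English text, where some_video.mp4 is a placeholder file name. For our English video, the plain call above is all we need.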

And now we can print out the output.

# Printing the transcription
print(text['text'])
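Besides the full text, the dictionary returned by `transcribe` also contains a "segments" list, where each segment has a "start" time, an "end" time (both in seconds), and its own "text". As a small sketch, the hypothetical helpers `format_timestamp` and `format_segments` below turn those segments into timestamped lines.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(result: dict) -> List[str]:
    """Return one '[start - end] text' line per transcribed segment."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines
```

With the result from this tutorial, `print("\n".join(format_segments(text)))` would list each spoken passage next to its position in the video.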

You can find the full code as a Jupyter Notebook.

Let's learn more!

Your AI journey doesn't have to end here: visit our other AI tutorials and learn more! And why not test your new skills during our upcoming AI Hackathons? You will build an AI app, meet like-minded people from all around the world, and upgrade your skills in just a couple of days. An idea worth considering!

Thank you for reading. - Fabian Stehle, Junior Data Scientist at New Native