OpenAI Whisper tutorial: Creating OpenAI Whisper API in a Docker Container

Thursday, October 06, 2022 by Flafi

Discover Whisper: OpenAI's Premier Speech Recognition System

Whisper is a speech recognition system from OpenAI, trained on 680,000 hours of multilingual and multitask data collected from the web. According to OpenAI, this large and diverse dataset makes Whisper more robust to accents, background noise, and technical language. It transcribes speech in many languages and can also translate from those languages into English. OpenAI has released the models and inference code, giving developers a solid foundation for building their own Whisper applications.

How to start with Docker

First of all, if you plan to run the container on your local machine, you need to have Docker installed. You can find the installation instructions here.
Create a folder for our files; let's call it whisper-api.
Create a file called requirements.txt and add flask to it.
Create a file called Dockerfile.

In the Dockerfile we will add the following lines:

FROM python:3.10-slim

WORKDIR /python-docker

COPY requirements.txt requirements.txt
RUN apt-get update && apt-get install git -y
RUN pip3 install -r requirements.txt
RUN pip3 install "git+https://github.com/openai/whisper.git" 
RUN apt-get install -y ffmpeg

COPY . .

EXPOSE 5000

CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]

So what is happening exactly in the Dockerfile?

Choosing the python:3.10-slim image as our base image.
Setting the working directory to /python-docker.
Copying our requirements.txt file into the working directory.
Updating the apt package index and installing git.
Installing the requirements from the requirements.txt file.
Installing the whisper package from GitHub.
Installing ffmpeg, which Whisper uses to decode audio files.
Exposing port 5000 and running the Flask server.

How to create our route

Create a file called app.py where we import all the necessary packages and initialize the Flask app and the Whisper model.
Add the following lines to the file:

from flask import Flask, abort, request
from tempfile import NamedTemporaryFile
import whisper
import torch

# Use the NVIDIA GPU if one is available, otherwise fall back to the CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the Whisper model:
model = whisper.load_model("base", device=DEVICE)

app = Flask(__name__)
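
The model name "base" is hard-coded above. As a minimal sketch, and not part of the original tutorial, the name could be read from an environment variable instead. The variable name WHISPER_MODEL and the helper resolve_model_name are hypothetical, and the tuple lists only some of the model sizes published in the openai/whisper repository:

```python
import os

# A subset of the model sizes published in the openai/whisper repository.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env=os.environ, default="base"):
    """Return the model name from WHISPER_MODEL (a hypothetical variable), or the default."""
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; expected one of {KNOWN_MODELS}")
    return name


if __name__ == "__main__":
    # Prints the configured model name, or "base" if the variable is not set.
    print(resolve_model_name())
```

With this helper in app.py, the loading line would become whisper.load_model(resolve_model_name(), device=DEVICE), and the variable could be set with docker run -e WHISPER_MODEL=small. Larger models are slower and need more memory, so "base" stays a reasonable default on CPU.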

Now we need to create a route that will accept a POST request with a file in it.
Add the following lines to the app.py file:

@app.route("/")
def hello():
    return "Whisper Hello World!"


@app.route('/whisper', methods=['POST'])
def handler():
    if not request.files:
        # If the user didn't submit any files, return a 400 (Bad Request) error.
        abort(400)

    # For each file, let's store the results in a list of dictionaries.
    results = []

    # Loop over every file that the user submitted.
    # Each key is the form field name (for example "file"), not the original file name.
    for filename, handle in request.files.items():
        # Create a temporary file.
        # The location of the temporary file is available in `temp.name`.
        # The file is deleted automatically when the `with` block exits.
        with NamedTemporaryFile() as temp:
            # Write the user's uploaded file to the temporary file.
            handle.save(temp)
            # Flush buffered data to disk so Whisper reads the complete file.
            temp.flush()
            # Let's get the transcript of the temporary file.
            result = model.transcribe(temp.name)
        # Now we can store the result object for this file.
        results.append({
            'filename': filename,
            'transcript': result['text'],
        })

    # This will be automatically converted to JSON.
    return {'results': results}
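
The handler passes temp.name to Whisper, so the uploaded bytes have to be on disk under that path before transcription starts. Here is a minimal sketch of that step in isolation. It assumes a Linux environment such as the container above, where a named temporary file can be reopened by path while it is still open. The helper name write_and_read_back is made up for illustration:

```python
from tempfile import NamedTemporaryFile


def write_and_read_back(data: bytes) -> bytes:
    """Write bytes to a named temporary file, then read them back via its path."""
    # Leaving the `with` block closes and deletes the temporary file.
    with NamedTemporaryFile() as temp:
        temp.write(data)
        # Without flush(), buffered bytes may not have reached the file yet,
        # so a reader opening temp.name could see a truncated file.
        temp.flush()
        with open(temp.name, "rb") as reader:
            return reader.read()


if __name__ == "__main__":
    print(write_and_read_back(b"fake audio bytes"))
```

The second open() stands in for Whisper, which reads the audio from the path rather than from the Python file object.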

How to run the container?

Open a terminal and navigate to the folder where you created the files.
Run the following command to build the image:

docker build -t whisper-api .

Run the following command to run the container:

docker run -p 5000:5000 whisper-api

How to test the API?

You can test the API by sending a POST request to http://localhost:5000/whisper with a file attached. The body should be form-data.
You can use the following curl command to test the API:

curl -F "file=@/path/to/file" http://localhost:5000/whisper

As a result, you should get a JSON object containing the transcript.
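
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the container is running on localhost:5000 as above. The helper names build_multipart and transcribe are made up for illustration, and the form field is called "file" to match the curl command:

```python
import json
import sys
import uuid
from pathlib import Path
from urllib import request


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, url: str = "http://localhost:5000/whisper") -> dict:
    """POST the audio file at `path` to the API and return the decoded JSON reply."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    req = request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with request.urlopen(req) as resp:
        return json.load(resp)


# Only contacts the server when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for item in transcribe(sys.argv[1])["results"]:
        print(item["filename"], "->", item["transcript"])
```

Saved under a name of your choice, for example client.py, it can be run as python3 client.py /path/to/file. Each entry in "results" carries the keys "filename" and "transcript", as returned by the handler in app.py.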

How to deploy the API?

This API can be deployed anywhere Docker can run. Just keep in mind that this setup currently uses the CPU to process the audio files. If you want to use a GPU, you need to change the Dockerfile and share the GPU with the container. I won't go deeper into this, as this is an introduction. For more details, see the Docker GPU documentation.

And why not give what you've learned a spin during our upcoming AI Hackathons?

You can find the whole code here

Thank you for reading! If you enjoyed this tutorial you can find more and continue reading our AI tutorials.