OpenAI Whisper tutorial: Updating our Whisper API with GPT-3

Thursday, October 13, 2022 by Flafi

Mastering Whisper: OpenAI's Speech Recognition Powerhouse

Unveiling Whisper, OpenAI's trailblazing speech recognition system. Trained on an extensive multilingual dataset, it excels in deciphering accents, reducing background noise, and understanding technical language. It's your key to unlocking powerful applications in speech recognition. Dive into our Whisper tutorial and harness the power of GPT-3, transforming the way you interact with language and sound.

Whisper API Mastery: Tame the Text-Generating Giant, GPT-3

Dive into GPT-3, OpenAI's colossal Language Model, as you journey through our enlightening Whisper tutorial. Witness the incredible text generation and comprehension of this brilliant API, empowering you to craft outstanding AI applications and elevate your projects to unprecedented heights.

"Embarking on the Whisper API Journey: A Step-Up Tutorial"

Ready to elevate your Whisper API skills? This tutorial is a step-up from our previous Whisper API with Flask and Docker guide. If you're already familiar with that, let's dive deeper into the world of Whisper apps and GPT-3 applications!

OpenAI API key

If you don't have it already please go to OpenAI and create an account. And create your API key. Never share your API key in public repository!

Updates to requirement.txt

We are adding openai package to our file

Creating file for gpt3 function

We will create a new file called gpt3.py and add the following code to it. In the prompt I was using the summary option to summarize the text, but you can use anything you want. And you can tweak the parameters as well.

import openai

def gpt3complete(speech):
    # Completion function call engine: text-davinci-002

    Platformresponse = openai.Completion.create(
        engine="text-davinci-002",
        prompt="Write a short summary of this text: {}".format(speech),
        temperature=0.7,
        max_tokens=1500,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        )

    return Platformresponse.choices[0].text

Update app.py

On the top we will update our imports. Instead of "MY_API_KEY" please insert the API key you created earlier.

import openai
import gpt3
# GPT-3 API Key
openai.api_key = "MY_API_KEY"

Update the /whisper route

We will integrate our new GPT3 function into the route. So when we are getting the result from whisper we will pass it to the gpt3 function and return the result.

@app.route('/whisper', methods=['POST'])
def handler():
    if not request.files:
        # If the user didn't submit any files, return a 400 (Bad Request) error.
        abort(400)

    # For each file, let's store the results in a list of dictionaries.
    results = []

    # Loop over every file that the user submitted.
    for filename, handle in request.files.items():
        # Create a temporary file.
        # The location of the temporary file is available in `temp.name`.
        temp = NamedTemporaryFile()
        # Write the user's uploaded file to the temporary file.
        # The file will get deleted when it drops out of scope.
        handle.save(temp)
        # Let's get the transcript of the temporary file.
        result = model.transcribe(temp.name)
        text = result['text']
        # Let's get the summary of the soundfile
        summary = gpt3.gpt3complete(text)
        # Now we can store the result object for this file.
        results.append({
            'filename': filename,
            'transcript': text.strip(),
            'summary': summary.strip(),
        })

    # This will be automatically converted to JSON.
    return {'results': results}

How to run the container?

Open a terminal and navigate to the folder where you created the files.
Run the following command to build the container:

docker build -t whisper-api .

When built is ready, run the following command to run the container:

docker run -p 5000:5000 whisper-api

How to test the API?

You can test the API by sending a POST request to the route http://localhost:5000/whisper with a file in it. Body should be form-data.
You can use the following curl command to test the API:

curl -F "file=@/path/to/file" http://localhost:5000/whisper

In result you should get a JSON object with the transcript and summary in it.

How to deploy the API?

This API can be deployed anywhere where Docker can be used. Just keep in mind that this setup currently using CPU for processing the audio files. If you want to use GPU you need to change Dockerfile and share the GPU. I won't go into this deeper as this is an introduction. Docker GPU

You can find the whole code on github

"Put Your Whisper and GPT-3 Skills into Action: Join the AI Revolution"

You've mastered the Whisper API and GPT-3, now it's time to put those skills to the test! Join lablab.ai's AI hackathons and collaborate with a community of over 52,000 AI enthusiasts. Together, we can create AI solutions that will make a real difference in the world!

Thank you for reading! If you enjoyed this tutorial you can find more AI tutorials