
OpenAI Whisper Hackathon

The Future of Speech Recognition

Join us for a hackathon where we will use OpenAI Whisper to create innovative solutions! Whisper is a neural net trained to approach human-level robustness and accuracy in English speech recognition. We will use this tool to create applications that can transcribe in multiple languages, as well as translate from those languages into English. This is a great opportunity to learn more about speech processing and to build some useful applications!

Hackathon summary video

OpenAI Whisper

Machine learning and artificial intelligence still face challenges when it comes to speech recognition. With OpenAI Whisper, we are one step closer to resolving this problem.

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

This makes it possible to approach human-level robustness and accuracy in English speech recognition. It is robust to accents, background noise, and technical language.

It also enables transcription in multiple languages, as well as translation from those languages into English. The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer.
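In code, the workflow described above takes only a few lines with the open-source `whisper` package. This is a minimal sketch: the model name, file path, and helper function are illustrative, and the package must be installed first (`pip install openai-whisper`).

```python
def format_timestamp(seconds: float) -> str:
    """Render a Whisper segment time (float seconds) as HH:MM:SS."""
    s = int(seconds)
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}"

def transcribe(path: str, translate: bool = False) -> dict:
    """Transcribe an audio file; translate=True produces English text."""
    import whisper  # lazy import; assumes `pip install openai-whisper`
    model = whisper.load_model("base")  # checkpoints range from "tiny" to "large"
    task = "translate" if translate else "transcribe"
    return model.transcribe(path, task=task)

# Example usage (needs a real audio file, so it is not run here):
# for seg in transcribe("talk.mp3")["segments"]:
#     print(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
```

The same `task="translate"` switch is what powers the translation-into-English capability mentioned above.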

The Challenge

The challenge for this hackathon is to create an innovative solution with OpenAI Whisper.

Bonus points if you incorporate other technologies like GPT-3 and Codex. With Whisper's high accuracy and ease of use, we want to see what applications you can create using voice interfaces.

Whisper Hackathon details

🗓️ Where and when

The hackathon starts on October 14th and ends on October 16th. Over the weekend, you'll have the opportunity to learn from lablab experts during workshops, keynotes, and mentoring sessions. The hackathon will take place on the lablab.ai platform.

🦸🏼‍♂️ Who should participate?

Previous experience in AI is not required but is welcome. While many participants are industry experts, we also welcome people with other types of domain knowledge who want to understand and explore how AI can be used in their fields.

🛠️ How to participate in the hackathon

The hackathon will take place online on the lablab.ai platform and the lablab.ai Discord server; please register for both. To participate, click the "Enroll" button at the bottom of the page and read our Hackathon Guidelines.

🔗 Useful links for the hackathon

For general Hackathon information see our Hackathon Guidelines. To learn more about Whisper, see the Whisper documentation. To learn more about GPT-3, see the GPT-3 documentation. To learn more about Codex, see the Codex documentation.

Speakers, Mentors and Organizers

  • Mathias Asberg (Mentor): Founder, New Native

  • Paweł Czech (Speaker): Founder, New Native

  • Alexander Molak (Speaker): ML Researcher (R&D) at IRONSCALES

  • Rudradeb Mitra (Speaker): Founder, Omdena

  • Anastasiia Strakhova (Organizer): Community Manager at New Native

  • Olesia Zinchenko (Organizer): Social Media Manager at New Native

Hackathon FAQ

Who can join the Hackathon?

We welcome domain experts from all industries, not just AI or tech. Successful AI solutions require a combination of technical expertise and domain knowledge. Coding experience is recommended.

Do I need a team?

You are welcome to join as a team or solo. If solo, we still encourage you to look for a team before the event, but ultimately it is your choice. We recommend joining the Deep Learning Labs Discord channel (https://discord.gg/gCuBwBB35k) and posting in the #looking-for-team channel to get to know potential future teammates.

Do I need a GitHub account?

It is recommended: at least one team member should have a GitHub account. You can create one for free if you don't already have one.

I have other questions.

Feel free to reach us on social media, or through our Discord channel.

Event Schedule

  • To be announced

Winner Submissions 🏆

InvestogAId

Problem: With the rising cost of living and soaring inflation across the world, cash deposits are increasingly becoming worthless. Inflation is eating away at everyone's wealth, and people need to invest their money in something that will grow in value. However, the stock market is very volatile, and it is difficult to make informed decisions about where to invest. Solution: Using OpenAI Whisper and GPT-3, we are creating an automated transcription tool that will watch your favourite video about a stock-trading strategy and implement it for you, allowing you to make informed decisions about where to invest your money.

Blue

Moriarty

According to research, hate speech has become a real issue in online communication, especially in online games and on live-streaming platforms where users are shielded by anonymity. This discourages many people from using those platforms. Our goal with this project is to help existing voice-communication platforms combat hate speech, harassment and toxic behaviour. Our solution is to use each user's microphone to assess whether their speech is obscene, toxic, threatening, insulting, etc., using cutting-edge machine-learning tools like Whisper and text-classification models. Our target audience is video-game companies, live-streaming platforms and social media. We believe our product can help them minimise hate speech in their communities and achieve a higher quality of service.

Biscoff

Butter

Butter is an AI-based integrated chatbot that uses specialized speech-to-text conversion to accurately output messages from live voice recordings for individuals who stutter, and to answer personal questions about stuttering. Butter uses OpenAI's state-of-the-art Whisper model to intuitively translate speech into written form and omit unintended interruptions in the flow of speech. Our goal is to empower users with speech impediments and improve their access to communication.

Boss

Submitted concepts, prototypes and pitches

Submissions from the teams participating in the OpenAI Whisper Hackathon event and making it to the end 👊

InvestogAId

Problem: With the rising cost of living and soaring inflation across the world, cash deposits are increasingly becoming worthless. Inflation is eating away at everyone's wealth, and people need to invest their money in something that will grow in value. However, the stock market is very volatile, and it is difficult to make informed decisions about where to invest. Solution: Using OpenAI Whisper and GPT-3, we are creating an automated transcription tool that will watch your favourite video about a stock-trading strategy and implement it for you, allowing you to make informed decisions about where to invest your money.

Blue

Moriarty

According to research, hate speech has become a real issue in online communication, especially in online games and on live-streaming platforms where users are shielded by anonymity. This discourages many people from using those platforms. Our goal with this project is to help existing voice-communication platforms combat hate speech, harassment and toxic behaviour. Our solution is to use each user's microphone to assess whether their speech is obscene, toxic, threatening, insulting, etc., using cutting-edge machine-learning tools like Whisper and text-classification models. Our target audience is video-game companies, live-streaming platforms and social media. We believe our product can help them minimise hate speech in their communities and achieve a higher quality of service.

Biscoff

Butter

Butter is an AI-based integrated chatbot that uses specialized speech-to-text conversion to accurately output messages from live voice recordings for individuals who stutter, and to answer personal questions about stuttering. Butter uses OpenAI's state-of-the-art Whisper model to intuitively translate speech into written form and omit unintended interruptions in the flow of speech. Our goal is to empower users with speech impediments and improve their access to communication.

Boss

Discord Voice Chat Bot

We have created a Discord bot in Python that listens to users in a voice call. When prompted by a command, it records the user's audio, transcribes it using OpenAI's Whisper, generates a response using GPT-3, generates text-to-speech using the Uberduck API, and finally sends the response audio back into the Discord voice call. While there is room to improve our implementation, we think it has quite a few uses, from voice-call moderation to accessibility and more. We plan to continue developing the project to a more polished state, where it can be reliably used in other Discord servers.

The Picard Trio
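A bot like this boils down to chaining three stages: speech-to-text, text completion, and text-to-speech. The sketch below (hypothetical function names, not the team's actual code) wires them as pluggable functions, which also lets the pipeline be exercised with stubs before any API keys are involved:

```python
from typing import Callable

def voice_reply_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],    # e.g. Whisper
    complete: Callable[[str], str],      # e.g. GPT-3
    synthesize: Callable[[str], bytes],  # e.g. a TTS API such as Uberduck
) -> bytes:
    """Chain speech -> question text -> reply text -> reply audio."""
    question = transcribe(audio_path)
    answer = complete(question)
    return synthesize(answer)

# With stub stages, the wiring can be tested without calling any service:
audio = voice_reply_pipeline(
    "question.wav",
    transcribe=lambda path: "what time is it",
    complete=lambda text: f"You asked: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
print(audio)  # b'You asked: what time is it'
```

Each stub can later be swapped for a real client without touching the pipeline itself.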

Phoenix Whisper

According to research by J. Birulés-Muntané and S. Soto-Faraco (10.1371/journal.pone.0158409), watching movies with subtitles can help us learn a new language more effectively. However, the traditional way of showing subtitles on YouTube or Netflix does not give us the best way to check the meaning of new vocabulary or to understand complex slang and abbreviations. We found that displaying dual subtitles (the original subtitle of the video and the translated one) immediately improves the learning curve: in research conducted in Japan, the authors concluded that participants who viewed an episode with dual subtitles did significantly better (http://callej.org/journal/22-3/Dizon-Thanyawatpokin2021.pdf). After understanding both the problem and the solution, we decided to create a platform for learning new languages with dual active transcripts. When you enter a YouTube URL or upload an MP4 file in our web application, the app produces a web page where you can view the video with a transcript running next to it in two different languages. We accomplished this goal and successfully integrated OpenAI Whisper, GPT and Facebook's language model in the app's backend. At first we used Streamlit, but it does not provide a transcript that automatically moves with the audio timeline, nor the ability to design the user interface, so we created our own full-stack application using Bootstrap, Flask, HTML, CSS and JavaScript. Our business model is subscription-based and/or one-time purchase, based on usage. Our app isn't just for language learners: it can also be used by writers, singers, YouTubers, or anyone who would like their content to reach more people by adding different languages to their videos and audio.
Due to the limitations of the free hosting plan, we could not deploy the app to the cloud yet, but we have a simple website where you can take a quick look at what we are creating (https://phoenixwhisper.onrender.com/success/BzKtI9OfEpk/en).

Phoenix

Guardians of Discord

Our project is a solution that helps videoconferencing platforms counter threats to a healthy communicative environment, using OpenAI Whisper as the bot's main feature. We started with a modest Discord bot, but we believe this idea can scale and expand to many other horizons.

Sentient cookies

SafeWord

SafeWord uses OpenAI's Whisper model and a CNN-based speech emotion recognition (SER) model to determine, based on sentiment, whether to call the authorities.

Spaghetti

BabelTube

YouTube has a vast amount of high-quality educational content, but most of it is in English, which puts non-English speakers at a disadvantage. BabelTube aims to democratize learning by enabling non-English speakers to generate subtitles for any video on demand and on the fly. It integrates directly with the YouTube web player via a Chrome extension and uses the same interface YouTube uses to display its own subtitles, so the user experience is on par with YouTube's native subtitle display.

Autobot

Web Bot

Our project uses the OpenAI Whisper and GPT-3 services, Flask and React. Flask is used for the API and React for the front end. The user records a question with their microphone, which is transcribed and answered by the GPT-3 module. We think that with further development this bot can reach a real product level, with a high capacity for resolving user needs.

team team

RememberThis

The RememberThis app takes in an audio recording or voice note. The voice note is transcribed into text. A keyword is extracted from the text to categorise it. The keyword and text are uploaded to a Google Sheet.

Whisper4lokal
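RememberThis's keyword step could be approximated with simple frequency counting over non-stopwords. This is a hypothetical sketch, not the team's actual extraction method, and the stopword list is deliberately tiny:

```python
from collections import Counter

# A tiny stopword list for illustration; a real app would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "is", "in", "it", "for", "on", "i", "my"}

def extract_keyword(text: str) -> str:
    """Pick the most frequent non-stopword as a rough category label."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(1)[0][0] if counts else ""

print(extract_keyword("Buy milk, buy bread, and call the dentist"))  # buy
```

The extracted label and the full transcript could then be appended as one row to a Google Sheet via its API.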

Taleeq application

Our project “Taleeq” is a mobile application for children aged 6 to 9. The app helps children express their needs and feelings properly and fluently at the right time: with the help of speech-recognition technology, the application converts the child's speech to text and compares it with a set of target words, making it easier for children to deal with people in different situations. All of this takes the form of an engaging game with multiple levels, where the child collects points to unlock the next level.

Taleeq

voiceObot

Voice messages are becoming an increasingly common way to communicate: they are faster than typing, and sometimes you can't talk in real time, so a call isn't an option. But they also have downsides. You are often in a crowded place and unable to listen to voice messages, but what if you miss something important? Don't worry, we've got your back. During this hackathon we developed a bot for the popular messenger Telegram that uses OpenAI's Whisper to transcribe voice messages. Just forward a voice message to the bot, and you will get a textual transcription in seconds. It works for as many languages as Whisper supports. We hope this simple tool can help more people communicate comfortably.

UDL

ChAI Food voice assistant

ChAI is a food voice assistant. ChAI receives an audio file describing what someone would like to eat and then uses Whisper, GPT-3 and a food API to create recommendations in two categories: a list of recipes that match the input, and a list of dishes from restaurants that fit the user's tastes. The front end is a web application built with Node.js, CSS, JavaScript and HTML, in which we record an audio clip saying what we would like to eat. We then use JavaScript to call Whisper and obtain the transcript, which is passed via an HTTP request to the back end, a Flask server in Python. The back end sends the transcript to the natural-language-processing service, which parses the text with GPT-3 and asks a series of questions about items of interest associated with food. Finally, we use the answers provided by GPT-3 to call a food API that outputs recipes, dishes and restaurants related to the input queries.

Uniandes

Chase The Language

Translation is necessary for spreading new information, knowledge and ideas across the world, and for effective communication between different cultures; in spreading new information, translation can change history. So we used our expertise as computer engineers with different specialties to encourage more global communication among people of several cultural backgrounds, using the pyttsx3, whisper, torch, os, streamlit, NumPy, sounddevice, scipy.io.wavfile and wavio libraries to build our AI model and handle our project's requirements. We also used IoT hardware, a Raspberry Pi, as the project's main controller: it receives the user's voice, passes it on for processing, and then plays the translated voice through the speaker.

The Chasers

Navis

What we built:
1. YouTube-Sum: gives you a short summary of any YouTube video in any language, so you don't have to watch the whole video; you can absorb its knowledge in a matter of minutes.
2. TrendSum: gives you a very short summary of the top trending news on any topic you search for (hacking, football, machine learning, politics, etc.).
We provide personalized content so that users can read facts, information and knowledge according to their interests, and grasp that knowledge in minutes, using ML models and personalized recommendations integrated into the Android application.

Navis

WordSense

People with hearing disabilities do not have the same autonomy as others: they cannot interact to the extent of those around them and have limited freedom. WordSense is a hardware product that assists people with hearing disabilities in navigating daily life through tactile sensory feedback, specifically haptic touch. For a person with hearing disabilities, WordSense solves the problems of not being able to passively interpret conversations around you, having to face the person to read lip movement or sign language, not being able to multitask, and having tunnel vision due to the lack of sound as an indicator. WordSense eases the daily lives of people facing hearing disabilities and gives them the power of autonomy.

WordSense

Code Translation Demo

Hey everyone, this is a video of our OpenAI hackathon demo. The project uses the Whisper, GPT-3 and Codex APIs. The goal was to transcribe audio using Whisper, return that text as a Python script, and finally use Codex to translate that Python script into another programming language.

The Prompt Engineers

Luminous Decibels

Luminous Decibels: give a picture to your words. An easy way to generate a video for what you want to say; a simple way that lets anyone who can fill in an online form create an interesting video.

Akatsuki

Summy Your AI Co Worker

Whether you're a student, a programmer, or someone who simply needs a summary or a piece of code, Summy can help you!
1. Select a mode: text or code
2. Start recording
3. Stop recording
4. You will get a response depending on the mode you selected:
   - text: a summarization of the recording
   - code: a code snippet based on the recording
It can help you with meetings, documentation, study notes and coding.

Neurons

HearO app

HearO is an app built to help people who experience some degree of hearing loss. HearO uses audio to generate ASL (American Sign Language) through various commands. The crucial component of the idea is OpenAI's Whisper.

TATAR

Teams: OpenAI Whisper Hackathon

Check out the roster and find teams to join at the OpenAI Whisper Hackathon.
