Join us for a hackathon where we will be using OpenAI Whisper to create innovative solutions! Whisper is a neural net that approaches human-level robustness and accuracy on English speech recognition. We will be using this tool to create applications that can transcribe speech in multiple languages and translate it into English. This will be a great opportunity to learn more about speech processing and to build some useful applications!
Our AI hackathon brought together a diverse group of participants who collaborated to develop a variety of impressive projects:
1695 Participants
106 Teams
20 AI Applications
This event has now ended, but you can still register for upcoming events on lablab.ai. We look forward to seeing you at the next one!
Submissions from the teams participating in the OpenAI Whisper Hackathon and making it to the end 👊
Hey everyone, this is a video of our OpenAI hackathon demo. The project combines the Whisper, GPT-3, and Codex APIs. The goal was to transcribe audio using Whisper, turn that transcript into a Python script, and finally use Codex to translate that Python script into another programming language.
The Prompt Engineers
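A minimal sketch of the pipeline The Prompt Engineers describe, assuming the open-source `whisper` package and the legacy OpenAI Completions API; the model names (`text-davinci-003`, `code-davinci-002`) and prompts are illustrative assumptions, not the team's actual code:

```python
import openai
import whisper

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

def speech_to_translated_code(audio_path: str, target_language: str = "JavaScript") -> str:
    # 1. Transcribe the spoken programming task with Whisper.
    transcript = whisper.load_model("base").transcribe(audio_path)["text"]

    # 2. Ask GPT-3 to express the task as a Python script.
    python_code = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Write a Python script that does the following:\n{transcript}\n\nPython code:\n",
        max_tokens=512,
        temperature=0,
    ).choices[0].text

    # 3. Ask Codex to translate the Python script into the target language.
    return openai.Completion.create(
        model="code-davinci-002",
        prompt=f"# Python source\n{python_code}\n\n# The same program in {target_language}:\n",
        max_tokens=512,
        temperature=0,
    ).choices[0].text
```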
Whether you're a student, a programmer, or someone who simply needs a summary or a piece of code, Summy can help you!
1. Select a mode: text or code
2. Start recording
3. Stop recording
4. You will get a response depending on the mode you selected:
- text: a summary of the recording
- code: a code snippet based on the recording
It can help you with meetings, documentation, study notes, and coding.
Neurons
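A rough sketch of how Summy's two modes might be wired together, assuming `sounddevice` for recording, the open-source `whisper` package, and the legacy GPT-3 Completions API (the prompts and model name are assumptions):

```python
import openai
import sounddevice as sd
import whisper
from scipy.io import wavfile

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
SAMPLE_RATE = 16000

def record(seconds: int, path: str = "recording.wav") -> str:
    # Record from the default microphone and save it as a WAV file.
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    sd.wait()
    wavfile.write(path, SAMPLE_RATE, audio)
    return path

def summy(mode: str, seconds: int = 30) -> str:
    # Transcribe the recording, then route it to a summary prompt or a code prompt.
    transcript = whisper.load_model("base").transcribe(record(seconds))["text"]
    if mode == "text":
        prompt = f"Summarize the following notes:\n{transcript}\n\nSummary:"
    else:  # mode == "code"
        prompt = f"Write a code snippet that does the following:\n{transcript}\n\nCode:"
    return openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=300, temperature=0
    ).choices[0].text
```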
People with hearing disabilities do not have the same autonomy as others. They are not able to interact with the world to the same extent as those around them, and have limited freedom. WordSense is a hardware product that assists people with hearing disabilities in navigating daily life with tactile sensory feedback, more specifically Haptic Touch. For a person with hearing disabilities, WordSense solves the problems of not being able to passively interpret conversations around you, having to face the person to read lip movements or sign language, not being able to multitask, and having tunnel vision due to the lack of sound as an indicator. WordSense eases the daily lives of people facing hearing disabilities and provides them with the power of autonomy.
WordSense
What we built:
1. YouTube-Sum — gives you a short summary of any YouTube video in any language, so you don't have to waste time watching the whole video; you can absorb its key points in a matter of minutes through a summary that is easy to understand.
2. TrendSum — gives you a very short summary of the top trending news on any topic you type into the search box (hacking, football matches, machine learning, politics, etc.), covering all of the news on that topic.
We provide personalized content so that our users can read the facts, information, or knowledge that matches their interests and absorb it in minutes, using ML models and personalized recommendations integrated into the Android application.
Navis
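A minimal sketch of a YouTube-Sum-style pipeline, assuming `pytube` for fetching the audio track, the open-source `whisper` package, and GPT-3 for the summary (all library and model choices here are assumptions, not necessarily what Navis used):

```python
import openai
import whisper
from pytube import YouTube

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

def summarize_youtube_video(url: str) -> str:
    # 1. Download only the audio stream of the video.
    audio_path = YouTube(url).streams.filter(only_audio=True).first().download(filename="audio.mp4")

    # 2. Transcribe it with Whisper (works for many languages, not just English).
    transcript = whisper.load_model("base").transcribe(audio_path)["text"]

    # 3. Ask GPT-3 for a short, easy-to-read summary.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Summarize this video transcript in a few sentences:\n{transcript[:6000]}\n\nSummary:",
        max_tokens=200,
        temperature=0.3,
    )
    return response.choices[0].text.strip()
```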
Translation is necessary for spreading new information, knowledge, and ideas across the world, and for achieving effective communication between different cultures. In the process of spreading new information, translation is something that can change history. So we have used our expertise as computer engineers with different specialties to encourage more global communication among people of different cultural backgrounds, using the pyttsx3, whisper, torch, os, streamlit, NumPy, sounddevice, scipy.io.wavfile, and wavio libraries to build our AI model and handle all the requirements of our project. We have also used IoT hardware, a Raspberry Pi, as the main handler for the project: it receives the voice from the user, passes it on to be processed, and then plays the translated voice through the speaker.
The Chasers
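A condensed sketch of the record → translate → speak loop The Chasers describe, using `sounddevice` and `scipy` for capture, the open-source `whisper` package (whose `task="translate"` mode renders speech in English), and `pyttsx3` for spoken output; the exact flow on their Raspberry Pi is an assumption:

```python
import pyttsx3
import sounddevice as sd
import whisper
from scipy.io import wavfile

SAMPLE_RATE = 16000

def listen_translate_speak(seconds: int = 5) -> str:
    # 1. Capture audio from the microphone (on the Raspberry Pi, its attached mic).
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    sd.wait()
    wavfile.write("input.wav", SAMPLE_RATE, audio)

    # 2. Whisper's translate task transcribes the speech and renders it in English.
    english_text = whisper.load_model("base").transcribe("input.wav", task="translate")["text"]

    # 3. Speak the translated text through the speaker.
    engine = pyttsx3.init()
    engine.say(english_text)
    engine.runAndWait()
    return english_text
```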
Product name: InvestogAId. Problem: With the rising cost of living and soaring inflation across the world, cash deposits are increasingly losing their value. Inflation is eating away at everyone's wealth, and people need to invest their money in something that will grow in value. However, the stock market is a very volatile place, and it is difficult to make informed decisions about where to invest. Solution: Using OpenAI Whisper and GPT-3, we are creating an automated transcription tool that watches your favourite video about a stock trading strategy and implements it for you, helping you make informed decisions about where to invest your money.
Blue
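A minimal sketch of the transcription half of InvestogAId, assuming the strategy video's audio track has already been extracted to a file: Whisper produces the transcript and GPT-3 distills the trading strategy (the prompt and model name are assumptions, and the step that would actually execute trades is intentionally left out):

```python
import openai
import whisper

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

def extract_strategy(audio_path: str) -> str:
    # Transcribe the strategy video's audio with Whisper.
    transcript = whisper.load_model("base").transcribe(audio_path)["text"]

    # Ask GPT-3 to turn the transcript into a concise set of trading rules.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=(
            "Extract the stock trading strategy described below as a numbered list "
            f"of concrete rules:\n{transcript[:6000]}\n\nRules:"
        ),
        max_tokens=300,
        temperature=0,
    )
    return response.choices[0].text.strip()
```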
ChAI is a food voice assistant. ChAI receives an audio file with a description of what someone would like to eat and then uses Whisper, GPT-3, and a food API to create recommendations. These recommendations are divided into two categories: in the first, the user receives a list of recipes matching their input, while the second outputs a list of dishes from restaurants that fit their tastes. The front end is a web application built with Node.js, CSS, JavaScript, and HTML, in which we record an audio clip describing what we would like to eat. We then use JavaScript to call the Whisper API and obtain the transcript. This transcript is passed via an HTTP request to the back end, a Flask server written in Python. The request sends the transcript to the natural language processing server, which parses the text with GPT-3 and asks a series of important questions about food-related items of interest. Finally, we use the answers provided by GPT-3 to call a food API that outputs recipes, dishes, and restaurants related to the input queries.
Uniandes
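A trimmed-down sketch of the Flask back end ChAI describes: it receives the transcript from the front end, has GPT-3 pull out the food request, and then queries a food API. The `search_recipes` helper, its Spoonacular-style endpoint, the prompt, and the model name are all placeholders for whatever the team actually used:

```python
import openai
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder

def search_recipes(query: str) -> list:
    # Placeholder for the food API call (e.g. Spoonacular); URL and params are assumptions.
    resp = requests.get(
        "https://api.spoonacular.com/recipes/complexSearch",
        params={"query": query, "apiKey": "YOUR_FOOD_API_KEY", "number": 5},
        timeout=10,
    )
    return resp.json().get("results", [])

@app.route("/recommend", methods=["POST"])
def recommend():
    transcript = request.json["transcript"]  # sent by the JavaScript front end

    # Ask GPT-3 to reduce the free-form request to a short dish/ingredient query.
    query = openai.Completion.create(
        model="text-davinci-003",
        prompt=f'A user said: "{transcript}".\nWhat dish or ingredient are they asking for? Answer in a few words:',
        max_tokens=20,
        temperature=0,
    ).choices[0].text.strip()

    return jsonify({"query": query, "recipes": search_recipes(query)})
```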
Our project is a solution for videoconferencing platforms against threats that undermine a healthy communicative environment, with OpenAI Whisper as the main feature of the bot. We started with a modest Discord bot, but we believe this idea can scale and expand to many other horizons.
Sentient cookies
Butter is an AI-based integrated chatbot that uses specialized speech-to-text conversion to accurately output messages from live voice recordings for individuals who stutter, and answers personal questions about stuttering. Butter uses OpenAI's state-of-the-art Whisper model to intuitively translate speech into written form and omit any unintended interruptions in the speaker's flow of speech. Our goal is to empower users with speech impediments and improve their access to communication.
Boss
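A small sketch of Butter's two functions as described above: transcribing speech with Whisper (which already tends to skip stutters and repeated fragments) and answering a question about stuttering with GPT-3. Both the prompt and the model name are assumptions:

```python
import openai
import whisper

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
model = whisper.load_model("base")

def transcribe_message(audio_path: str) -> str:
    # Whisper's transcription generally omits disfluencies in the spoken input.
    return model.transcribe(audio_path)["text"].strip()

def answer_stuttering_question(question: str) -> str:
    # A simple GPT-3 prompt acting as the chatbot side of Butter.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"You are a supportive assistant knowledgeable about stuttering.\nQuestion: {question}\nAnswer:",
        max_tokens=200,
        temperature=0.5,
    )
    return response.choices[0].text.strip()
```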
Utilizing OpenAI's Whisper model and a CNN-Based Speech Emotion Recognition (SER) model to determine whether to call the authorities based on sentiment.
Spaghetti
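A hedged sketch of the decision flow: Whisper transcribes the audio, a classifier scores it, and a threshold decides whether to raise an alert. The team's CNN-based SER model works on raw audio and is stubbed out here; the Hugging Face sentiment pipeline merely stands in as the text-side signal:

```python
import whisper
from transformers import pipeline

asr = whisper.load_model("base")
sentiment = pipeline("sentiment-analysis")  # stand-in text signal, not the team's SER model

def audio_emotion_score(audio_path: str) -> float:
    # Placeholder for the CNN-based Speech Emotion Recognition model (0 = calm, 1 = distressed).
    return 0.0

def should_alert_authorities(audio_path: str, threshold: float = 0.8) -> bool:
    transcript = asr.transcribe(audio_path)["text"]
    text_result = sentiment(transcript[:512])[0]
    text_score = text_result["score"] if text_result["label"] == "NEGATIVE" else 0.0
    # Combine the (stubbed) audio emotion score with the text sentiment score.
    combined = max(text_score, audio_emotion_score(audio_path))
    return combined >= threshold
```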
According to research by J. Birulés-Muntané and S. Soto-Faraco (10.1371/journal.pone.0158409), watching movies with subtitles can help us learn a new language more effectively. However, the traditional way of showing subtitles on YouTube or Netflix is not the best way to check the meaning of new vocabulary or understand complex slang and abbreviations. We found that displaying dual subtitles (the video's original subtitles and the translated ones) immediately improves the learning curve: in research conducted in Japan, the authors concluded that participants who viewed an episode with dual subtitles did significantly better (http://callej.org/journal/22-3/Dizon-Thanyawatpokin2021.pdf). After understanding both the problem and the solution, we decided to create a platform for learning new languages with dual active transcripts. When you enter a YouTube URL or upload an MP4 file in our web application, the app produces a web page where you can view the video with a transcript running next to it in two different languages. We have accomplished this goal and successfully integrated OpenAI Whisper, GPT, and Facebook's language model in the back end of the app. At first we used Streamlit, but it does not provide a transcript that automatically moves with the audio timeline, nor the ability to design the user interface, so we built our own full-stack application using Bootstrap, Flask, HTML, CSS, and JavaScript. Our business model is subscription-based and/or one-time purchase, based on usage. Our app isn't just for language learners: it can also be used by writers, singers, YouTubers, or anyone who would like their content to reach more people by adding different languages to their videos and audio. Due to the limitations of our free hosting plan we could not deploy the app to the cloud for now, but we have a simple website where you can take a quick look at what we are creating (https://phoenixwhisper.onrender.com/success/BzKtI9OfEpk/en).
Phoenix
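A minimal sketch of how the dual transcript could be produced: Whisper returns timed segments, and each segment is translated with Facebook's M2M100 model from Hugging Face. Whether M2M100 is the exact "Facebook language model" Phoenix used, and the language codes chosen here, are assumptions:

```python
import whisper
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
translator = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")

def dual_transcript(audio_path: str, src_lang: str = "en", tgt_lang: str = "es") -> list:
    """Return (start, end, original, translated) tuples for each Whisper segment."""
    segments = whisper.load_model("base").transcribe(audio_path)["segments"]
    tokenizer.src_lang = src_lang
    rows = []
    for seg in segments:
        encoded = tokenizer(seg["text"], return_tensors="pt")
        generated = translator.generate(
            **encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang)
        )
        translation = tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
        rows.append((seg["start"], seg["end"], seg["text"].strip(), translation))
    return rows
```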
Luminous Decibels: give a picture to your words. An easy way to generate a video for what you want to say, and a simple way for anyone who just knows how to fill in an online form to create an interesting video.
Akatsuki
We have created a Discord bot in Python that can listen to users in a voice call. When prompted by a command, it records the user's audio, transcribes the audio using OpenAI's Whisper, generates a response using GPT-3, generates text-to-speech using the Uberduck API, and finally sends the response audio back into the Discord voice call. While we think there is room to improve our implementation, the project has quite a few uses, from voice-call moderation to accessibility and more. We plan to continue developing the project to a more polished state, where it can be reliably used in other Discord servers.
The Picard Trio
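A rough sketch of The Picard Trio's processing chain once a user's audio clip has been saved to disk (capturing voice from Discord needs a library with recording sinks, e.g. Pycord, and is not shown). The Uberduck call is left as a placeholder rather than guessing at its endpoint, and the GPT-3 prompt and model name are assumptions:

```python
import openai
import whisper

openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
asr = whisper.load_model("base")

def synthesize_speech(text: str) -> str:
    # Placeholder for the Uberduck text-to-speech request; should return a path to an audio file.
    raise NotImplementedError("Call the Uberduck API here and save the result, e.g. reply.wav")

def voice_reply(recorded_audio_path: str) -> str:
    # 1. Transcribe what the user said in the voice channel.
    user_text = asr.transcribe(recorded_audio_path)["text"]

    # 2. Generate a conversational reply with GPT-3.
    reply_text = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"User said: {user_text}\nAssistant replies:",
        max_tokens=150,
        temperature=0.7,
    ).choices[0].text.strip()

    # 3. Turn the reply into speech and hand the file back to the Discord bot to play.
    return synthesize_speech(reply_text)
```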
According to research and statistics, hate speech has become a real issue in online communication, especially in online games and on live-streaming platforms, where users are shielded by their anonymity. This phenomenon discourages a lot of people from using those platforms. With this project, our goal is to help existing voice communication platforms combat hate speech, harassment, and toxic behaviour. Our solution is to use each user's microphone to assess whether their speech is obscene, toxic, threatening, insulting, etc., using cutting-edge machine learning tools like Whisper and text-classification models. Our target audience is video game companies, live-streaming platforms, and social media. We believe our product can help them minimise hate speech in their communities and thus achieve a higher quality of service.
Biscoff
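A compact sketch of the moderation check Biscoff describes: a user's audio clip is transcribed with Whisper and scored by an off-the-shelf toxicity classifier. The `unitary/toxic-bert` model and the 0.5 threshold are illustrative choices, not necessarily what the team used:

```python
import whisper
from transformers import pipeline

asr = whisper.load_model("base")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")

def flag_voice_clip(audio_path: str, threshold: float = 0.5) -> dict:
    # Transcribe the clip, then score the text for toxicity.
    transcript = asr.transcribe(audio_path)["text"]
    result = toxicity(transcript[:512])[0]  # top label and its score
    return {
        "transcript": transcript,
        "label": result["label"],
        "score": result["score"],
        "flagged": result["score"] >= threshold,
    }
```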
Our project uses the OpenAI Whisper and GPT-3 services, Flask, and React. Flask is used for the API and React for the front end. The user records a question with their microphone, which is transcribed and then answered by the GPT-3 module. We think that with further development this bot can reach a real product level with a high capacity for resolving user needs.
team team
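A stripped-down version of the Flask API side team team describe: the React front end posts the recorded question as an audio file, Whisper transcribes it, and GPT-3 answers. The endpoint name, field name, prompt, and model are assumptions:

```python
import openai
import whisper
from flask import Flask, jsonify, request

app = Flask(__name__)
openai.api_key = "YOUR_OPENAI_API_KEY"  # placeholder
asr = whisper.load_model("base")

@app.route("/ask", methods=["POST"])
def ask():
    # The React front end uploads the microphone recording as a file field.
    audio = request.files["audio"]
    audio.save("question.wav")

    question = asr.transcribe("question.wav")["text"]
    answer = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Answer the following question clearly and concisely:\n{question}\nAnswer:",
        max_tokens=200,
        temperature=0.3,
    ).choices[0].text.strip()

    return jsonify({"question": question, "answer": answer})
```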
The RememberThis app takes in an audio recording or voice note. The voice note is transcribed into text. A keyword is extracted from the text to categorise it. The keyword and text are uploaded to a Google Sheet.
Whisper4lokal
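A small sketch of the RememberThis flow, using a naïve most-frequent-word heuristic as the keyword extractor (the team's actual method isn't specified) and `gspread` with a service account for the Google Sheet upload; the sheet name and credentials file are placeholders:

```python
import re
from collections import Counter

import gspread
import whisper

STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "i", "that", "for"}
asr = whisper.load_model("base")

def extract_keyword(text: str) -> str:
    # Naïve heuristic: the most frequent non-stopword becomes the category keyword.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return Counter(words).most_common(1)[0][0] if words else "misc"

def remember_this(audio_path: str) -> None:
    transcript = asr.transcribe(audio_path)["text"].strip()
    keyword = extract_keyword(transcript)

    # Append (keyword, transcript) to the first worksheet of a sheet named "RememberThis".
    gc = gspread.service_account(filename="service_account.json")
    gc.open("RememberThis").sheet1.append_row([keyword, transcript])
```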
YouTube has a vast amount of high-quality educational content, but most of it is in English, which puts non-English speakers at a disadvantage. BabelTube plans to democratize learning by enabling non-English speakers to generate subtitles for any video on demand and on the fly. It integrates directly with the YouTube web player via a Chrome extension and uses the same interface YouTube itself uses to display subtitles, so the user experience is on par with YouTube's own subtitle display.
Autobot
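The server-side piece of such an extension could look roughly like this: Whisper's timed segments are converted into a WebVTT file that a player (or the extension's injected subtitle track) can display. The Chrome extension and YouTube player integration are JavaScript and not shown, and translating the lines into the viewer's own language would be an additional step:

```python
import whisper

def to_vtt_timestamp(seconds: float) -> str:
    # Format seconds as an HH:MM:SS.mmm WebVTT timestamp.
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:06.3f}"

def generate_subtitles(audio_path: str, out_path: str = "subtitles.vtt") -> str:
    # Transcribe the video's audio and write each segment as a cue.
    result = whisper.load_model("base").transcribe(audio_path)
    lines = ["WEBVTT", ""]
    for seg in result["segments"]:
        lines.append(f"{to_vtt_timestamp(seg['start'])} --> {to_vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
    return out_path
```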
HearO is an app built to help people who experience some degree of hearing loss. HearO uses audio to generate ASL (American Sign Language) through a series of steps. The crucial component of the idea is the OpenAI Whisper API.
TATAR
Our project “Taleeq” is a mobile application for children aged 6 to 9. The app helps children express their needs and feelings properly and fluently at the right time: with the help of speech recognition technology, the application converts the child's speech to text and compares it with a set of target words, which makes it easier for children to interact with people in different situations. All of this takes the form of an engaging game with multiple levels, where the child collects points to unlock new levels.
Taleeq
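A tiny sketch of the scoring step Taleeq describes: the child's recording is transcribed and compared against the level's word set, with fuzzy matching via `difflib` to be forgiving of small transcription differences. The matching rule and threshold are assumptions:

```python
from difflib import SequenceMatcher

import whisper

asr = whisper.load_model("base")

def score_attempt(audio_path: str, target_words: list, threshold: float = 0.8) -> int:
    # Transcribe the child's attempt and award a point per target word that was said.
    spoken = asr.transcribe(audio_path)["text"].lower().split()
    points = 0
    for target in target_words:
        if any(SequenceMatcher(None, word, target.lower()).ratio() >= threshold for word in spoken):
            points += 1
    return points
```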
Voice messages are becoming a more and more common way to communicate: they are faster than typing, and sometimes you can't talk in real time, so a call isn't an option. But they also have downsides; often you are in a crowded place and cannot listen to voice messages, and you worry about missing something important. Don't worry, we've got your back. During this hackathon we developed a bot for the popular messenger Telegram that uses Whisper by OpenAI to transcribe voice messages. Just forward a voice message to the bot and you will get a textual transcription in seconds. It works for as many languages as Whisper supports. We hope that such a simple tool can help more people feel comfortable communicating.
UDL
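A minimal sketch of such a Telegram bot using the `pyTelegramBotAPI` package (`telebot`); whether this is the library UDL actually used is an assumption. Whisper reads the downloaded OGG voice note directly via ffmpeg:

```python
import telebot
import whisper

bot = telebot.TeleBot("YOUR_TELEGRAM_BOT_TOKEN")  # placeholder
asr = whisper.load_model("base")

@bot.message_handler(content_types=["voice"])
def transcribe_voice(message):
    # Download the forwarded voice note from Telegram's servers.
    file_info = bot.get_file(message.voice.file_id)
    audio_bytes = bot.download_file(file_info.file_path)
    with open("voice.ogg", "wb") as f:
        f.write(audio_bytes)

    # Whisper auto-detects the language and transcribes the message.
    text = asr.transcribe("voice.ogg")["text"].strip()
    bot.reply_to(message, text or "Sorry, I couldn't hear anything in that message.")

bot.infinity_polling()
```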