AI Voice Assistant

Created by team Podsearch on March 04, 2023

Whisper ASR is a speech recognition model that converts spoken input to text. TTS (text-to-speech) is a technology that converts text to speech. GPT-3.5 Turbo is a language model that generates human-like text. Used together, these models make it possible to build an AI voice assistant that understands spoken input, generates a response in natural language, and converts that response to speech.

Here's how the process would work:

1. The user speaks into a microphone, and the speech is captured as an audio file.
2. The audio file is passed to the Whisper ASR model, which converts the speech to text.
3. The text is passed to the GPT-3.5 Turbo language model, which generates a response.
4. The response is passed to the TTS model, which converts the text to speech.
5. The resulting speech is played back to the user through speakers or headphones.

Overall, this process allows for a natural, conversational interaction between the user and the AI assistant. The user can speak to the AI the same way they would speak to another person, and the AI can respond in a way that is both accurate and human-like.
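The speech-to-text, text-generation, and text-to-speech stages described above can be sketched as a small pipeline. This is only an illustrative outline, not the team's actual code: the `VoiceAssistant` class, its field names, and the three `fake_*` stand-in functions are all hypothetical. In a real assistant, each stand-in would be swapped for a call to Whisper ASR, GPT-3.5 Turbo, and a TTS engine respectively.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAssistant:
    """Hypothetical three-stage pipeline: audio in, audio out."""
    transcribe: Callable[[bytes], str]    # speech audio -> text (e.g. Whisper ASR)
    generate_reply: Callable[[str], str]  # user text -> reply text (e.g. GPT-3.5 Turbo)
    synthesize: Callable[[str], bytes]    # reply text -> speech audio (e.g. a TTS engine)

    def respond(self, audio_in: bytes) -> bytes:
        # Step 2: convert the captured speech to text.
        user_text = self.transcribe(audio_in)
        # Step 3: generate a natural-language response.
        reply_text = self.generate_reply(user_text)
        # Step 4: convert the response back to speech.
        return self.synthesize(reply_text)


# Stand-in stages (assumptions for illustration only). They treat the
# "audio" as UTF-8 bytes so the flow can be exercised without any models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate_reply(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VoiceAssistant(fake_transcribe, fake_generate_reply, fake_synthesize)
reply_audio = assistant.respond(b"hello")
```

Keeping each stage behind a plain callable means any one model can be replaced or tested on its own; capturing microphone input and playing back the result (steps 1 and 5) would sit outside `respond`.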

"The background noise makes it difficult to comprehend what the project was about; the business value could have been developed further."


Gonzalo Huelmo Romero

Bachelor in Informatics / Data Science

"Well done for a small group"


Ervin Moore

PhD Computer Science Student

"Demo could not run. Idea is great and clear. There are similar apps out there and the functionality is not original, so the project needs a specific use case based on data and additional training to create a unique product."


Pawel Czech