A tool for language learning.

Conversation mode:
1. Give basic roleplay scenarios
2. Evaluate the conversation
3. Proper grammar/word usage

Practice mode:
1. Read sentences
2. See your pronunciation mistakes
3. Play both the ElevenLabs audio and your own recording to compare the difference

It uses a local proxy server with:
- ElevenLabs for realistic TTS
- OpenAI for LLM completions and transcriptions
- For the pronunciation, I used the Montreal Forced Aligner to get transcription intervals. It generates phones aligned with the transcription.

The Montreal Forced Aligner (MFA) is a tool used in speech processing and linguistics to align speech recordings with their corresponding transcriptions. It takes a speech recording and its text transcript as input and automatically aligns the words in the transcript with their corresponding segments in the audio.

1. Phones are generated (using MFA) for both the user's recorded message and the ElevenLabs TTS audio.
2. The Damerau-Levenshtein distance is computed between the two word sequences, and between the phones of each word, to measure the difference in pronunciation.
3. The shortest edit path is interpreted as replacing, inserting, deleting or transposing a word/phone.

For example, it can show whether you have mispronunciation patterns such as stressing your T's. This is done by comparing the phones from your recording to the phones from the ElevenLabs voice. You can learn different accents or languages by changing the voice/language of the ElevenLabs voice.
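
The phone comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the phones for each word have already been extracted from the MFA output into plain lists of strings, it uses the restricted (optimal string alignment) variant of Damerau-Levenshtein, and the names `align` and `Edit` as well as the ARPABET-style phone labels are hypothetical.

```python
from typing import List, Sequence, Tuple

# (operation, reference item(s), user item(s)); hypothetical shape for illustration
Edit = Tuple[str, str, str]


def align(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, List[Edit]]:
    """Return the edit distance and one shortest edit path from ref to hyp.

    Uses the optimal-string-alignment variant of Damerau-Levenshtein:
    replace, insert, delete, and transposition of two adjacent items.
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = number of edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j

    def swapped(i: int, j: int) -> bool:
        # True when the last two items of ref[:i] and hyp[:j] are transposed
        return i > 1 and j > 1 and ref[i - 1] == hyp[j - 2] and ref[i - 2] == hyp[j - 1]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            best = min(
                dist[i - 1][j] + 1,         # delete from ref
                dist[i][j - 1] + 1,         # insert into ref
                dist[i - 1][j - 1] + cost,  # match or replace
            )
            if swapped(i, j):
                best = min(best, dist[i - 2][j - 2] + 1)
            dist[i][j] = best

    # Walk back from the bottom-right corner to recover one shortest path.
    edits: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # items match, nothing to report
        elif swapped(i, j) and dist[i][j] == dist[i - 2][j - 2] + 1:
            edits.append(("transpose", f"{ref[i - 2]} {ref[i - 1]}", f"{hyp[j - 2]} {hyp[j - 1]}"))
            i, j = i - 2, j - 2
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            edits.append(("replace", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            edits.append(("delete", ref[i - 1], ""))
            i -= 1
        else:
            edits.append(("insert", "", hyp[j - 1]))
            j -= 1
    edits.reverse()
    return dist[n][m], edits


# Reference (TTS) phones vs. user phones for one word, labels assumed
print(align(["W", "AO", "T", "ER"], ["W", "AO", "D", "ER"]))
print(align(["AE", "S", "K"], ["AE", "K", "S"]))
```

Because the function only compares list items for equality, the same call works on a list of words (to find skipped or swapped words) and on the phones inside a single word (to find the specific sound that differs).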