ASR adaptation for low-resource languages with LLMs

Created by team AILabs Founder Solo on November 10, 2024

[Task 3. Expanding Low-Resource Languages] Today, even the best production-grade real-time speech recognition systems still run old n-gram language models on top of their acoustic models (CTC ASR). With CTC, the ASR model outputs character probabilities (logits) rather than finished words; once the logits are produced, an n-gram language model is applied on top of them to improve accuracy. The problem with an n-gram model is that it can only take n words (usually n = 3) of the sequence into account, which is a serious limitation.

To properly understand speech, we normally need: 1) the whole context of the conversation, and 2) prior context (something discussed earlier). In this project, I address the first problem by training an LLM such as Llama 3.2 on top of the ASR-generated logits; a minimal sketch of one possible setup is shown below.
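The project description does not include code, so the snippet below is only a minimal sketch, under the assumption of a PyTorch + Hugging Face Transformers setup, of one plausible way to put an LLM on top of CTC logits: a small adapter (the name CtcToLlamaAdapter is hypothetical) projects the per-frame logits into the LLM's embedding space, and the LLM is trained to predict the reference transcript from that acoustic prefix. The checkpoint name "meta-llama/Llama-3.2-1B" and the 32-symbol CTC alphabet are placeholders, not details from the project.

```python
# Illustrative sketch only -- not the team's actual implementation.
# Assumes pre-extracted CTC logits of shape (num_frames, ctc_vocab_size).
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class CtcToLlamaAdapter(nn.Module):
    """Hypothetical helper: projects per-frame CTC logits into the LLM embedding space."""

    def __init__(self, ctc_vocab_size: int, llm_hidden_size: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(ctc_vocab_size, llm_hidden_size),
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, ctc_logits: torch.Tensor) -> torch.Tensor:
        # (batch, num_frames, ctc_vocab_size) -> (batch, num_frames, llm_hidden_size)
        return self.proj(ctc_logits.softmax(dim=-1))


def training_step(llm, tokenizer, adapter, ctc_logits, transcript, device="cpu"):
    """One illustrative step: the LLM predicts the transcript given the projected logits."""
    # Embed the reference transcript with the LLM's own embedding table.
    text_ids = tokenizer(transcript, return_tensors="pt").input_ids.to(device)
    text_emb = llm.get_input_embeddings()(text_ids)           # (1, T_text, hidden)

    # Project the acoustic evidence (CTC logits) into the same space.
    audio_emb = adapter(ctc_logits.unsqueeze(0).to(device))   # (1, T_audio, hidden)

    # Prefix the text with the audio frames; compute the loss only on the text part.
    inputs_embeds = torch.cat([audio_emb, text_emb], dim=1)
    ignore = torch.full(audio_emb.shape[:2], -100, dtype=torch.long, device=device)
    labels = torch.cat([ignore, text_ids], dim=1)

    out = llm(inputs_embeds=inputs_embeds, labels=labels)
    return out.loss


if __name__ == "__main__":
    model_name = "meta-llama/Llama-3.2-1B"  # placeholder; any causal LM works for the sketch
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    llm = AutoModelForCausalLM.from_pretrained(model_name)
    adapter = CtcToLlamaAdapter(ctc_vocab_size=32, llm_hidden_size=llm.config.hidden_size)

    # Dummy CTC logits standing in for real ASR output (100 frames, 32-symbol alphabet).
    fake_logits = torch.randn(100, 32)
    loss = training_step(llm, tokenizer, adapter, fake_logits, "hello low resource world")
    loss.backward()  # gradients flow into the adapter (and, if unfrozen, the LLM)
    print(float(loss))
```

In practice one would likely freeze most of the LLM (or use LoRA) and train mainly the adapter, but that design choice is not specified in the original description.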
