Tachiwin Indigenous Languages Translator

Created by team Tachiwin on November 22, 2024

Tachiwin is an initiative to build Artificial Intelligence tools for Mexico’s indigenous and endangered languages. The goal is to give their speakers and communities the tools already available in dominant languages, thereby balancing opportunities for development, access, and the use of technology. It also seeks to promote and revalue these languages and the knowledge of indigenous peoples who, until now, have been left behind by cutting-edge information technologies, all while maintaining a socially and ethically grounded vision.

Developments

Tutunakú-Spanish-English Translation Large Language Model

We are among the first initiatives globally to build a language model for an indigenous language that is new to the base model. In this phase, we have developed a pretrained and fine-tuned Llama 3.1 8B model, which, to the best of our knowledge, is the first LLM implemented for a Mexican indigenous language. The model was trained on a custom instruction dataset containing translations between Tutunakú, Spanish, and English. We are currently preparing additional datasets to include Nahuatl, which will be released soon.

This model now powers the new beta version of the Tachiwin Android App. The app has evolved from a simple bilingual dictionary into a fully functional offline/online translator, so communities with limited internet access can use it offline or toggle between modes as needed.
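
As an illustration of what a translation instruction dataset of the kind described above might look like, the following Python sketch expands one parallel entry into instruction records for every ordered language pair. It is only a sketch under stated assumptions: the record fields (instruction, input, output) follow a common instruction-tuning convention and are not confirmed as Tachiwin's actual format, the function name build_instruction_records is hypothetical, and the sentences are placeholders rather than real data.

```python
from itertools import permutations

# Languages named in the project description.
LANGUAGES = ("Tutunakú", "Spanish", "English")


def build_instruction_records(parallel_entry):
    """Expand one parallel entry into one record per ordered language pair.

    `parallel_entry` maps a language name to the same sentence in that
    language. The field names below are an assumed convention, not the
    project's confirmed schema.
    """
    records = []
    for source, target in permutations(LANGUAGES, 2):
        # Skip pairs for which one side of the translation is missing.
        if source not in parallel_entry or target not in parallel_entry:
            continue
        records.append({
            "instruction": f"Translate the following text from {source} to {target}.",
            "input": parallel_entry[source],
            "output": parallel_entry[target],
        })
    return records


# Placeholder text only; no real translations are shown here.
entry = {
    "Tutunakú": "<Tutunakú sentence>",
    "Spanish": "<Spanish sentence>",
    "English": "<English sentence>",
}
records = build_instruction_records(entry)
```

With all three languages present, one entry yields six records (two directions for each of the three language pairs); an entry missing a language yields only the pairs that can be formed.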
