Yoruba Image Synthesis - Multi-Modal Fusion

Created by team HackAttack on May 15, 2024

In this project, we address the language barrier faced by Yoruba speakers, whose language has limited digital resources. Image generation models work best with English prompts, which puts non-English speakers at a disadvantage. We tackled the problem on two tracks: data collection and model development.

Because Yoruba datasets are scarce, especially for image generation prompts, we curated our own. We selected English sentences to serve as image generation prompts and translated them into Yoruba using a dictionary-based approach. We then trained a custom translator model to translate Yoruba into English. This intermediate step lets Yoruba prompts be passed to existing English-centric image generation models. In testing, we achieved 85% accuracy on the test set.

The core strength of the project is that users can generate images from prompts written in their native language. By collecting our own data and training our own models, we work around the scarcity of Yoruba resources. For image generation we use the SDXL API, which produces high-quality outputs.

Looking ahead, we plan to extend the work to additional languages such as Fon and Dendi, expanding the dataset to reach a broader audience. Our ultimate goal is a model that generates images directly from Yoruba, Fon, and Dendi prompts, without translating into English first.

In summary, the project addresses a pressing need in the Yoruba-speaking community and lays groundwork for multilingual image generation, supporting more inclusive communication and creative expression.
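The two steps described above, building Yoruba prompts from English sentences with a dictionary and then translating Yoruba back to English before calling an image model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the tiny lexicon, the `dictionary_translate`, `build_pairs` and `generate_from_yoruba` names, and the stand-in translator and image functions are all hypothetical. The real project uses its own trained translator and the SDXL API, neither of which is reproduced here.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy English->Yoruba lexicon; a real dictionary would be far larger.
EN_TO_YO: Dict[str, str] = {
    "dog": "ajá",
    "tree": "igi",
    "house": "ilé",
    "water": "omi",
}

# English articles dropped during lookup in this toy setup (an assumption).
SKIP = {"a", "an", "the"}


def dictionary_translate(sentence: str, lexicon: Dict[str, str]) -> str:
    """Word-by-word lookup; words missing from the lexicon are kept unchanged.

    This ignores word order, tone marking and morphology, which is the main
    weakness of a purely dictionary-based approach.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return " ".join(lexicon.get(t, t) for t in tokens if t not in SKIP)


def build_pairs(prompts: List[str], lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Create (Yoruba, English) pairs for training a Yoruba->English translator."""
    return [(dictionary_translate(p, lexicon), p) for p in prompts]


def generate_from_yoruba(
    yoruba_prompt: str,
    translate_yo_en: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> bytes:
    """Translate a Yoruba prompt to English, then pass it to an image model."""
    english_prompt = translate_yo_en(yoruba_prompt)
    return generate_image(english_prompt)


if __name__ == "__main__":
    pairs = build_pairs(["A dog under the tree"], EN_TO_YO)
    print(pairs)  # [('ajá under igi', 'A dog under the tree')]

    # Hypothetical stand-ins for the trained translator and the SDXL call.
    yo_to_en = {v: k for k, v in EN_TO_YO.items()}

    def fake_translator(text: str) -> str:
        return " ".join(yo_to_en.get(w, w) for w in text.split())

    def fake_image_api(prompt: str) -> bytes:
        return f"<image for: {prompt}>".encode()

    print(generate_from_yoruba("ajá under igi", fake_translator, fake_image_api))
    # b'<image for: dog under tree>'
```

Passing the translator and image generator in as callables keeps the pipeline independent of any particular model or API, so the dictionary stand-in can be swapped for a trained translator and a real SDXL client without changing `generate_from_yoruba`. The untranslated "under" in the output also shows why a word-level dictionary alone is not enough.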

"Great work. Generating images is a very simple idea that could have a great future if implemented in the right business field. Good luck."

Walaa Nasr Elghitany

Data scientist and doctor

"Great project, guys, it addresses an important challenge. I would love to see better use of the multimodal model, but it's still pretty good."

Shebagi Mitra

Technical Mentor