Rhymes AI’s two flagship models, Aria and Allegro, address different multimodal problems: Aria understands mixed text, image, and video inputs, while Allegro generates video from text.
Aria is an open-source Mixture-of-Experts (MoE) model developed by Rhymes AI to handle multimodal inputs such as text, images, and video, with a focus on efficiency and high performance. During inference, Aria activates only 3.9 billion of its 25.3 billion total parameters, roughly 15% of its weights, so its per-token compute cost is closer to that of a much smaller dense model. Its 64K-token multimodal context window lets it work with long-form content; for example, it can caption a 256-frame video in 10 seconds.
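The figures above can be checked with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, "64K" is assumed to mean 65,536 tokens, and the per-frame token budget assumes the whole context window is spent on video frames, which is an upper bound and not a description of how Aria actually tokenizes video.

```python
# Back-of-the-envelope arithmetic on the figures quoted for Aria.
# All helper names are hypothetical and exist only for this illustration.

TOTAL_PARAMS_B = 25.3       # total parameters, in billions (from the text)
ACTIVE_PARAMS_B = 3.9       # parameters activated at inference, in billions
CONTEXT_TOKENS = 64 * 1024  # assumption: "64K" read as 65,536 tokens
FRAMES = 256                # frames in the captioning example
CAPTION_SECONDS = 10        # quoted captioning time


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used during inference."""
    return active_b / total_b


def frames_per_second(frames: int, seconds: float) -> float:
    """Average number of video frames handled per second."""
    return frames / seconds


def max_tokens_per_frame(context_tokens: int, frames: int) -> int:
    """Upper bound on tokens per frame if the whole window held only frames."""
    return context_tokens // frames


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"Active parameter share: {share:.1%}")
    print(f"Frames per second: {frames_per_second(FRAMES, CAPTION_SECONDS)}")
    print(f"Max tokens per frame: {max_tokens_per_frame(CONTEXT_TOKENS, FRAMES)}")
```

In practice the prompt and the generated caption also occupy part of the context window, so the real per-frame budget would be smaller than this bound.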
Allegro, Rhymes AI’s text-to-video model, turns text prompts into high-quality video for creative work. With a model size of 3B parameters, it is built specifically for video generation and can produce short clips at 720p resolution in a matter of minutes, which opens up new possibilities for content creators, marketers, and AI researchers alike.