OpenELM
OpenELM (Open-source Efficient Language Models) is a family of Transformer-based language models developed by Apple, optimized for running on devices with constrained memory and computational resources. The OpenELM models are designed to balance high performance with efficiency, making them suitable for deployment on mobile devices, laptops, and other hardware with limited processing power.
General | |
---|---|
Release date | 2024 |
Author | Apple |
Type | Transformer-based Language Models |
Key Models and Features
- OpenELM-270M: A compact model with 270 million parameters, designed for basic text generation and understanding tasks.
- OpenELM-450M: An intermediate model with 450 million parameters, offering improved performance for more complex language tasks.
- OpenELM-1.1B: A larger model with 1.1 billion parameters, providing a good balance between size and capability.
- OpenELM-3B: The most powerful in the series, with 3 billion parameters, suitable for more demanding applications.
Each model is available in a base version and an instruction-tuned variant, which is fine-tuned on datasets for tasks that require following specific instructions.
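To give a feel for why these sizes matter on constrained hardware, here is a rough back-of-the-envelope sketch of the memory needed just to hold the weights. The parameter counts are the nominal figures from the model names, and the helper `weight_footprint_gb` is a hypothetical name introduced for illustration. Real usage will be higher, since activations, the KV cache, and runtime overhead are not counted.

```python
# Nominal parameter counts taken from the model names above (not exact counts).
OPENELM_PARAMS = {
    "OpenELM-270M": 270_000_000,
    "OpenELM-450M": 450_000_000,
    "OpenELM-1.1B": 1_100_000_000,
    "OpenELM-3B": 3_000_000_000,
}


def weight_footprint_gb(num_params, bytes_per_param):
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, n in OPENELM_PARAMS.items():
        fp16 = weight_footprint_gb(n, 2)    # 16-bit floats: 2 bytes per parameter
        int4 = weight_footprint_gb(n, 0.5)  # 4-bit quantization: half a byte per parameter
        print(f"{name}: ~{fp16:.2f} GB at 16-bit, ~{int4:.2f} GB at 4-bit")
```

Under these assumptions, the smallest model fits in well under 1 GB at 16-bit precision, while the 3B model needs several gigabytes unless it is quantized.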
Unique Architecture and Efficiency
OpenELM models use a non-uniform, layer-wise scaling architecture. Unlike traditional Transformers, which give every layer the same number of parameters, OpenELM allocates fewer parameters to the layers near the input and gradually increases the allocation towards the output layers. This design makes better use of a fixed parameter budget, improving the model's performance without increasing its size.
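The idea can be sketched in a few lines. This is a minimal illustration, assuming the per-layer width is interpolated linearly between a minimum and maximum multiplier. The function name `layerwise_scaling`, its parameters, and the example values are hypothetical and are not taken from Apple's released configurations.

```python
def layerwise_scaling(num_layers, d_model, head_dim,
                      alpha_min, alpha_max, beta_min, beta_max):
    """Return a per-layer config with attention heads and FFN width growing with depth.

    alpha scales the attention width (number of heads); beta scales the
    feed-forward hidden size. Both are interpolated linearly across layers.
    """
    configs = []
    for i in range(num_layers):
        # t runs from 0.0 at the first layer to 1.0 at the last layer.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        num_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        configs.append({"layer": i, "num_heads": num_heads, "ffn_dim": ffn_dim})
    return configs


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to make the trend visible.
    for cfg in layerwise_scaling(num_layers=4, d_model=1024, head_dim=64,
                                 alpha_min=0.5, alpha_max=1.0,
                                 beta_min=1.0, beta_max=4.0):
        print(cfg)
```

A uniform Transformer corresponds to setting `alpha_min == alpha_max` and `beta_min == beta_max`, in which case every layer gets the same width.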
Training and Data
The models are trained on a mix of publicly available datasets, including The Pile and RedPajama, totaling approximately 1.8 trillion tokens. Instruction tuning was performed using the UltraFeedback dataset, comprising around 60,000 prompts. The models were trained with a focus on efficiency, employing techniques like Flash Attention and grouped query attention to reduce memory and computational requirements.
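To illustrate why grouped query attention reduces memory, the sketch below estimates the size of the key/value cache during generation. With grouped query attention, several query heads share one key/value head, so fewer key and value tensors have to be stored. The function `kv_cache_bytes` and the layer and head counts are hypothetical illustration values, not OpenELM's actual configuration.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values
    in every layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model: 16 layers, 16 query heads of dimension 64, 2048-token context.
# Standard multi-head attention keeps one KV head per query head.
mha = kv_cache_bytes(num_layers=16, num_kv_heads=16, head_dim=64, seq_len=2048)
# Grouped query attention with 4 query heads sharing each KV head.
gqa = kv_cache_bytes(num_layers=16, num_kv_heads=4, head_dim=64, seq_len=2048)

if __name__ == "__main__":
    print(f"multi-head attention KV cache:    {mha / 2**20:.0f} MiB")
    print(f"grouped query attention KV cache: {gqa / 2**20:.0f} MiB")
```

In this toy setup the cache shrinks in direct proportion to the number of key/value heads, which matters on devices where memory is the main constraint.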
Applications and Use Cases
OpenELM models are well suited to on-device applications where privacy and low latency are crucial. They cover a range of tasks, including natural language understanding, text generation, and coding assistance. The models are fully open-source, with Apple providing comprehensive training logs, multiple checkpoints, and pre-training configurations to facilitate further research and development.
Open Source Release
In a significant departure from its usual approach, Apple has made the OpenELM models fully open-source, releasing the model weights together with the code and configurations used to train them on publicly available datasets. This move aims to encourage collaboration within the research community and to support the development of on-device AI applications.
For more details and to access the models, you can visit the OpenELM collection on Hugging Face.