Solution Overview
With the development of advanced AI models, developers and data scientists face the challenge of efficiently evaluating and comparing multiple models for specific tasks. LlamaEval addresses this challenge by offering a streamlined, easy-to-use evaluation dashboard for comparing Llama model outputs. By integrating the Together AI API, users can select and test multiple models. The results are displayed on an interactive dashboard with two key features: a benchmark description expander and a performance scoreboard with metrics, so the user sees both which benchmark was used and the final evaluation scores.

Tech Stack
Backend: Python, Together AI, requests, pandas, nltk, scikit-learn, and Hugging Face datasets
Frontend: Streamlit for creating the user interface, displaying results, and handling interaction
Deployment: Docker and Azure cloud services: Container Registry (to store the Docker image) and Container Apps (to deploy the app and expose a shareable public URL)

Target Audience
This tool is designed for data scientists, AI researchers, developers, and machine learning engineers in enterprise, academia, and government who need efficient solutions for quick model assessment in real-world applications, a serviceable market we roughly estimate at ~$10B.

Unique Features/Benefits
• Simplicity and speed: LlamaEval offers a simple interface to quickly assess multiple models without complex setups or long runtimes.
• Comprehensive insights: real-time results and detailed comparison panels.
• Customizable: in the future, users will be able to select any number of models and evaluate them on any dataset, making the tool versatile for a wide range of use cases.
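As an illustration of the evaluation flow described above, the sketch below shows how a dashboard like this could query Together AI models and score their outputs. It is a minimal sketch, not the project's actual code: the model names, the tiny in-line benchmark, and the query_model / bleu / evaluate helpers are illustrative assumptions, and the real dashboard would load prompts from Hugging Face datasets and render results in Streamlit.

```python
"""Minimal sketch of a LlamaEval-style evaluation loop (illustrative assumptions only)."""
import os

import pandas as pd
import requests
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Together AI's OpenAI-compatible chat-completions endpoint; a TOGETHER_API_KEY
# environment variable is assumed to be set.
API_URL = "https://api.together.xyz/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

# Illustrative model identifiers -- any Llama models served by Together AI could be used.
MODELS = [
    "meta-llama/Llama-3-8b-chat-hf",
    "meta-llama/Llama-3-70b-chat-hf",
]

# A tiny stand-in benchmark of (prompt, reference answer) pairs.
BENCHMARK = [
    ("What is the capital of France?", "The capital of France is Paris."),
]


def query_model(model: str, prompt: str) -> str:
    """Send one prompt to one model and return the generated text."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    response = requests.post(API_URL, headers=HEADERS, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]


def bleu(reference: str, candidate: str) -> float:
    """Sentence-level BLEU via nltk as a simple text-overlap score."""
    smooth = SmoothingFunction().method1
    return sentence_bleu([reference.split()], candidate.split(), smoothing_function=smooth)


def evaluate() -> pd.DataFrame:
    """Score every model on every benchmark item and return a scoreboard DataFrame."""
    rows = []
    for model in MODELS:
        scores = [bleu(ref, query_model(model, prompt)) for prompt, ref in BENCHMARK]
        rows.append({"model": model, "mean_bleu": sum(scores) / len(scores)})
    return pd.DataFrame(rows)


if __name__ == "__main__":
    print(evaluate())
```

In a Streamlit front end, the resulting scoreboard could be rendered with st.dataframe(evaluate()) and the benchmark description placed inside st.expander(...), matching the two dashboard features described above.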
Amina Asif
Full Stack Developer
Paraskevi Kivroglou
Software Developer