Hackathon Submission: Enhanced Multimodal AI Performance

Project Title: Optimizing Multimodal AI for Real-World Applications

Overview:
Our project focused on optimizing multimodal AI performance using the TruEra machine learning ops platform. We evaluated 18 models across the vision, audio, and text domains, employing tailored prompting strategies, performance metrics, and sequential configurations.

Methodology:
- Prompting Strategies: Tailored prompts to maximize model response accuracy.
- Performance Metrics: Assessed models on accuracy, speed, and error rate.
- Sequential Configurations: Tested various model combinations for task-specific effectiveness.

Key Models Evaluated:
- Vision: GPT-4V, LLaVA-1.5, Qwen-VL, CLIP (Google/Vertex), Fuyu-8B.
- Audio: Seamless 1.0 & 2.0, Qwen Audio, Whisper v2 & v3, Seamless on-device, GoogleAUDIOMODEL.
- Text: StableMed, MistralMed, Qwen on-device, GPT, Mistral endpoint, Intel Neural Chat, BERT (Google/Vertex).

Results:
- Top Performers: Qwen-VL in vision, Seamless 2.0 in audio, and MistralMed in text.
- Insights: Balancing performance against cost is crucial; some models, such as GPT and Intel Neural Chat, underperformed or were cost-prohibitive.

Future Directions:
- Fine-tune models such as BERT using Vertex.
- Develop more TruLens connectors for diverse endpoints (see the evaluation sketch at the end of this submission).

Submission Contents:
- GitHub Repository: [Link]
- Demo: [Link]
- Presentation: [Link]

Our submission showcases the potential of multimodal AI evaluation with TruEra / TruLens to enhance real-world application performance, marking a step forward in human-centered AI solutions.
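To illustrate the evaluation setup described under Methodology, here is a minimal sketch of how one of our text endpoints can be wrapped for TruLens scoring. It assumes the trulens_eval 0.x API and an OpenAI key for the feedback provider; `query_model` and the app ID are hypothetical stand-ins for whichever endpoint is under test.

```python
# Minimal sketch: wrapping a text-to-text endpoint for TruLens evaluation.
# Assumes trulens_eval 0.x and OPENAI_API_KEY set for the feedback provider.
from trulens_eval import Tru, TruBasicApp, Feedback
from trulens_eval.feedback.provider import OpenAI

def query_model(prompt: str) -> str:
    """Hypothetical call to the endpoint under test (e.g. MistralMed)."""
    return "model response"  # replace with the real endpoint call

provider = OpenAI()  # LLM-based feedback provider

# Score each recorded (input, output) pair for prompt/response relevance.
f_relevance = Feedback(provider.relevance).on_input_output()

tru = Tru()  # local database holding records and feedback scores

# Wrap the plain function so every call is traced and scored.
recorder = TruBasicApp(query_model, app_id="text-endpoint-baseline",
                       feedbacks=[f_relevance])

with recorder as recording:
    recorder.app("Summarize the patient's symptoms in one sentence.")

# Compare wrapped endpoints side by side (empty list = all apps).
print(tru.get_leaderboard(app_ids=[]))
```

The same pattern applies to the audio and vision endpoints once a connector exposes them as callable functions, which is what the planned TruLens connectors would provide.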