LLaVA: Large Language and Vision Assistant

LLaVA is an end-to-end trained large multimodal model that connects a vision encoder to the Vicuna language model for general-purpose visual and language understanding. It achieves chat capabilities in the spirit of the multimodal GPT-4 and set a new state-of-the-art accuracy on Science QA.

General
Release date: November 20, 2023
Repository: https://github.com/haotian-liu/LLaVA
Type: Multimodal Language and Vision Model

What is LLaVA?

Visual Instruction Tuning: LLaVA, short for Large Language-and-Vision Assistant, is a multimodal AI model trained to follow instructions that refer to images.

Built around visual instruction tuning, LLaVA is designed to rival the capabilities of GPT-4V in understanding both language and vision. It handles tasks ranging from multimodal chat to science question answering, where it reaches 92.53% accuracy on Science QA. By pairing instruction-following data with a combined vision and language model, it offers a versatile foundation for diverse multimodal applications.
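As a rough illustration of what instruction-following input looks like in practice, the sketch below builds a single-turn text prompt with an image placeholder. It assumes the Vicuna-style `USER: ... ASSISTANT:` template and the `<image>` token used by LLaVA-1.5 checkpoints in Hugging Face format; other LLaVA versions may expect a different template, and the helper name `build_llava_prompt` is hypothetical.

```python
# Assumption: Vicuna-style template with an "<image>" placeholder, as used by
# LLaVA-1.5 checkpoints in Hugging Face format. Other versions may differ.
IMAGE_TOKEN = "<image>"


def build_llava_prompt(question: str) -> str:
    """Return a single-turn prompt that pairs one image with one question."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # The image placeholder comes first, then the user's instruction.
    return f"USER: {IMAGE_TOKEN}\n{question} ASSISTANT:"


prompt = build_llava_prompt("What is shown in this picture?")
print(prompt)
```

The resulting string would be passed to the model's processor together with the image itself; the placeholder marks where the encoded image features are inserted.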

LLaVA Hackathon Projects

Discover solutions built with LLaVA by our community members during our hackathons.

Sentinel Generalist

Sentinel Generalist is a zero-shot agricultural intelligence system powered by a large Vision-Language Model (Qwen2.5-VL-7B-Instruct) running on an AMD MI300X GPU. Upload a single photograph of any plant (indoors, outdoors, healthy, or dying) and Sentinel performs 11 simultaneous agricultural analyses in real time, streaming its step-by-step reasoning trace like an expert agronomist thinking out loud.

- Reasoning trace: watch the AI examine shadows, soil texture, leaf color, and turgor in real time
- Species identification with confidence scores (50,000+ species, zero-shot)
- Geospatial inference: USDA hardiness zone, latitude, climate class deduced from shadows and soil
- Light assessment: estimated daily sun hours + adequacy check
- Watering analysis: visual turgor + soil moisture cues
- Nutrient deficiency: N-P-K + micronutrients with organic remedy doses
- Pest & disease: IPM-first diagnosis (organic → chemical → prevention)
- Companion planting: recommended neighbors + antagonists + placement instructions
- Harvest intelligence: days to harvest, visual cues, succession planting
- Seasonal planning: frost risk, crop rotation, next-season prep
- Beginner garden planner: "What should I plant right now?" with 7 curated plants for your zone and season
- Garden harvest preview: interactive top-down layout + AI-generated image of your garden at peak harvest

Why AMD MI300X? The 192 GB HBM3 VRAM loads the full 7B model with room to spare. ROCm 7.0 + gfx942 optimization delivers ~23% faster inference than comparable cloud APIs. Your plant photos never leave the Droplet: privacy-first by design.

The demo differentiator:

Tech stack: Qwen2.5-VL-7B-Instruct · AMD MI300X · ROCm 7.0 · HuggingFace Optimum-AMD · FastAPI · React 19 · GitHub Pages · NDJSON streaming · 100/100 Lighthouse · zero trackers · no CDNs
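The project lists NDJSON streaming as the way its reasoning trace reaches the browser. A minimal sketch of consuming such a stream follows: each non-empty line is parsed as one standalone JSON document. The event fields (`type`, `text`, `analysis`, `status`) and the sample payloads are assumptions made for illustration, not the project's actual schema.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of an NDJSON stream."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank keep-alive lines between events
        yield json.loads(line)


# Hypothetical events: the field names below are illustrative only.
sample_stream = [
    '{"type": "reasoning", "text": "Leaf edges look dry."}\n',
    "\n",
    '{"type": "result", "analysis": "watering", "status": "underwatered"}\n',
]

events = list(iter_ndjson(sample_stream))
for event in events:
    print(event["type"], event)
```

Because every line is a complete JSON document, a client can render each reasoning step as soon as it arrives instead of waiting for the whole response.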

AgriSync

AgriSync is an agricultural intelligence platform for smallholder farmers across Africa's 54 nations. Two problems destroy farm income every season:

1. Crop disease goes undiagnosed: the nearest agronomist is hours away or unaffordable. A farmer watches their maize develop grey leaf spot and doesn't know whether to spray, wait, or replant.
2. Farmers sell at the wrong market: without real-time price data, a farmer in Nakuru may sell tomatoes locally for KES 20/kg when Nairobi is paying KES 31/kg that same morning. On 500 kg that's KES 5,500 lost to information asymmetry.

AgriSync solves both in one flow. A farmer takes a photo of their crop leaf with any smartphone. Our two-stage vision pipeline (Llama-3.2-11B-Vision-Instruct as primary, with LLaVA-v1.5-7B, fine-tuned on plant diseases, as specialist fallback, both running on AMD MI300X) identifies the disease, severity level, and the specific PCPB-approved chemical available at the nearest agro-vet, with the price in KES. At the same time, our ArbitrageEngine agent queries crop prices across African market hubs, calculates net profit after transport costs, and recommends the single best market to sell at today. The OrchestratorAgent (Mistral-7B-Instruct) combines both outputs into a bilingual English and Swahili advisory. For farmers without a smartphone, the same advisory is delivered as a 160-character SMS via Africa's Talking API.

The platform covers diseases across major African food crops: Tomato Late Blight, Maize Gray Leaf Spot, Cassava Mosaic Disease, Fall Armyworm, Groundnut Rosette, Rice Yellow Mottle Virus, Bean Angular Leaf Spot, Potato Late Blight, and more.
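The market recommendation described above comes down to comparing net revenue per market after transport. The sketch below illustrates that arithmetic with the Nakuru and Nairobi prices from the example. The transport costs, the `MarketQuote` type, and the function names are hypothetical and are not taken from the ArbitrageEngine itself.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MarketQuote:
    market: str
    price_per_kg: float    # KES per kg
    transport_cost: float  # KES for the whole load (hypothetical figure)


def net_revenue(quote: MarketQuote, quantity_kg: float) -> float:
    """Revenue in KES after subtracting the cost of reaching the market."""
    return quote.price_per_kg * quantity_kg - quote.transport_cost


def best_market(quotes: Sequence[MarketQuote], quantity_kg: float) -> MarketQuote:
    """Pick the market with the highest net revenue for this load."""
    if not quotes:
        raise ValueError("at least one market quote is required")
    return max(quotes, key=lambda q: net_revenue(q, quantity_kg))


quantity_kg = 500
quotes = [
    MarketQuote("Nakuru", 20.0, 0.0),      # selling locally, no transport
    MarketQuote("Nairobi", 31.0, 2000.0),  # transport cost is an assumption
]

# Price gap before transport: (31 - 20) KES/kg * 500 kg = 5,500 KES.
gross_gap = (quotes[1].price_per_kg - quotes[0].price_per_kg) * quantity_kg
best = best_market(quotes, quantity_kg)
print(best.market, net_revenue(best, quantity_kg), gross_gap)
```

With the assumed KES 2,000 transport cost, Nairobi still comes out ahead. A higher transport cost could reverse the recommendation, which is why the comparison uses net rather than gross figures.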

NavSight AI: Navigation for the Visually Impaired

2.2 billion people worldwide live with vision impairment, and 43 million are completely blind. Guide dogs cost $25,000–$50,000 and take months to train. White canes detect obstacles within just 1 meter. Existing apps can label objects, but cannot understand complex scenes, judge distances, or warn you about a car turning toward you.

NavSight AI is a real-time navigation assistant built on AMD Instinct MI300X with ROCm. Four specialized agents work in concert:

- Vision Agent: YOLOv8 + Depth Anything V2, ~25 ms/frame
- Scene Agent: LLaVA-NeXT 7B via vLLM, ~290 ms contextual understanding
- Navigation Agent: priority matrix, CRITICAL to LOW hazard ranking
- Communication Agent: spoken guidance, CRITICAL alerts interrupt mid-sentence

Core innovation: a dual inference path, fast (every frame, ~25 ms) and smart (every 3rd frame, ~290 ms), both on one MI300X GPU using only ~15 GB of 192 GB HBM3. We solved a real ROCm gap: torchvision::nms is absent in ROCm builds, so we wrote fully GPU-native NMS in pure PyTorch with zero CPU round-trips. NavSight also tracks object motion frame-to-frame (approaching/receding/crossing) and maintains surface state across frames so stair warnings persist throughout a descent. Validated: 4/4 hazard detection, 0 critical false negatives across 4 real-world scenarios.

NavSight targets a $7.6B assistive tech market growing at 14% annually. Unlike Glidance ($999 hardware) or Be My Eyes (human volunteers), NavSight needs only a camera and is scalable at $5–10/month. Roadmap: mobile app on AMD Ryzen AI, GPS turn-by-turn navigation, multi-language TTS, community hazard mapping. Every person deserves to walk safely, and NavSight AI and AMD MI300X make it possible.
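The dual inference path described above can be sketched as a simple scheduler: the fast path runs on every frame and the smart path only on every third frame. The sketch is synchronous and uses stub callables in place of the real detection and scene models. The function and parameter names are hypothetical, and a production system would presumably run the slower path without blocking the fast one.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
SMART_EVERY = 3  # run the slower scene model on every 3rd frame


def run_dual_path(
    frames: Iterable[Frame],
    fast_path: Callable[[Frame], str],
    smart_path: Callable[[Frame], str],
    smart_every: int = SMART_EVERY,
) -> Iterator[Tuple[int, str, Optional[str]]]:
    """Yield (frame index, fast result, smart result or None) per frame."""
    if smart_every < 1:
        raise ValueError("smart_every must be at least 1")
    for index, frame in enumerate(frames):
        detections = fast_path(frame)  # always runs (detection + depth)
        # The scene model runs only on frames 0, 3, 6, ...
        scene = smart_path(frame) if index % smart_every == 0 else None
        yield index, detections, scene


# Stub models standing in for the real detector and scene model.
results = list(
    run_dual_path(
        frames=range(7),
        fast_path=lambda f: f"detections-{f}",
        smart_path=lambda f: f"scene-{f}",
    )
)
smart_frames = [index for index, _, scene in results if scene is not None]
print(smart_frames)
```

Skipped frames carry `None` for the scene result, so downstream logic can keep using the most recent scene description until a fresh one arrives.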

AthleteX

AthleteX is a lightweight, AI-driven sports performance evaluation web app designed to help athletes understand their strengths, identify improvement areas, and receive quick insights based on their performance data. The platform features an intuitive, minimal, and mobile-friendly interface that allows users to input athletic metrics with ease. Once data is entered, AthleteX uses client-side JavaScript processing to instantly analyse performance, eliminating the need for complex backend infrastructure.

A core feature of AthleteX is its AI-powered scoring engine, which evaluates user input across multiple dimensions such as endurance, speed, agility, and consistency. The app then generates an overall performance score, allowing athletes to benchmark their abilities or track improvements over time. The system also highlights specific strengths and weaknesses, offering clear guidance on where additional training may be needed.

The web app is built as a single-page interface, ensuring smooth transitions, fast loading, and a seamless user experience without page reloads. Its lightweight design makes it highly responsive across devices, including smartphones, tablets, and desktops. AthleteX is deployed via Netlify, providing secure hosting, instant global content delivery, and excellent performance reliability.

Overall, AthleteX combines simplicity, speed, and AI-assisted insights to democratize access to athletic evaluation tools. Whether for personal use, coaching purposes, or early-stage talent assessment, the platform delivers an accessible and efficient way to analyse performance without requiring specialized equipment or software.
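One plausible way to turn per-dimension inputs into an overall score is a weighted average, sketched below. The app itself runs client-side JavaScript; Python is used here only for illustration. The weights, the 0 to 100 scale, and the strength threshold are all assumptions, since the source does not describe how the scoring engine actually works.

```python
from typing import Dict, List, Tuple

# Hypothetical weights in percent; they must sum to 100.
WEIGHTS: Dict[str, int] = {
    "endurance": 30,
    "speed": 30,
    "agility": 20,
    "consistency": 20,
}


def overall_score(metrics: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each assumed to be 0-100."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS) / 100


def strengths_and_weaknesses(
    metrics: Dict[str, float], threshold: float = 70.0
) -> Tuple[List[str], List[str]]:
    """Split dimensions into those at or above the threshold and those below."""
    strengths = sorted(n for n in WEIGHTS if metrics[n] >= threshold)
    weaknesses = sorted(n for n in WEIGHTS if metrics[n] < threshold)
    return strengths, weaknesses


athlete = {"endurance": 80.0, "speed": 60.0, "agility": 90.0, "consistency": 70.0}
score = overall_score(athlete)
strengths, weaknesses = strengths_and_weaknesses(athlete)
print(score, strengths, weaknesses)
```

Keeping the weights in one table makes it easy to adjust how much each dimension counts for a given sport without touching the scoring logic.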