YOLO

YOLO (You Only Look Once) is a real-time object detection algorithm that detects and locates objects in an image or video. The architecture divides the input into a grid of cells, and each cell is responsible for detecting objects within its region. YOLO returns bounding boxes for the objects in the image, predicting for each box the probability that it contains an object, along with class probabilities that identify the object's type. Releasing YOLO as an open-source project let the community make several improvements in a short time.
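The grid idea can be illustrated with a minimal sketch. It assumes the convention of the original 2015 paper, where the cell containing a box's center is the one responsible for that object and the default grid is 7 x 7. Later YOLO versions differ in detail. The function names `responsible_cell` and `class_scores` are hypothetical and do not come from any YOLO library.

```python
def responsible_cell(box, image_w, image_h, grid_size=7):
    """Return (row, col) of the grid cell that contains the box center.

    box is (x_min, y_min, x_max, y_max) in pixels. grid_size=7 is an
    assumption taken from the original paper's default.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Scale the center into grid units and clamp to the last cell.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


def class_scores(box_confidence, class_probs):
    """Combine a box's object confidence with per-class probabilities."""
    return [box_confidence * p for p in class_probs]


if __name__ == "__main__":
    print(responsible_cell((100, 200, 300, 400), 448, 448))
    print(class_scores(0.8, [0.5, 0.25]))
```

Multiplying the box confidence by each class probability gives one score per class for that box, which is what a detector would threshold before reporting a detection.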

General
Release date: 2015
Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi
Paper: https://arxiv.org/abs/1506.02640
Type: Object detection algorithm

YOLO - Resources

Learn even more about YOLO!

  • v7 Labs Blog: "YOLO: Algorithm for Object Detection Explained".
  • YOLOv5 Repository: Object detection architectures and models pretrained on the COCO dataset.
  • YOLOv6 Web demo: Gradio demo for YOLOv6 object detection on videos.
  • Hugging Face Spaces: Test YOLOv7 in the browser with Hugging Face Spaces.

YOLO AI Technologies Hackathon projects

Discover innovative solutions crafted with YOLO AI Technologies, developed by our community members during our engaging hackathons.

SixthSense: Haptic Vision for the Blind

SixthSense is a wearable that helps blind and low-vision people sense obstacles around them and find a clear path. A phone is mounted on the chest and watches the way ahead. On-device models turn what the camera sees into a simple readout: how near obstacles are in the left, center, and right zones, what objects are present, and whether the path is clear. That readout drives a vibration belt worn at the waist, which buzzes on the side of the nearest obstacle so the user can feel which way to move.

The point is that knowing something is close is not enough. A basic vibrating cane buzzes whenever anything is near, so in a crowd it buzzes constantly without telling you where the gap is. SixthSense reads each zone separately and steers the user toward open space, so it stays useful in busy areas. The user can also ask what is ahead and hear a short spoken answer, or point the camera at a sign and have its text read aloud.

The vision runs on the phone. YOLOv11n detects objects and tags each to a left, center, or right zone. Depth-Anything-V2 estimates how near things are, which sets how hard the belt buzzes. Qwen2.5-0.5B answers spoken questions about the scene, and ML Kit reads text on demand. The models run through ExecuTorch as compiled files on the phone, offline, on a Qualcomm Snapdragon 8 Elite, with the option to run on the Hexagon NPU. The phone sends a small directional packet over Bluetooth to an ESP32, which drives the belt motors.

Cost is the main reason it exists. Smart canes and glasses run from about $850 to over $2,000, and only one in ten people who need assistive technology can get it, dropping to about five percent in lower-income countries. SixthSense uses a phone the user already has and a sub-$20 belt, with room to reach about $50 at scale, putting this within reach of people who are priced out today.
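The zone readout described for SixthSense could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: it assumes each detection arrives as a horizontal center in pixels plus a distance in metres, that the frame is split into equal thirds, and that vibration strength ramps linearly up to a 0-255 motor value. The names `zone_of` and `belt_command` and the 4 m range are hypothetical.

```python
ZONES = ("left", "center", "right")


def zone_of(x_center, frame_width):
    """Map a horizontal pixel position to one of three equal-width zones."""
    third = frame_width / 3.0
    index = int(x_center // third)
    return ZONES[max(0, min(index, 2))]  # clamp positions on the frame edge


def belt_command(detections, frame_width, max_range_m=4.0):
    """Pick the zone of the nearest obstacle and a vibration strength.

    detections is a list of (x_center_px, distance_m) pairs (assumed format).
    Returns (zone, intensity 0-255), or None if nothing is within range.
    """
    nearest = {}
    for x_center, distance_m in detections:
        zone = zone_of(x_center, frame_width)
        if zone not in nearest or distance_m < nearest[zone]:
            nearest[zone] = distance_m
    if not nearest:
        return None
    zone = min(nearest, key=nearest.get)
    # Closer obstacles produce a stronger buzz (linear ramp, assumed).
    closeness = 1.0 - min(nearest[zone], max_range_m) / max_range_m
    if closeness <= 0.0:
        return None
    return zone, round(255 * closeness)


if __name__ == "__main__":
    print(belt_command([(100, 3.0), (500, 1.0)], frame_width=640))
```

Keeping the nearest distance per zone, instead of one global distance, is what lets the output indicate a direction and not just proximity.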

Lumina — Embodied Spatial AI

Lumina is a real-time assistive navigation system designed for visually impaired users. It uses a camera feed (local webcam or IP camera) to continuously perceive the environment, build a persistent spatial memory, and respond to natural-language queries like "Where is my phone?" or "Find my bottle" with spoken, clock-direction navigation instructions.

The system is built around a true Multi-Agent System (MAS) architecture: six autonomous agents communicate exclusively through a central Pub/Sub event bus with no direct inter-agent coupling. This enables genuine agent autonomy, fault isolation, and emergent negotiation behaviour.

Core capabilities:

  • Real-time object detection and multi-object tracking (YOLOv8 + IoU tracker)
  • Monocular depth estimation with RANSAC multi-anchor metric calibration (MiDaS)
  • 3D spatial back-projection (X, Y, Z camera-coordinate vectors)
  • Persistent spatial memory with probabilistic confidence decay (Qdrant vector DB)
  • Illumination-invariant visual Re-ID for cross-frame object deduplication
  • Bird's-Eye View occupancy grid for safe lateral obstacle avoidance
  • ORB-SLAM visual odometry compass (drift-free heading without IMU)
  • LLM-driven query parsing and natural-language response generation
  • LLM cascade: Groq → OpenAI → Local edge SLM (llama.cpp / Ollama) → deterministic fallback
  • Real-time WebSocket streaming of annotated frames, agent logs, and navigation responses
  • Cross-session persistent user memory
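As a rough illustration of how 3D back-projection can feed clock-direction instructions, the sketch below uses the standard pinhole camera model to turn a pixel and a metric depth into camera coordinates, then maps the horizontal bearing to a clock hour. The intrinsics (`fx`, `fy`, `cx`, `cy`), the function names, and the 30-degrees-per-hour mapping are assumptions for illustration and are not taken from Lumina's code.

```python
import math


def back_project(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth.

    Returns (X, Y, Z) in camera coordinates: X to the right, Y down,
    Z forward, all in metres.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


def clock_direction(x, z):
    """Convert a horizontal offset into a clock hour (12 = straight ahead).

    Each hour on the clock face covers 30 degrees of bearing (assumed mapping).
    """
    angle_deg = math.degrees(math.atan2(x, z))
    hour = round(angle_deg / 30.0) % 12
    return 12 if hour == 0 else hour


if __name__ == "__main__":
    X, Y, Z = back_project(u=420, v=240, depth_m=2.0,
                           fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(f"Object about {Z:.1f} m away at {clock_direction(X, Z)} o'clock")
```

In this sketch only the X and Z components affect the spoken direction. The Y component would matter for cues such as "on the floor" or "at head height".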

Weld AI - Autonomous Multi-Agent Radiography

📌 Overview

In regulated industries like oil & gas, piping, power, and manufacturing, weld structural failures can lead to catastrophic accidents. Quality assurance relies on Non-Destructive Testing (NDT) radiography, a manual, slow, and error-prone process. The Weld NDT AI Inspector transforms this workflow into a highly reliable, autonomous, and auditable multi-agent pipeline. The system is architected using Hexagonal (Ports & Adapters) design principles to keep core business rules independent of ML frameworks and databases, running securely on Google Cloud Platform (GCP) and MongoDB Atlas.

👥 The Multi-Agent Collaboration Room

Inside a distributed Band.ai room, four remote agents work in sequence to ensure safe and compliant decision-making:

  • Weld Orchestrator Agent: Directs the room flow, contrast-enhances the raw scan with CLAHE, dispatches payloads, and logs tamper-evident audit events.
  • Weld Vision Agent: Deploys a fine-tuned computer vision model to locate and classify physical defects (e.g. porosity, slag, cracks) on grey-scale film.
  • Weld Compliance Agent: Leverages a rules engine and Gemini to cross-reference dimensions against regulatory standard tolerances (ASME B31.3, API 1104, AWS D1.1).
  • Weld Review Agent (Track 3 Core): A strict safety auditor that enforces mandatory overrides (e.g., any crack = mandatory reject, regardless of size) and routes ambiguous cases to an ESCALATE status for human Level III inspection.

💾 Enterprise-Grade Resiliency

  • Dual-Database Adapter: Saves inspection records to MongoDB Atlas with automatic fallback to local SQLite storage to prevent downtime in offline industrial fields.
  • Inference Caching: Indexes scans via cryptographic SHA-256 hashes, returning cached defect coordinates on duplicates to prevent redundant GPU overhead.
  • Vertex AI Integration: Authenticates securely using GCP service account IAM policies, bypassing the need for static API keys.
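The hash-keyed inference cache and the mandatory crack override can be sketched in a few lines. This is a minimal illustration, assuming defects are plain dictionaries with `type` and `confidence` keys. The class `CachedDetector`, the function `review`, the `ACCEPT`/`REJECT` labels, and the 0.6 confidence threshold are all hypothetical. Only the SHA-256 keying, the crack rule, and the ESCALATE status come from the description above.

```python
import hashlib


class CachedDetector:
    """Wrap a detection function so identical scans are only processed once."""

    def __init__(self, detect_fn):
        self._detect_fn = detect_fn  # any callable taking raw scan bytes
        self._cache = {}

    def detect(self, scan_bytes):
        # Identical bytes always produce the same SHA-256 digest.
        key = hashlib.sha256(scan_bytes).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._detect_fn(scan_bytes)
        return self._cache[key]


def review(defects, within_tolerance, min_confidence=0.6):
    """Apply the safety override first, then escalate uncertain cases."""
    if any(d["type"] == "crack" for d in defects):
        return "REJECT"  # any crack is rejected, regardless of size
    if any(d["confidence"] < min_confidence for d in defects):
        return "ESCALATE"  # assumed definition of an ambiguous case
    return "ACCEPT" if within_tolerance else "REJECT"


if __name__ == "__main__":
    def fake_model(scan_bytes):
        # Stand-in for the real vision model (hypothetical output format).
        return [{"type": "porosity", "confidence": 0.9}]

    detector = CachedDetector(fake_model)
    defects = detector.detect(b"example-scan")
    print(review(defects, within_tolerance=True))
```

Checking the crack rule before anything else means no later step, such as a favourable tolerance result, can turn a cracked weld into an acceptance.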
