Top Builders

Explore the top contributors in our community, ranked by number of app submissions.

YOLOv8

Ultralytics YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds on the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. YOLOv8 is designed to be fast, accurate, and easy to use, making it an excellent choice for a wide range of object detection and tracking, instance segmentation, image classification, and pose estimation tasks.

What's new in YOLOv8?

YOLOv8 supports a full range of vision AI tasks, including detection, segmentation, pose estimation, tracking, and classification. This versatility allows users to leverage YOLOv8's capabilities across diverse applications and domains.
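As a quick taste of that task coverage, here is a minimal sketch using the Ultralytics Python package. The nano checkpoints and the sample image URL below come from the Ultralytics documentation and are illustrative choices, not requirements.

```python
# Each YOLOv8 task has its own checkpoint family; the "n" (nano)
# variants below are the smallest published models.
from ultralytics import YOLO

detect   = YOLO("yolov8n.pt")       # object detection
segment  = YOLO("yolov8n-seg.pt")   # instance segmentation
pose     = YOLO("yolov8n-pose.pt")  # pose estimation
classify = YOLO("yolov8n-cls.pt")   # image classification

# Run detection on the standard Ultralytics sample image.
results = detect("https://ultralytics.com/images/bus.jpg")
print(results[0].boxes.xyxy)        # detected boxes in pixel coordinates
```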

General
Release date: May 2023
Repository: https://github.com/ultralytics/ultralytics
Type: Real-time object detection

YOLOv8 AI technology hackathon projects

Discover innovative solutions crafted with YOLOv8 AI technology, developed by our community members during our engaging hackathons.

TempoGraph: Local Multimodal Video Analysis

TempoGraph is a fully local, privacy-preserving multimodal video analysis system that turns raw video files into rich structured outputs (entities, behaviors, transcripts, timelines, and interactive knowledge graphs) without sending a single frame to the cloud. The pipeline runs in six stages; illustrative code sketches for Stages 1 through 5 follow.

Stage 1 (Frame Selection): Motion-aware sampling with static, moving, and auto camera modes. For moving cameras it estimates a homography to separate object motion from camera movement, then identifies keyframes where motion peaks exceed a configurable sigma threshold.

Stage 1.5 (Audio Transcription): Whisper.cpp running on Vulkan transcribes the full audio track into millisecond-accurate segments.

Stage 2 (YOLO Detection): YOLO26 runs on a second GPU over every sampled frame, outputting normalized bounding boxes, class names, track IDs, and confidence scores.

Stage 3 (Depth Estimation): Depth Anything V2, via Hugging Face Transformers, adds a per-detection mean depth to every bounding box, giving 3D spatial context to 2D detections.

Stage 4 (Frame Scoring): Picks which frames the VLM actually sees. In keyframes mode, only motion-peak frames are forwarded. In scored mode, FrameScorer ranks all YOLO-scanned frames using a weighted combination of motion delta, new YOLO class appearances, tracked-object churn, and IoU drop between frames, then fills the VLM budget with the highest-signal frames. Keyframes are always pinned first regardless of mode.

Stage 5 (VLM Captioning): Qwen3.5-VL-9B served by a custom llama.cpp build compiled for AMD ROCm/HIP, running on an AMD RX 9070 XT with a 100k-token context window. Frames are chunked and sent to the model alongside YOLO-derived annotations; each chunk's summary seeds the next prompt for narrative continuity across the video.

Stage 6 (Aggregation): A final text-only LLM call synthesizes all per-chunk captions and the audio transcript into structured JSON with entities, visual events, audio events, and multimodal correlations linking what was said to what was seen.
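
A minimal sketch of the Stage 1 idea, assuming OpenCV with ORB features for the homography estimate; the function names and the `sigma_k` parameter are illustrative, not TempoGraph's actual API.

```python
import cv2
import numpy as np

def motion_scores(path, moving_camera=True):
    """Return a per-frame motion score, compensating for camera motion
    via a homography when `moving_camera` is set."""
    cap = cv2.VideoCapture(path)
    orb = cv2.ORB_create(500)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    scores, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            ref = prev
            if moving_camera:
                kp1, d1 = orb.detectAndCompute(prev, None)
                kp2, d2 = orb.detectAndCompute(gray, None)
                if d1 is not None and d2 is not None:
                    matches = matcher.match(d1, d2)
                    if len(matches) >= 4:
                        src = np.float32([kp1[m.queryIdx].pt for m in matches])
                        dst = np.float32([kp2[m.trainIdx].pt for m in matches])
                        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
                        if H is not None:
                            # Warp the previous frame into the current view so
                            # the residual diff reflects object motion only.
                            ref = cv2.warpPerspective(prev, H, gray.shape[::-1])
            scores.append(float(cv2.absdiff(gray, ref).mean()))
        prev = gray
    cap.release()
    return np.array(scores)

def keyframes(scores, sigma_k=2.0):
    """Indices where motion exceeds mean + sigma_k * std."""
    return np.flatnonzero(scores > scores.mean() + sigma_k * scores.std())
```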
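
Stage 2 can be approximated with the Ultralytics tracking API. The project runs a newer checkpoint on a dedicated GPU; `yolov8n.pt` here is a placeholder, and the returned dict shape is an assumption.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

def detect(frame):
    """Return normalized boxes, class names, track IDs, and confidences
    for one sampled frame."""
    result = model.track(frame, persist=True, verbose=False)[0]
    boxes = result.boxes
    # Track IDs may be absent when the tracker has nothing to associate.
    ids = boxes.id.tolist() if boxes.id is not None else [None] * len(boxes)
    return [
        {
            "xyxyn": boxes.xyxyn[i].tolist(),        # normalized corners
            "cls": result.names[int(boxes.cls[i])],  # class name
            "track_id": ids[i],
            "conf": float(boxes.conf[i]),
        }
        for i in range(len(boxes))
    ]
```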
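
For Stage 3, a plausible implementation uses the Transformers depth-estimation pipeline; the checkpoint name and the mean-over-box reduction below are assumptions based on the description.

```python
import numpy as np
from PIL import Image
from transformers import pipeline

# Small Depth Anything V2 checkpoint; the project may use a larger one.
depth_pipe = pipeline("depth-estimation",
                      model="depth-anything/Depth-Anything-V2-Small-hf")

def add_mean_depth(image: Image.Image, detections):
    """Attach the mean depth inside each normalized bounding box."""
    depth = np.asarray(depth_pipe(image)["depth"], dtype=np.float32)
    h, w = depth.shape
    for det in detections:
        x1, y1, x2, y2 = det["xyxyn"]
        patch = depth[int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)]
        det["mean_depth"] = float(patch.mean()) if patch.size else None
    return detections
```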
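
The Stage 4 scorer could look like the following; the weights and the exact definitions of churn and IoU drop are assumptions, and only the weighted-sum ranking and keyframe pinning come from the description.

```python
def score_frame(prev, curr, w_motion=0.4, w_new=0.3, w_churn=0.2, w_iou=0.1):
    """prev/curr are per-frame feature dicts with 'motion' (float),
    'classes' (set), 'tracks' (set of IDs), and 'mean_iou' (float)."""
    motion_delta = abs(curr["motion"] - prev["motion"])
    new_classes = len(curr["classes"] - prev["classes"])
    churn = len(curr["tracks"] ^ prev["tracks"])  # IDs appeared or vanished
    iou_drop = max(0.0, prev["mean_iou"] - curr["mean_iou"])
    return (w_motion * motion_delta + w_new * new_classes
            + w_churn * churn + w_iou * iou_drop)

def pick_frames(features, keyframe_ids, budget):
    """Pin keyframes first, then fill the remaining VLM budget by score."""
    pinned = sorted(keyframe_ids)[:budget]
    scored = sorted(
        ((score_frame(features[i - 1], features[i]), i)
         for i in range(1, len(features)) if i not in keyframe_ids),
        reverse=True,
    )
    fill = [i for _, i in scored[:budget - len(pinned)]]
    return sorted(pinned + fill)
```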
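
Finally, the Stage 5 continuity scheme, where each chunk's summary seeds the next prompt. `vlm_caption` stands in for the project's llama.cpp-served Qwen call and is purely hypothetical; only the seeding pattern is taken from the description.

```python
def caption_video(chunks, vlm_caption):
    """chunks: list of (frames, yolo_annotations) pairs.
    vlm_caption(prompt, frames) -> str is the VLM inference call."""
    summaries, prior = [], ""
    for frames, annotations in chunks:
        prompt = (f"Previous context: {prior}\n"
                  f"YOLO annotations: {annotations}\n"
                  "Describe what happens in these frames.")
        prior = vlm_caption(prompt, frames)  # becomes the next chunk's seed
        summaries.append(prior)
    return summaries
```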

AssemblyMind

Assembly Mind is a real-time multimodal agent for hardware assembly auditing. In electronics prototyping, simple assembly errors such as reversed polarity, incorrect connections, and component mismatches are a leading cause of board failure. These errors are typically caught only after power-on, resulting in destroyed components and hours of debugging. Existing Automated Optical Inspection systems are designed for high-volume manufacturing floors and require dedicated fixtures, pre-programmed rules, and substantial capital investment; they do not serve the engineer working on a one-off prototype at a workbench.

Assembly Mind addresses this gap by combining schematic document understanding with live visual perception. The system ingests schematic PDFs or netlists and processes live camera feeds from a standard webcam. A vision-language model, running on AMD Instinct MI300X GPUs via ROCm, performs semantic reasoning across the schematic and the physical assembly to detect discrepancies before power is applied. The agent outputs structured audit results and natural-language guidance, flagging errors such as incorrect component orientation or mismatched pin connections.

For this hackathon, we demonstrate an end-to-end pipeline: schematic ingestion, real-time camera analysis, multimodal reasoning, and immediate feedback (a sketch of the audit loop follows below). Built with Qwen2.5-VL, LangChain for agentic orchestration, and Hugging Face Optimum-AMD for ROCm optimization, the system uses the MI300X's high memory bandwidth and large HBM3 capacity for efficient high-resolution image processing and long-context multimodal inference. The result is a practical tool that reduces rework time and prevents costly prototyping failures.
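
A high-level sketch of that loop, assuming a webcam read via OpenCV and a hypothetical `audit_frame` callable standing in for the Qwen2.5-VL inference call on the MI300X; the prompt wording and the JSON schema are assumptions, not the project's actual interface.

```python
import json
import cv2

def run_audit(schematic_text, audit_frame, camera_index=0):
    """Stream webcam frames and ask the VLM to check each one
    against the ingested schematic before power is applied."""
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            prompt = (
                "You are auditing a hardware assembly against this schematic:\n"
                f"{schematic_text}\n"
                "Report discrepancies (orientation, polarity, pin connections) "
                'as JSON: {"issues": [...], "guidance": "..."}'
            )
            # audit_frame(prompt, image) -> str is the hypothetical VLM call;
            # the model is instructed to answer in JSON.
            report = json.loads(audit_frame(prompt, frame))
            if report["issues"]:
                print(report["guidance"])
    finally:
        cap.release()
```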