WarehouseEye is an operational intelligence platform for warehouse CCTV, built to replace slow and expensive manual video review. Instead of watching hours of footage frame by frame, the system converts video into structured, queryable data that operations and safety teams can use immediately.

The pipeline is designed as an end-to-end workflow: (1) frame extraction, (2) person detection with RT-DETRv2, (3) persistent multi-object tracking with ByteTrack to maintain identity continuity, (4) crop-level semantic enrichment using Qwen3-VL to classify activities, zones, and anomaly signals, and (5) SQLite-backed persistence exposed through a FastAPI backend and a Streamlit frontend for timeline exploration and natural-language querying.

WarehouseEye also supports optional person Re-ID with pluggable backends (Qwen3-VL embeddings or OSNet) to recover identities after occlusions and temporary tracking loss. This improves continuity in real-world warehouse environments where visibility changes frequently.

From a systems perspective, the project targets AMD MI300X deployment so heavy multimodal workloads can run on a single node with simplified operations. This architecture reduces orchestration complexity while keeping throughput practical for production-like scenarios. In internal repository benchmarks, WarehouseEye shows low per-video processing cost and estimated savings versus GPT-4V-style baselines, while preserving track-level traceability and visual evidence.

The result is a practical open-source foundation for industrial video analytics: faster incident investigation, better operational visibility, and an extensible stack for safety, compliance, and process optimization use cases.
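
To make the "structured, queryable data" idea from step (5) of the pipeline concrete, here is a minimal sketch of how enriched track observations might be stored in SQLite and queried. The project's actual schema is not described here, so the table name `track_events`, its columns, and the helper functions `open_store`, `record_event`, and `anomalies_in_zone` are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema (an assumption, not the project's real one):
# one row per enriched observation of a tracked person.
SCHEMA = """
CREATE TABLE IF NOT EXISTS track_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    video_id    TEXT    NOT NULL,
    track_id    INTEGER NOT NULL,   -- identity assigned by the tracker
    frame_idx   INTEGER NOT NULL,
    timestamp_s REAL    NOT NULL,   -- seconds from the start of the video
    activity    TEXT,               -- label from the vision-language model
    zone        TEXT,
    anomaly     INTEGER NOT NULL DEFAULT 0,
    crop_path   TEXT                -- visual evidence for this observation
);
"""


def open_store(path=":memory:"):
    """Open (or create) the event store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    conn.execute(SCHEMA)
    return conn


def record_event(conn, video_id, track_id, frame_idx, timestamp_s,
                 activity, zone, anomaly=False, crop_path=None):
    """Persist one enriched observation for a track."""
    conn.execute(
        "INSERT INTO track_events "
        "(video_id, track_id, frame_idx, timestamp_s, activity, zone, anomaly, crop_path) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (video_id, track_id, frame_idx, timestamp_s,
         activity, zone, int(anomaly), crop_path),
    )
    conn.commit()


def anomalies_in_zone(conn, zone):
    """Return anomaly observations in a zone, oldest first."""
    rows = conn.execute(
        "SELECT track_id, timestamp_s, activity, crop_path "
        "FROM track_events WHERE zone = ? AND anomaly = 1 "
        "ORDER BY timestamp_s",
        (zone,),
    ).fetchall()
    return [dict(row) for row in rows]


if __name__ == "__main__":
    conn = open_store()
    record_event(conn, "cam01.mp4", 7, 120, 4.0, "walking", "aisle_3")
    record_event(conn, "cam01.mp4", 9, 300, 10.0, "climbing_rack",
                 "loading_dock", anomaly=True, crop_path="crops/9_300.jpg")
    for hit in anomalies_in_zone(conn, "loading_dock"):
        print(hit)
```

Keeping the track ID, timestamp, and crop path in the same row is one way to preserve track-level traceability: a query result can point directly back to the person, the moment, and the image that supports it. A natural-language query layer could then translate a question into a parameterized query such as the one in `anomalies_in_zone`.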
Category tags: