
The problem: existing AI DJ tools either loop royalty-free templates or crossfade tracks naively, with no musical intelligence. Neither produces a real DJ mix: one with an arc, a narrative, and section-aware transitions.

AiJockey is a 5-agent pipeline:

1. Director: Qwen2-Audio-7B-Instruct (multimodal) listens to the first 30 s of each user clip, identifies clusters, picks a set narrative ("warmup ambient → build to tech-house peak → cooldown lofi"), and emits a per-junction tier (minor/major/drop) plus an intent (build_tension/drop_payoff/genre_jump). It is not retrieval-augmented; it is directly multimodal.
2. Planner: a beam-search subset selector with section-aware candidate scoring (energy-curve match + transition-type fit + vocal/key/BPM penalties + duration). It reads the All-In-One Music Structure Analyzer's labeled segments instead of relying on MFCC heuristics.
3. Executor: 33 implemented transition primitives across 14 categories (crossfade, eq_swap, filter_fade, drum_break, mashup, stem_swap, echo_out, spectral_hold, harmonic_overlay, ...). Per-stem Demucs separation enables vocal-aware overlay logic.
4. Probes: three deterministic audio analyses run after every render: a spectral-phasing detector, vocal-bleed cross-correlation, and an RMS-envelope energy-mismatch check. Each junction gets a severity score.
5. Improver: when probes flag a junction, a rule-based engine edits the timeline (energy re-pick, overlap shorten, technique swap) and re-renders only the affected segments.

Production discipline: a vocal-safety rule auto-rejects 18 aggressive techniques on vocal-active sections; a DJ-industry 8% time-stretch cap prevents mistakes like a 30% vocal slowdown; phrase-aware boundaries ensure no mid-verse cuts.

Stack: ROCm 7.0, PyTorch 2.9 nightly, Qwen2-Audio-7B, Demucs htdemucs_ft, LAION CLAP, Beat This!. Cost: ~$50 in AMD Developer Cloud credits to build end-to-end. Open-source, deployed on Hugging Face Spaces.
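To make the Planner concrete, here is a minimal sketch of a beam-search subset selector. The scoring weights, field names (`duration`, `energy`, `bpm`, `key`), and `junction_score` helper are all illustrative assumptions, not the project's actual scoring function; the real Planner also folds in transition-type fit and vocal penalties from the structure analyzer.

```python
import heapq

def junction_score(chosen, nxt, clips):
    """Illustrative junction scoring: reward energy, penalize BPM and key clashes."""
    c = clips[nxt]
    score = c["energy"]  # stand-in for matching the Director's energy curve
    if chosen:
        prev = clips[chosen[-1]]
        score -= 0.5 * abs(prev["bpm"] - c["bpm"]) / 10.0  # BPM-mismatch penalty
        score -= 0.3 * (prev["key"] != c["key"])            # key-clash penalty
    return score

def plan_set(clips, beam_width=8, target_len=1800.0):
    """Beam-search subset selection: each state is a partial ordered setlist.

    States are (negated total score, duration so far, tuple of clip indices);
    heapq.nsmallest on negated scores keeps the top `beam_width` states.
    """
    beams = [(0.0, 0.0, ())]
    best = ()
    for _ in range(len(clips)):
        candidates = []
        for neg_score, dur, chosen in beams:
            for i, clip in enumerate(clips):
                if i in chosen:
                    continue
                new_dur = dur + clip["duration"]
                if new_dur > target_len:
                    continue  # setlist would overrun the target mix length
                total = -neg_score + junction_score(chosen, i, clips)
                candidates.append((-total, new_dur, chosen + (i,)))
        if not candidates:
            break  # nothing else fits; keep the best setlist found so far
        beams = heapq.nsmallest(beam_width, candidates)
        best = beams[0][2]
    return best
```

The subset-selection twist (versus plain sequence beam search) is that the duration cap prunes states, so the search naturally stops once no remaining clip fits the target length.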
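The RMS-envelope energy-mismatch probe can be sketched deterministically with NumPy. The frame/hop sizes and the 4-frame averaging window are assumptions for illustration; the idea is simply to compare the loudness envelope on either side of a junction and report a relative discontinuity as the severity.

```python
import numpy as np

def energy_mismatch(out_tail, in_head, frame=2048, hop=512):
    """RMS-envelope energy-mismatch probe (sketch).

    Compares the loudness envelope of the outgoing track's tail against the
    incoming track's head. Returns a severity in [0, 1): 0 means a seamless
    energy hand-off, values near 1 mean an audible dip or spike.
    """
    def rms_envelope(x):
        frames = [x[i:i + frame] for i in range(0, len(x) - frame + 1, hop)]
        return np.array([np.sqrt(np.mean(f ** 2)) for f in frames])

    a = rms_envelope(np.asarray(out_tail, dtype=float))
    b = rms_envelope(np.asarray(in_head, dtype=float))
    # Average a few frames on each side of the junction to smooth out jitter.
    ra, rb = a[-4:].mean(), b[:4].mean()
    return abs(ra - rb) / max(ra, rb, 1e-9)
```

Because the probe is pure signal math with no model in the loop, it is deterministic: the same render always yields the same severity score, which is what makes it safe to gate the Improver on.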
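The vocal-safety rule and stretch cap amount to a small guard in front of the Executor. The set of aggressive techniques below is a stand-in (the project rejects 18 of them), and the junction field names are hypothetical; only the 8% cap comes from the text.

```python
# Hypothetical subset for illustration; the real rule covers 18 techniques.
AGGRESSIVE = {"drum_break", "spectral_hold", "mashup"}
MAX_STRETCH = 0.08  # DJ-industry ~8% time-stretch cap

def safe_technique(technique, junction):
    """Reject aggressive techniques on vocal-active sections and cap the
    tempo stretch needed to beatmatch the outgoing and incoming clips."""
    if junction["vocals_active"] and technique in AGGRESSIVE:
        return False  # would mangle an active vocal
    stretch = abs(junction["bpm_out"] - junction["bpm_in"]) / junction["bpm_out"]
    return stretch <= MAX_STRETCH  # e.g. a 30% slowdown is always rejected
```

Running every candidate technique through a guard like this before rendering is what keeps the Improver's technique-swap loop from trading one audible artifact for another.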
10 May 2026