Deep Chaos Scheduler with kernel optimization

Created by team juiceb0xc0de on May 08, 2026
Fine-Tuning on AMD GPUs (Advanced / GPU-Intensive), Qwen, Hugging Face

The Deep Chaos Scheduler picks which layers train during full fine-tuning instead of training all of them. The hypothesis: a model does not always need a full fine-tune to learn things like math. The parameters that need to update don't have to be contiguous, and randomly located subsets can match or beat the dense baseline.

Every 50 training steps (what I call a sticky block), the scheduler designates 30 to 70% of the victim layers as active for that block. Each active victim is then assigned at random to one of four modes: full (both attention and MLP), attention only, MLP only, or skipped entirely. Within an active layer, 30 to 70% of attention heads and MLP channels are kept, and 60 to 95% of hidden dimensions are kept. The parameters and activations left unselected for those 50 steps don't just get masked: a custom kernel optimization called the layer hoist physically yanks dead and identity layers out of model.layers before the forward pass and drops a tiny frozen residual stub in their place, so the surviving layers don't suddenly receive the previous surviving layer's output verbatim. The forward graph shrinks for the whole block. On a 5-epoch Qwen2.5-3B run on MI300X, training runs 2.25x faster wall-clock and uses 18% less VRAM, and the math gradients flow through a different random sub-network every 50 steps.

DeepChaos was benchmarked against a true full fine-tune (FFT) on the same data (simplescaling/s1K), with the same compute budget and three independent random seeds, using LM Evaluation Harness on GSM8K, Minerva Math, MATH-500, and MGSM. At 3B, DeepChaos beat the FFT-3B baseline by +17.2pp on MGSM, and DeepChaos-3B outperformed FFT-7B on MATH-500 despite being less than half the size. At 7B, every chaos seed beat the dense baseline across every Minerva subcategory, with gains up to +13.2pp on prealgebra and +8.6pp on overall Minerva math_verify. Three-seed consistency rules out lucky-mask explanations.
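The sticky-block sampling described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the names `LayerPlan`, `sample_block_plan`, and `plan_for_step` are hypothetical, and the sketch assumes that fractions are drawn uniformly from the stated ranges, that the four modes are equally likely, and that keep fractions are drawn per layer. The layer hoist itself is not shown.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

BLOCK_STEPS = 50  # length of one sticky block, per the description
MODES = ("full", "attention", "mlp", "skip")


@dataclass
class LayerPlan:
    """Hypothetical record of what one active victim layer does for a block."""
    layer: int
    mode: str
    head_keep: Optional[float]     # fraction of attention heads kept
    channel_keep: Optional[float]  # fraction of MLP channels kept
    hidden_keep: Optional[float]   # fraction of hidden dimensions kept


def sample_block_plan(victim_layers: List[int], rng: random.Random) -> List[LayerPlan]:
    """Draw one block's plan. Uniform sampling is an assumption of this sketch."""
    # Activate 30 to 70% of the victim layers for this block.
    n_active = max(1, round(rng.uniform(0.30, 0.70) * len(victim_layers)))
    active = sorted(rng.sample(list(victim_layers), n_active))

    plan = []
    for layer in active:
        mode = rng.choice(MODES)  # assumed equally likely
        uses_attn = mode in ("full", "attention")
        uses_mlp = mode in ("full", "mlp")
        plan.append(LayerPlan(
            layer=layer,
            mode=mode,
            head_keep=rng.uniform(0.30, 0.70) if uses_attn else None,
            channel_keep=rng.uniform(0.30, 0.70) if uses_mlp else None,
            hidden_keep=rng.uniform(0.60, 0.95) if mode != "skip" else None,
        ))
    return plan


def plan_for_step(step: int, victim_layers: List[int], seed: int) -> List[LayerPlan]:
    """Return the plan for a training step; it is constant within a sticky block."""
    block = step // BLOCK_STEPS
    # Seeding from (seed, block) makes the plan reproducible and "sticky".
    rng = random.Random(seed * 1_000_003 + block)
    return sample_block_plan(victim_layers, rng)


if __name__ == "__main__":
    victims = list(range(36))  # example layer indices, chosen for illustration
    for p in plan_for_step(step=0, victim_layers=victims, seed=0):
        print(p)
```

Deriving the random generator from the seed and the block index means steps 0 through 49 share one plan and step 50 draws a fresh one, without storing any scheduler state between steps.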

Category tags: