The Chaos Economy is a multi-agent reinforcement learning simulation of a single-stock market. Six AI agents (four traders, a market maker, and an SEC regulator) are trained from scratch using GRPO on LoRA-adapted Llama-3.2 models running on AMD MI300X GPUs. Over 250 training steps, a full financial crisis arc emerges organically: agents discover momentum trading, adapt to spread widening, begin colluding on coordinated buy/sell patterns, and face a regulatory crackdown, all without scripted behavior. Every action is a live LLM inference; every reward is grounded in real market mechanics, including GBM (geometric Brownian motion) price dynamics, order-flow impact, news shocks, and a 10-vector reward-hacking audit.

The headline result: our 3B Llama-3.2 model, fine-tuned with GRPO on AMD hardware, outperforms NVIDIA's Nemotron Super 120B as a trading agent, with higher mean PnL, better format compliance, and a more coherent strategy, at 1/40th the parameter count. Targeted domain fine-tuning beats scale.

Tech stack:
- Training: TRL + GRPO, PEFT LoRA adapters
- Hardware & Inference: AMD MI300X GPUs via Hugging Face's AMD hardware tier; optimum-amd for ROCm-optimized execution; vLLM for high-throughput batched inference across all six agents per step
- Platform: Hugging Face Spaces for deployment and model hosting
- Simulation: custom multi-agent environment with order matching, GBM market sim, news marketplace, inter-agent messaging, and manipulation detection
- Evaluation: W&B tracking of PnL, diversity, format compliance, oversight intensity, and news alpha
- Replay viewer: embedded frontend for full 250-step episode visualization

GRPO fine-tuning on multi-agent trajectories produces emergent collusion, front-running, and regulatory adaptation, rivaling Nemotron Super 120B at a fraction of the compute, on open AMD hardware.
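As a rough illustration of two mechanics named above, the sketch below shows one GBM price step with order-flow and news terms added, plus the group-relative advantage that GRPO computes over a batch of sampled completions. This is not the project's code. The function names, the linear `impact * net_order_flow` term, the additive `news_shock`, and the use of population standard deviation are all assumptions made for illustration; the description does not say how the simulation or reward normalization is actually implemented.

```python
import math
import random
from statistics import mean, pstdev


def gbm_step(price, mu, sigma, dt, net_order_flow=0.0, impact=0.0,
             news_shock=0.0, rng=random):
    """Advance the price by one step of geometric Brownian motion.

    The GBM part uses the exact log-normal update. The order-flow and
    news terms are hypothetical additive log-return adjustments, not
    taken from the project.
    """
    z = rng.gauss(0.0, 1.0)  # standard normal draw
    log_return = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
    # Assumed linear impact: net buying pushes the price up, selling down.
    log_return += impact * net_order_flow + news_shock
    return price * math.exp(log_return)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as GRPO does.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (population std here,
    which is an assumption of this sketch).
    """
    m = mean(rewards)
    s = pstdev(rewards)
    return [(r - m) / (s + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so a run is repeatable
    price = 100.0
    for _ in range(5):
        price = gbm_step(price, mu=0.05, sigma=0.2, dt=1 / 252,
                         net_order_flow=3.0, impact=0.001, rng=rng)
    print(f"price after 5 steps: {price:.2f}")
    print(group_relative_advantages([1.0, 2.0, 3.0]))
```

With `sigma=0`, `mu=0`, and no order flow or news, `gbm_step` returns the input price unchanged, which is a convenient sanity check. The advantage function returns zeros when every reward in the group is equal, so a group with no reward variation contributes no learning signal.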
10 May 2026