Streaming-MoE Stack for AMD
Streaming MoE on AMD: run mixture-of-experts models bigger than your GPU's VRAM. A kernel PR'd to ROCm/AITER, an MIT-licensed llama.cpp fork, and a stock OpenAI-compatible agent — running unchanged from a $700 RX 9070 XT to a $15K MI300X.