ROCmForge — The AMD Performance Compiler
A fine-tuned 7B LLM that converts PyTorch, CUDA, and Triton code into HIP kernels tuned for the AMD MI300X (gfx942), using wavefront-64 execution, MFMA intrinsics, and other gfx942-specific optimizations that the base model never produces.