ROCmPort AI

Created by team Agentic Migrator on May 05, 2026
AI Agents & Agentic Workflows (Best Track for Beginners), Hugging Face, Qwen

CUDA-to-ROCm migration fails not at translation but at correctness. hipify-clang renames APIs mechanically. It cannot detect that a reduction kernel assuming warpSize=32 will silently produce wrong results on AMD's 64-lane wavefronts: lanes 32 through 63 never contribute to the result. The code compiles. The output is wrong. Nobody tells you. ROCmPort AI is a closed-loop agentic system built to catch exactly this class of bug before it reaches production.

The pipeline runs five agents in sequence. The Analyzer performs a static scan before any LLM call, grounding the system in what is actually in the code. The Translator runs hipify-clang as a first pass, then applies LLM corrections for architecture-specific issues the tool cannot handle. The Optimizer applies MI300X-specific changes: wavefront-64 alignment, LDS bank conflict padding, and shared memory tiling. The Tester compiles with hipcc and profiles with rocprof on real AMD hardware. The Coordinator evaluates the profiler output and decides whether to iterate or finalize.

All four demo kernels were compiled and profiled on a real AMD Instinct MI300X (gfx942) on AMD Developer Cloud, running ROCm 7.2. These runs are labeled data_source: "mi300x_live".

The primary model is Qwen2.5-Coder-32B-Instruct, purpose-built for code reasoning tasks. LLaMA-3.3-70B on Groq handles log parsing as a cost-efficient fallback. In production, Qwen runs via vLLM on the MI300X instance itself.

Failure cases are documented explicitly. Library-heavy CUDA that uses CUB, Thrust, or cuDNN requires manual review after ROCmPort output. This is intentional: credibility requires honesty about scope. The value is not speed. It is correctness before execution, and a decision trace that a senior engineer can audit.
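The wavefront-64 failure described above can be reproduced without a GPU. The following is a minimal Python sketch, not ROCmPort AI's own code: it models a shuffle-down tree reduction over a list of lane values. The helper names `shfl_down` and `wave_reduce` are hypothetical, and the model assumes that an out-of-range shuffle returns the calling lane's own value.

```python
def shfl_down(vals, offset):
    """Model a shuffle-down: lane i reads lane i + offset.

    Assumption: a read past the last lane returns the lane's own value.
    """
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i] for i in range(n)]


def wave_reduce(vals, start_offset):
    """Tree-reduce lane values by halving the offset; return lane 0's result."""
    vals = list(vals)
    offset = start_offset
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]


# A 64-lane wavefront where lane i holds the value i.
lanes = list(range(64))

# Ported CUDA code that hard-codes warpSize=32 starts the loop at offset 16.
buggy = wave_reduce(lanes, 16)

# A wavefront-aware version starts at half the actual width (64 // 2).
fixed = wave_reduce(lanes, 32)

print(buggy, sum(range(32)))  # lane 0 only sees lanes 0..31
print(fixed, sum(range(64)))  # lane 0 sees all 64 lanes
```

In this model the hard-coded starting offset yields the sum of lanes 0 through 31 only, with no error raised, which is the silent miscalculation a static scan or a correctness test has to catch.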
