
AI app: Ghost-Coder: Autonomous CUDA-to-HIP Agent for AMD Developer Hackathon


Ghost-Coder: Autonomous CUDA-to-HIP Agent

Created by team AMD Ghost Coder- CUDA-to-ROCm AI Migration on May 07, 2026

AMD Developer Cloud, AMD ROCm, HuggingFace Hub, HuggingFace Spaces, Qwen3-Coder

Hugging Face, Fine-Tuning on AMD GPUs (Advanced / GPU-Intensive), Qwen, Vision & Multimodal AI

Migrating high-performance computing workloads from NVIDIA's CUDA to AMD's ROCm ecosystem is a critical bottleneck for developers adopting new hardware. Ghost-Coder reduces this friction by serving as an autonomous, self-healing translation agent. Powered by the Qwen2.5-Coder-32B model running on an AMD Instinct™ MI300X (192GB) instance, Ghost-Coder goes beyond static text replacement: it analyzes device-side kernels and host-side memory API calls to generate native, optimized HIP code.

Architecture & Workflow: The system uses a decoupled frontend/backend architecture so that the GPU instance is reserved for inference. The UI is a reactive Gradio web application hosted on Hugging Face Spaces, with an agentic interface that visualizes the translation and verification steps in real time. This frontend communicates through a secure tunnel with a custom FastAPI bridge running directly on the AMD Developer Cloud droplet. Inside the droplet, the AI engine operates under strict hardware guardrails. To achieve stable, high-throughput inference on the MI300X architecture, we engineered a specialized execution environment. By overriding the ISA to gfx942, strictly serializing kernel execution, and disabling experimental Triton memory paths, we stabilized the Qwen model and prevented memory aperture violations during continuous generative decoding.

Self-Healing & Translation: When a user submits a CUDA kernel (e.g., a tiled matrix multiplication), Ghost-Coder analyzes the computational logic. It maps NVIDIA-specific host calls such as cudaMalloc and cudaMemcpy to their hipMalloc and hipMemcpy equivalents, and rewrites kernel launches, converting the <<<blocks, threads>>> launch syntax into hipLaunchKernelGGL calls. The result is drop-in-ready C++ code tailored for the AMD stack. Ghost-Coder accelerates the transition to open-source GPU computing by turning days of manual kernel porting into an automated process that takes seconds.
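To make the host-call and launch-syntax mapping described above concrete, here is a minimal sketch in Python. It is not Ghost-Coder's implementation: the project describes a model-driven translation, whereas this is exactly the kind of static text replacement the agent is meant to go beyond. The names `CUDA_TO_HIP`, `LAUNCH_RE`, and `translate` are hypothetical, the mapping table is a small illustrative subset, and the launch pattern only handles the two-argument `<<<grid, block>>>` form with argument lists that contain no nested parentheses.

```python
import re

# Illustrative subset of CUDA runtime names and their HIP counterparts.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Match whole identifiers only; longer names are listed first.
API_RE = re.compile(
    r"\b(" + "|".join(sorted(CUDA_TO_HIP, key=len, reverse=True)) + r")\b"
)

# Match `kernel<<<grid, block>>>(args)` (assumption: no nested parentheses in args).
LAUNCH_RE = re.compile(
    r"(\w+)\s*<<<\s*([^,>]+?)\s*,\s*([^,>]+?)\s*>>>\s*\(([^)]*)\)"
)


def translate(source: str) -> str:
    """Rewrite simple CUDA host code into HIP host code (naive, text-based)."""

    def rewrite_launch(match: re.Match) -> str:
        name, grid, block = match.group(1), match.group(2), match.group(3)
        args = match.group(4).strip()
        tail = f", {args}" if args else ""
        # The two zeros are the dynamic shared memory size and the stream.
        return f"hipLaunchKernelGGL({name}, {grid}, {block}, 0, 0{tail})"

    out = LAUNCH_RE.sub(rewrite_launch, source)
    return API_RE.sub(lambda m: CUDA_TO_HIP[m.group(0)], out)


if __name__ == "__main__":
    cuda_src = (
        "cudaMalloc(&d_a, bytes);\n"
        "matmul<<<blocks, threads>>>(d_a, d_b, d_c, n);\n"
        "cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);"
    )
    print(translate(cuda_src))
```

A rule table like this breaks down on templated kernels, launches with shared-memory or stream arguments, and library calls without a one-to-one HIP counterpart, which is where an analysis-driven agent would be expected to help.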

Category tags:

Code Generation, Content, Developer Tools, Knowledge Base, Language Learning



Muhammad Talha
Senior Software Engineer
