Combo - A Multi-Model AI Chat

Created by team MeoWML on May 08, 2026

HuggingFace Hub, Qwen3, Llama 3, AMD ROCm

Hugging Face Fine-Tuning on AMD GPUs (Advanced / GPU-Intensive)

Combo is a locally executed, multi-model AI ensemble system that runs three heterogeneous large language models. Rather than relying on a single model's perspective, Combo queries all three models independently on the same prompt, then synthesizes their individual outputs into one unified, high-confidence answer. The approach is inspired by ensemble learning in classical machine learning, applied here to generative AI inference. Everything runs offline on consumer AMD hardware with no cloud dependency, no API calls, and no data leaving the machine.

Individual large language models, regardless of their scale, carry inherent biases, knowledge gaps, and reasoning blind spots shaped by their training data and fine-tuning objectives. A single model can confidently produce an answer that is subtly wrong, incomplete, or one-sided. Combo challenges the assumption that one model is enough by demonstrating that sequential multi-model ensemble inference is practical on a consumer AMD GPU within tight VRAM constraints, and that the synthesized output is qualitatively richer than any single model's response alone.

The Gradio UI maintains a full session history using Gradio's gr.State. On each new turn, the entire prior conversation is prepended to the prompt sent to all three models, allowing the ensemble to reference, connect, and build upon earlier context throughout the session rather than responding only to isolated prompts.

Several extensions are possible. Instead of treating model outputs equally in synthesis, each answer could be scored for internal consistency, factual grounding, and relevance, then weighted accordingly in the combo summary, giving higher authority to more confident or more detailed responses. More models could be included, with users given a choice of which models to run. Attaching a local vector database to the pipeline would allow models to retrieve relevant document chunks before answering, grounding the ensemble's responses in user-supplied knowledge bases and reducing hallucination.
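The flow described above (prepend the session history, query each model in turn on the same prompt, then synthesize one answer) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names build_prompt, run_ensemble, make_stub_model, and stub_synthesizer are hypothetical, the "User:/Assistant:" prompt layout is an assumption, and the models are stand-in Python callables rather than real Qwen3 or Llama 3 pipelines. In a Gradio app, the history list would presumably be the value held in gr.State.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable that maps a prompt string to a reply string.
Model = Callable[[str], str]
# Assumption: history is stored as (user message, combined answer) pairs.
History = List[Tuple[str, str]]


def build_prompt(history: History, user_message: str) -> str:
    """Prepend the entire prior conversation to the new user message."""
    lines = []
    for user, assistant in history:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")
    return "\n".join(lines)


def run_ensemble(
    models: Dict[str, Model],
    synthesizer: Model,
    history: History,
    user_message: str,
) -> Tuple[str, History]:
    """Query every model sequentially on the same prompt, then merge the answers."""
    prompt = build_prompt(history, user_message)

    # Sequential inference: one model at a time, so that only one model
    # would need to occupy VRAM at any moment in a real deployment.
    answers = {}
    for name, model in models.items():
        answers[name] = model(prompt)

    # Hand all candidate answers to a synthesizer (hypothetical step layout).
    sections = [f"### {name}\n{answer}" for name, answer in answers.items()]
    synthesis_prompt = (
        f"Question: {user_message}\n\n"
        "Combine the candidate answers below into one answer.\n\n"
        + "\n\n".join(sections)
    )
    combined = synthesizer(synthesis_prompt)

    # Return a new history list so the caller can store it as session state.
    return combined, history + [(user_message, combined)]


def make_stub_model(name: str) -> Model:
    """Stand-in for a real LLM: reports how many prompt lines it received."""
    def model(prompt: str) -> str:
        return f"[{name}] saw {len(prompt.splitlines())} prompt lines"
    return model


def stub_synthesizer(prompt: str) -> str:
    """Stand-in for the synthesis step: counts the candidate answers it was given."""
    return f"summary of {prompt.count('### ')} answers"


models = {name: make_stub_model(name) for name in ("model_a", "model_b", "model_c")}
history: History = []
reply1, history = run_ensemble(models, stub_synthesizer, history, "What is ROCm?")
reply2, history = run_ensemble(models, stub_synthesizer, history, "How does it relate to HIP?")
print(reply2)
print(len(history), "turns stored")
```

Keeping each model behind a plain callable means the stubs could be swapped for real inference functions without changing run_ensemble, and returning a fresh history list rather than mutating the old one fits a state-passing style of UI callback.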


Explore more applications

SOAP Copilot: AI Clinical Scribe on AMD

SOAP Copilot turns raw doctor-patient conversations into structured SOAP notes, ICD-10 codes, and patient-friendly summaries in seconds, using a 3-agent Llama 3.3 70B pipeline built and fine-tuned on AMD hardware.

LoneSoloWolf

AMD Developer Cloud, AMD ROCm, LLaMA, HuggingFace Hub

SmartBandit Router

nyan

Thymus

Thymus is a lightweight hybrid token-efficient router designed to maximize accuracy while minimizing token costs in multi‑task LLM pipelines. It dynamically routes user queries across local and remote models on LLM providers.

The Disappointer

HuggingFace Hub, LLaMA, AMD Developer Cloud

CrosslaneAI

An agentic pipeline that takes any CUDA repo and autonomously ports it to ROCm/HIP — analyze → transpile → compile → test → benchmark on real AMD GPUs — with an LLM repair loop fixing everything the mechanical tools can't.

Soloengenier

AMD Developer Cloud, AMD ROCm, HuggingFace Spaces, Vercel, AWS

AI Classroom Edge Intelligence

A privacy-first classroom AI platform that routes sensitive work to local edge systems and eligible anonymized analysis to Fireworks AI, helping teachers make faster, safer instructional decisions even with unreliable internet.

AI Classroom Edge

AMD Developer Cloud, Qwen3, REST API, GitHub Copilot, Codex, ChatGPT

Piyal Datta
