Combo - A Multi-Model AI Chat

Created by team MeoWML on May 08, 2026
Hugging Face, Fine-Tuning on AMD GPUs (Advanced / GPU-Intensive)

Combo is a locally executed, multi-model AI ensemble system that runs three heterogeneous large language models. Rather than relying on a single model's perspective, Combo queries all three models independently on the same prompt, then synthesizes their individual outputs into one unified, high-confidence answer. The idea comes from ensemble learning in classical machine learning, applied here to generative AI inference. Everything runs offline on consumer AMD hardware: there is no cloud dependency, no API calls, and no data leaves the machine.

Individual large language models, regardless of their scale, carry inherent biases, knowledge gaps, and reasoning blind spots shaped by their training data and fine-tuning objectives. A single model can confidently produce an answer that is subtly wrong, incomplete, or one-sided. Combo challenges the assumption that one model is enough. It demonstrates that sequential multi-model ensemble inference, running the models one after another, is practical on a consumer AMD GPU within tight VRAM constraints, and that the synthesized output is qualitatively richer than any single model's response alone.

The Gradio UI maintains a full session history using Gradio's gr.State. On each new turn, the entire prior conversation is prepended to the prompt sent to all three models, so the ensemble can reference, connect, and build on earlier context throughout the session rather than responding only to isolated prompts.

Several extensions are possible. Instead of treating all models' outputs equally during synthesis, each answer could be scored for internal consistency, factual grounding, and relevance, then weighted accordingly in the Combo summary, giving more authority to more confident or more detailed responses. More models could be added, letting users choose which ones to run. Finally, attaching a local vector database to the pipeline would let the models retrieve relevant document chunks before answering, grounding the ensemble's responses in user-supplied knowledge bases and reducing hallucination.
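The per-turn flow described above (prepend the session history, query each model on the same prompt one at a time, then merge the answers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Combo's actual code: every name here (`build_prompt`, `build_synthesis_prompt`, `combo_turn`, the `User:`/`Assistant:` prompt format, and the `(message, answer)` history tuples that would live in `gr.State`) is hypothetical. The source does not say which model performs the synthesis, so the sketch assumes a separate `synthesizer` callable, and models are plain functions from prompt to reply so the control flow runs without a GPU.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases: a "model" is any function mapping a prompt string
# to a reply string (e.g. a wrapper around a local Hugging Face model).
Generate = Callable[[str], str]
# Session history as (user message, combined answer) pairs; in a Gradio app
# this list would be the value held in gr.State (assumed format).
History = List[Tuple[str, str]]


def build_prompt(history: History, message: str) -> str:
    """Prepend every earlier turn so each model sees the whole session."""
    lines: List[str] = []
    for user_msg, answer in history:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {answer}")
    lines.append(f"User: {message}")
    lines.append("Assistant:")
    return "\n".join(lines)


def build_synthesis_prompt(message: str, answers: Dict[str, str]) -> str:
    """Present the independent answers side by side and ask for one merged answer."""
    parts = [f"Question: {message}", "", "Candidate answers:"]
    for name, answer in answers.items():
        parts.append(f"[{name}] {answer}")
    parts.append("")
    parts.append("Merge these into one accurate, complete answer.")
    return "\n".join(parts)


def combo_turn(
    models: Dict[str, Generate],
    synthesizer: Generate,
    history: History,
    message: str,
) -> Tuple[str, Dict[str, str], History]:
    """Run one chat turn: query every model on the same prompt, then synthesize."""
    prompt = build_prompt(history, message)
    answers: Dict[str, str] = {}
    # Models are called one after another rather than in parallel, so a real
    # deployment could load, run, and unload each one in turn and keep only
    # one model in VRAM at a time (assumption about memory management).
    for name, generate in models.items():
        answers[name] = generate(prompt)
    final = synthesizer(build_synthesis_prompt(message, answers))
    # Return a new history list rather than mutating the caller's copy.
    return final, answers, history + [(message, final)]


if __name__ == "__main__":
    # Stub models standing in for the three real LLMs (hypothetical).
    stubs: Dict[str, Generate] = {
        "model_a": lambda prompt: "2 + 2 = 4",
        "model_b": lambda prompt: "The answer is 4.",
        "model_c": lambda prompt: "Four.",
    }

    def stub_synth(prompt: str) -> str:
        return "4"

    final, answers, history = combo_turn(stubs, stub_synth, [], "What is 2 + 2?")
    print(final)    # prints 4
    print(history)  # prints [('What is 2 + 2?', '4')]
```

Keeping the merge step in its own function means the weighted synthesis proposed above would only need to change `build_synthesis_prompt`, for example by annotating or ordering the candidate answers by a score before they reach the synthesizer.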
