Bangladesh
1 year of experience
Hello guys! I'm Piyal Datta, a Computer Science and Engineering enthusiast from Bangladesh who loves building useful and creative digital solutions. I enjoy working with Python and C/C++, and I'm always eager to learn and improve my skills. I also have a strong interest in research, with my work presented at an IEEE-affiliated international conference on usability evaluation using machine learning. I've worked with both machine learning and deep learning models, and I enjoy exploring the balance between practical engineering and academic innovation. I'm passionate, curious, and excited to keep learning, collaborating, and building impactful technology.
Combo is a locally executed, multi-model AI ensemble system that runs three heterogeneous large language models. Rather than relying on a single model's perspective, Combo queries all three models independently on the same prompt, then synthesizes their individual outputs into one unified, high-confidence answer. The approach is inspired by ensemble learning in classical machine learning, applied here to generative AI inference. Everything runs offline on consumer AMD hardware, with no cloud dependency, no API calls, and no data leaving the machine.

Individual large language models, regardless of their scale, carry inherent biases, knowledge gaps, and reasoning blind spots shaped by their training data and fine-tuning objectives. A single model can confidently produce an answer that is subtly wrong, incomplete, or one-sided. Combo challenges the single-model assumption by demonstrating that sequential multi-model ensemble inference is practical on a consumer AMD GPU within tight VRAM constraints, and that the synthesized output is qualitatively richer than any single model's response alone.

The Gradio UI maintains a full session history using Gradio's gr.State. On each new turn, the entire prior conversation is prepended to the prompt sent to all three models, allowing the ensemble to reference, connect, and build upon earlier context throughout the session rather than respond to isolated prompts.

Several extensions are possible. Instead of treating the models' outputs equally during synthesis, each answer could be scored for internal consistency, factual grounding, and relevance, then weighted accordingly in the combined summary, giving higher authority to more confident or more detailed responses. More models could be added, with the user given a choice of which to run. Finally, attaching a local vector database to the pipeline would allow the models to retrieve relevant document chunks before answering, grounding the ensemble's responses in user-supplied knowledge bases and reducing hallucination.
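The sequential ensemble flow described above can be sketched as follows. The three `model_*` callables are placeholders standing in for local LLM inference calls; their names and the concatenation-based synthesis step are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of sequential multi-model ensemble inference.
# Each model_* function is a stand-in for a local LLM call.

def model_a(prompt: str) -> str:
    return f"A's take on: {prompt}"  # placeholder for a real inference call

def model_b(prompt: str) -> str:
    return f"B's take on: {prompt}"  # placeholder

def model_c(prompt: str) -> str:
    return f"C's take on: {prompt}"  # placeholder

def ensemble(prompt: str) -> str:
    """Query each model sequentially, then build one synthesis prompt.

    Here the three drafts are simply joined; in a real pipeline this
    combined prompt would be fed back to a model to produce the final
    unified answer.
    """
    drafts = [m(prompt) for m in (model_a, model_b, model_c)]
    return "Combine these answers into one:\n" + "\n---\n".join(drafts)
```

Running the models one after another, rather than in parallel, is what keeps peak VRAM usage low enough for a single consumer GPU.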
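The history-prepending step can be illustrated with a small helper. The `(user, assistant)` tuple format is an assumption about how the session history might be stored in gr.State; the real app may structure it differently.

```python
# Sketch of prepending the full session history to each new prompt,
# so every model sees the whole conversation on every turn.

def build_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    """Flatten prior (user, assistant) turns into plain text and
    append the new user message."""
    lines = []
    for user, assistant in history:
        lines.append(f"User: {user}")
        lines.append(f"Assistant: {assistant}")
    lines.append(f"User: {user_msg}")
    return "\n".join(lines)
```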
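The proposed weighted synthesis could start from something like the sketch below. The scoring heuristics (answer length as a proxy for detail, vocabulary overlap with the prompt as a proxy for relevance) are crude illustrative assumptions; a serious implementation would use stronger signals for consistency and factual grounding.

```python
import math

def score(answer: str, prompt: str) -> float:
    """Toy confidence proxy: reward detail (length) and relevance
    (overlap with prompt vocabulary). Purely illustrative."""
    prompt_words = set(prompt.lower().split())
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    overlap = sum(1 for w in answer_words if w in prompt_words) / len(answer_words)
    detail = math.log1p(len(answer_words))
    return detail * (1.0 + overlap)

def weights(answers: list[str], prompt: str) -> list[float]:
    """Normalize scores so they sum to 1, giving each model's answer
    a weight to use during synthesis."""
    scores = [score(a, prompt) for a in answers]
    total = sum(scores) or 1.0
    return [s / total for s in scores]
```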
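The retrieval idea can be shown with a toy ranker. Real retrieval would use a sentence-embedding model and a vector database; the bag-of-words "embedding" and cosine similarity here are simplifying assumptions to make the grounding step concrete.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; stands in for a real
    sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the query,
    to be prepended to the models' prompts."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```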
10 May 2026