DocuMind AI — Multi-Modal Document Intelligence

Created by team DocuMind AI on May 08, 2026

AMD Developer Cloud AMD ROCm CrewAI Streamlit

AI Agents & Agentic Workflows (Best Track for Beginners)Vision & Multimodal AI

DocuMind AI is a multi-modal, multi-agent document intelligence platform built on AMD MI300X GPU infrastructure via Fireworks AI's OpenAI-compatible inference API. The Problem: Professionals spend hours manually extracting insights from complex documents containing both text and visual content (charts, diagrams, figures). Existing tools handle text OR images, not both together. The Solution: DocuMind AI runs a sequential 4-agent CrewAI pipeline on AMD Instinct MI300X GPUs: 1. Vision Agent — uses Kimi K2.5 to analyze every image, chart, and diagram embedded in the document 2. Reader Agent — uses DeepSeek V3.1 to extract text, identify entities, classify document type, and extract key facts 3. Analyst Agent — synthesizes visual AND textual findings into cross-modal insights 4. Reporter Agent — produces a structured intelligence report: Executive Summary, Key Entities, Categorized Insights, and Actionable Recommendations AMD Technology Used: AMD Instinct MI300X GPUs via Fireworks AI with OpenAI-compatible API — zero-friction integration with CrewAI and Python AI ecosystem. Kimi K2.5 vision-language model + DeepSeek V3.1 open-source frontier text model, both running on AMD hardware. Why This Matters: Covers Track 1 (AI Agents) AND Track 3 (Vision/Multimodal). Real business value for researchers, analysts, and professionals. Live demo on Hugging Face Spaces with zero setup required for judges. MIT open-source with full documentation.

Category tags:

Documents, Developer Tools, Data Mastery

Github Presentation Demo

Explore more applications

Simple Video captioner

This project is a simple video captioner. Uses a VLM to extract the text from a video and another model to transform the description in the required styles. As a fallback, it contains a fine-tunned local model based on Gemma 4.

Neotron

AMD Developer CloudClaude CodeAI/ML API

AIVE-Artificial Intelligence Venture Engine

AIVE (Artificial Intelligence Venture Engine) is a Cognitive Discovery Operating System that transforms documents, research papers, and data into structured knowledge, evidence-backed insights, dynamic reasoning, and actionable innovation opportunities.

Quacky Wonderland

AntigravitykiroVercelChatGPTClaude CodeCodex

SOAP Copilot: AI Clinical Scribe on AMD

SOAP Copilot turns raw doctor-patient conversations into structured SOAP notes, ICD-10 codes, and patient-friendly summaries in seconds, using a 3-agent Llama 3.3 70B pipeline built and fine-tuned on AMD hardware.

LoneSoloWolf

AMD Developer CloudAMD ROCmLLaMAHuggingFace Hub

SmartBandit Router

A reinforcement-learning router that learns which AI model to call for each query, cutting cost by ~93% without ever hardcoding a routing rule.

nyan

LLaMAQwen3GPT-3

Thymus

Thymus is a lightweight hybrid token-efficient router designed to maximize accuracy while minimizing token costs in multi‑task LLM pipelines. It dynamically routes user queries across local and remote models on LLM providers.

The Disappointer

HuggingFace HubLLaMAAMD Developer Cloud

Muhammad Yaseen

Upcoming AI Hackathons
For Innovators & Creators

Explore more applications

Simple Video captioner

This project is a simple video captioner. Uses a VLM to extract the text from a video and another model to transform the description in the required styles. As a fallback, it contains a fine-tunned local model based on Gemma 4.

Neotron

AMD Developer CloudClaude CodeAI/ML API

AIVE-Artificial Intelligence Venture Engine

AIVE (Artificial Intelligence Venture Engine) is a Cognitive Discovery Operating System that transforms documents, research papers, and data into structured knowledge, evidence-backed insights, dynamic reasoning, and actionable innovation opportunities.

Quacky Wonderland

AntigravitykiroVercelChatGPTClaude CodeCodex

SOAP Copilot: AI Clinical Scribe on AMD

SOAP Copilot turns raw doctor-patient conversations into structured SOAP notes, ICD-10 codes, and patient-friendly summaries in seconds, using a 3-agent Llama 3.3 70B pipeline built and fine-tuned on AMD hardware.

LoneSoloWolf

AMD Developer CloudAMD ROCmLLaMAHuggingFace Hub

SmartBandit Router

A reinforcement-learning router that learns which AI model to call for each query, cutting cost by ~93% without ever hardcoding a routing rule.

nyan

LLaMAQwen3GPT-3

Thymus

Thymus is a lightweight hybrid token-efficient router designed to maximize accuracy while minimizing token costs in multi‑task LLM pipelines. It dynamically routes user queries across local and remote models on LLM providers.

The Disappointer

HuggingFace HubLLaMAAMD Developer Cloud