
Documents aren't just text. Financial results live in charts, scientific insights hide in figures, and legal risks are buried in tables. Traditional document AI treats visuals as noise; OmniDoc treats them as signal.

OmniDoc is a multimodal document intelligence platform that understands text, charts, tables, diagrams, handwritten notes, scanned pages, equations, and mixed-language content. Upload any document and talk to it:

• "What was the gross margin trend in the section 3 charts?" → OmniDoc reads the bars, not just the surrounding text.
• "Which appendix clauses exceed $500K?" → It parses the tables precisely.
• "Explain how the diagram on page 12 relates to the conclusion." → It understands figures in context.

Powered by a two-model pipeline optimized for AMD MI300X:

• Llama 3.2 Vision 90B processes pages as high-resolution images, preserving layout and visuals
• Qwen3-VL extracts structured data from tables and forms with cross-lingual precision

Both models run simultaneously on a single MI300X (192 GB HBM3, 5.3 TB/s bandwidth), avoiding the complex multi-GPU parallelism that H100s would require.

Pipeline: 300 DPI page rendering → Llama for semantic structure → Qwen for table precision → retrieval layer → intelligent query routing → cited responses with confidence scores.

Performance: 100-page PDF in 42 s | 340 pages/min in batch mode | 12 concurrent sessions | ~18× faster than on cloud CPUs.

Use it for: M&A due diligence, regulatory review, academic literature synthesis, contract portfolio analysis, and insurance claims that need combined form and image understanding.

Ships as a ready-to-use web app: drag-and-drop upload, conversational Q&A, document navigation, and citation tracking that links every answer to its source page and element.
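The last three pipeline stages described above (retrieval, query routing, cited responses with confidence scores) can be illustrated with a small Python sketch. This is not OmniDoc's actual code: every name here (`Element`, `Answer`, `route`, `retrieve`, `answer`, the hint lists, and the sample data) is hypothetical, and keyword matching stands in for whatever routing and retrieval logic the real system uses. The model stages are assumed to have already produced the extracted elements.

```python
import re
from dataclasses import dataclass

# All names and data below are hypothetical; nothing here is OmniDoc's real API.

@dataclass(frozen=True)
class Element:
    page: int          # 1-based source page
    kind: str          # "text", "table", "chart", or "diagram"
    content: str       # extracted text or serialized table/chart data
    confidence: float  # extractor confidence in [0, 1]

@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple   # tuple of (page, kind) pairs
    confidence: float

# Assumed keyword prefixes that suggest which kind of element a query targets.
VISUAL_HINTS = ("chart", "trend", "diagram", "figure", "graph")
TABLE_HINTS = ("clause", "exceed", "total", "amount", "table")
ROUTE_KINDS = {
    "visual": {"chart", "diagram"},
    "table": {"table"},
    "text": {"text"},
}

def _has_hint(tokens, hints):
    # Prefix match so that "charts" matches "chart" and "clauses" matches "clause".
    return any(tok.startswith(h) for tok in tokens for h in hints)

def route(query: str) -> str:
    """Pick a route name from keywords in the query (a stand-in for real routing)."""
    tokens = re.findall(r"[a-z]+", query.lower())
    if _has_hint(tokens, VISUAL_HINTS):
        return "visual"
    if _has_hint(tokens, TABLE_HINTS):
        return "table"
    return "text"

def retrieve(query: str, elements, k: int = 2):
    """Return up to k elements of the routed kind, ranked by shared-word count."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    kinds = ROUTE_KINDS[route(query)]

    def overlap(e: Element) -> int:
        return len(terms & set(re.findall(r"[a-z0-9]+", e.content.lower())))

    scored = [(overlap(e), e) for e in elements if e.kind in kinds]
    scored = [(s, e) for s, e in scored if s > 0]   # drop elements sharing no words
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]

def answer(query: str, elements) -> Answer:
    """Build a response that cites the source page and element kind of each hit."""
    hits = retrieve(query, elements)
    if not hits:
        return Answer("No matching element found.", (), 0.0)
    return Answer(
        text=" | ".join(h.content for h in hits),
        citations=tuple((h.page, h.kind) for h in hits),
        confidence=min(h.confidence for h in hits),  # weakest hit bounds the answer
    )

# Made-up sample elements, standing in for the output of the two model stages.
SAMPLE = [
    Element(7, "chart", "Gross margin by quarter: 41%, 43%, 46%", 0.91),
    Element(12, "diagram", "Pipeline diagram linking ingestion to conclusion", 0.84),
    Element(31, "table", "Appendix clauses: indemnity cap $750K", 0.95),
    Element(2, "text", "Executive summary of the report", 0.99),
]

if __name__ == "__main__":
    print(answer("Which appendix clauses exceed $500K?", SAMPLE))
```

Taking the minimum extractor confidence across cited elements is one conservative design choice among several; a real system might instead combine retrieval and generation scores.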
10 May 2026