Top Builders

Explore the top contributors with the highest number of app submissions in our community.

Qwen

Qwen is a large language model family developed by the Qwen team at Alibaba Cloud. First released in 2023, the series spans dense and mixture-of-experts architectures across text, vision, and code, with most models published under the Apache 2.0 license. Developers can access Qwen models through Alibaba Cloud's Model Studio (DashScope) using an OpenAI-compatible API, or download weights directly from Hugging Face and GitHub.

General
Company: Qwen / Alibaba Cloud
Founded: 2023 (first model release)
Headquarters: Hangzhou, China
Website: qwen.ai
Documentation: Qwen Docs
GitHub: github.com/QwenLM
Hugging Face: huggingface.co/Qwen
Type: LLM Provider / Open-Source AI Lab

Core Products

Qwen3 (Text LLMs)

Qwen3 is the flagship text model family, released in April 2025 under Apache 2.0. It includes dense models from 0.6B to 32B parameters and mixture-of-experts models up to 235B total parameters (22B active). All models support multilingual text generation, reasoning, tool use, and agentic workflows.
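
Because the weights are open, a Qwen3 checkpoint can be run locally with Hugging Face Transformers. The minimal sketch below assumes the Qwen/Qwen3-0.6B repository name, automatic dtype and device placement, and default generation settings; adjust the model ID and parameters for your own setup.

```python
# Minimal sketch: loading a small Qwen3 checkpoint with Hugging Face Transformers.
# The model ID "Qwen/Qwen3-0.6B" and generation settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # smallest dense Qwen3 variant (assumed repo name)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```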

Qwen3-Coder

Qwen3-Coder is a coding-specialized model with 480B total parameters and 35B active, trained on 7.5 trillion tokens with a 70% code-focused dataset. Released in July 2025, it achieves state-of-the-art results among open models on SWE-Bench Verified.

Qwen3.6 (Vision-Language)

Qwen3.6 is a multimodal model with a unified vision-language architecture trained on trillions of multimodal tokens. It supports text and image inputs across 201 languages and dialects, with capabilities covering reasoning, coding, and visual understanding.

Qwen-Image-2.0

Qwen-Image-2.0 is a 7B-parameter image generation model supporting photorealism, professional typography, and unified generation-editing workflows, released in February 2026.

Qwen-MT

Qwen-MT is a translation model covering 92 major languages and dialects, reaching over 95% of the global population. It is designed for high-quality translation in production pipelines.

Qwen Code

Qwen Code is an open-source terminal coding agent optimized for the Qwen model series. It supports writing features, fixing bugs, navigating large codebases, and generating pull requests, with GitHub Actions integration available.


Developer Resources

Qwen models are accessible through Alibaba Cloud Model Studio (DashScope) via an OpenAI-compatible API, or as open weights on Hugging Face. The API supports both text-only and multimodal models.
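
A minimal sketch of the OpenAI-compatible route is below. It assumes the openai Python SDK, a DASHSCOPE_API_KEY environment variable, the international compatible-mode base URL, and the qwen-plus model name; substitute the endpoint and model ID that apply to your account and region.

```python
# Minimal sketch: calling Qwen through the OpenAI-compatible endpoint of
# Alibaba Cloud Model Studio (DashScope). The base URL, model name, and
# environment variable are assumptions; check the Qwen docs for your region.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed env var name
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",  # illustrative model ID; any available Qwen model works here
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Translate 'open weights' into French."},
    ],
)
print(response.choices[0].message.content)
```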


Key Features

Open weights under Apache 2.0
Most Qwen3 models are released under Apache 2.0, permitting commercial use, fine-tuning, and redistribution subject only to the license's attribution and notice requirements.

OpenAI-compatible API
Qwen models are served through DashScope using the OpenAI-compatible endpoint format, making it straightforward to use Qwen models in existing OpenAI SDK integrations.

Multilingual coverage
Qwen3.6 supports 201 languages and dialects. Qwen-MT covers 92 major languages for dedicated translation tasks.

Mixture-of-Experts (MoE) architecture
The largest Qwen3 models use MoE, activating only a subset of total parameters per token (for example, 22B of 235B active). This reduces inference cost relative to comparably capable dense models.
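
For intuition, here is a toy top-k routing layer, not Qwen's actual implementation: each token's router selects k experts, so only those experts' weights participate in that token's forward pass.

```python
# Toy illustration of top-k mixture-of-experts routing (not Qwen's implementation).
# Each token activates only k experts, so only a fraction of the layer's
# parameters participate in any single forward step.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                       # x: (tokens, d_model)
        scores = self.router(x)                 # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):              # combine the k selected experts per token
            for slot in range(self.k):
                e = idx[t, slot].item()
                out[t] += weights[t, slot] * self.experts[e](x[t])
        return out


layer = ToyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```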


Use Cases

Agentic coding workflows
Qwen3-Coder and Qwen Code are designed for software development tasks: writing features, fixing bugs, navigating large codebases, and generating pull requests via the terminal or CI pipelines.

Multilingual applications
Qwen-MT and Qwen3.6's broad language support make them suitable for translation tools, multilingual chatbots, and localized content pipelines.

Multimodal document and image processing
Qwen3.6 handles image understanding, document analysis, and visual reasoning alongside text, enabling applications like document Q&A and visual search.

Qwen AI Technologies Hackathon projects

Discover innovative solutions crafted with Qwen AI Technologies, developed by our community members during our engaging hackathons.

OmniDoc — Talk to Any Document

Documents aren't just text. Financial reports live in charts. Scientific insights hide in figures. Legal risks are buried in tables. Traditional document AI treats visuals as noise; OmniDoc treats them as signal.

OmniDoc is a multimodal document intelligence platform that understands everything: text, charts, tables, diagrams, handwritten notes, scanned pages, equations, and mixed-language content. Upload any document and talk to it. Ask:
• "What was the gross margin trend from section 3 charts?" → OmniDoc reads the bars, not just the surrounding text.
• "Which appendix clauses exceed $500K?" → Parses tables precisely.
• "Explain the page-12 diagram's relation to the conclusion" → Understands figures in context.

Powered by a two-model pipeline optimized for AMD MI300X:
• Llama 3.2 Vision 90B processes pages as high-res images, preserving layout and visuals
• Qwen3-VL extracts structured data from tables and forms with cross-lingual precision

Both run simultaneously on a single MI300X (192GB HBM3, 5.3TB/s bandwidth), eliminating the complex multi-GPU parallelism H100s would require.

Pipeline: 300 DPI page rendering → Llama for semantic structure → Qwen for table precision → retrieval layer → intelligent query routing → cited responses with confidence scores.

Performance: 100-page PDF in 42s | 340 pages/min batch | 12 concurrent sessions | ~18× faster than cloud CPU.

Use it for: M&A due diligence, regulatory review, academic literature synthesis, contract portfolio analysis, and insurance claims with form and image understanding.

Ships as a ready-to-use web app: drag-and-drop upload, conversational Q&A, document navigation, and citation tracking that links every answer to its source page and element.
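
As a rough illustration of the query-routing step in that pipeline, here is a minimal keyword-based sketch; the heuristic, hint words, and return values are hypothetical stand-ins for whatever routing logic OmniDoc actually uses.

```python
# Hedged sketch of "intelligent query routing": decide whether a question should
# be answered from extracted tables or from page-level visual descriptions.
# The keyword heuristic and labels are illustrative assumptions, not the
# project's actual logic.
TABLE_HINTS = ("exceed", "total", "$", "%", "clause", "row", "column")


def route_query(question: str) -> str:
    """Return 'tables' for numeric/structured lookups, 'pages' for visual/contextual ones."""
    q = question.lower()
    return "tables" if any(hint in q for hint in TABLE_HINTS) else "pages"


print(route_query("Which appendix clauses exceed $500K?"))                        # tables
print(route_query("Explain the page-12 diagram's relation to the conclusion"))    # pages
```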

Boundary Forge

Boundary Forge is a model-agnostic AI safety pipeline that helps enterprises deploy LLMs with measurable confidence. Instead of relying on manual red-teaming or hoping a system prompt is enough, Boundary Forge automatically attacks a model, identifies where it behaves unsafely or inconsistently, and converts those discovered failures into runtime guardrails.

For this hackathon, we demonstrated Boundary Forge using Qwen 2.5-72B on AMD Developer Cloud with AMD MI300X. Qwen powered the adversarial red-team workflow and was also the model under test, allowing the system to expose real behavioral failure boundaries such as jailbreak attempts, policy drift, unsafe financial guidance, KYC bypass, fraud patterns, coercion signals, asset concealment, and inconsistent refusals.

The pipeline works in five stages: generate adversarial probes, run high-throughput model inference, mathematically detect boundary failures, compile those failures into semantic safety rules, and enforce them through middleware before risky prompts reach the LLM. This creates a practical enterprise safety layer that can block, flag, or ask for clarification in real time.

The important point is that Boundary Forge is not tied to one model. Qwen 2.5-72B was used to demonstrate the system, but the architecture can benchmark and harden other open-source or proprietary models as well. The goal is to improve models exactly where they fail and make model evaluation repeatable across different deployments.

In our AMD Cloud production run with Qwen 2.5-72B, Boundary Forge generated 1,009 unique adversarial probes, fired 4,036 total inferences, discovered 25 boundary failures, and compiled 15 semantic safety rules. The middleware intercepted 68% of known attacks and reduced the effective failure rate from 2.48% to 0.79%.

Boundary Forge turns AI safety into an automated engineering workflow: attack, measure, learn, protect, and benchmark again.
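
To make the enforcement stage concrete, here is a minimal middleware sketch; the rule format, patterns, and actions are illustrative assumptions rather than Boundary Forge's actual compiled rule output.

```python
# Hedged sketch of the enforcement stage: discovered failure boundaries become
# rules that a middleware checks before a prompt reaches the LLM. Rule patterns,
# actions, and matching strategy are illustrative assumptions.
import re
from dataclasses import dataclass


@dataclass
class SafetyRule:
    name: str
    pattern: str        # compiled from a discovered boundary failure
    action: str         # "block", "flag", or "clarify"


RULES = [
    SafetyRule("kyc_bypass", r"bypass.*(kyc|identity check)", "block"),
    SafetyRule("asset_concealment", r"(hide|conceal).*assets", "flag"),
    SafetyRule("vague_financial_advice", r"guaranteed returns", "clarify"),
]


def enforce(prompt: str) -> tuple[str, str | None]:
    """Return (decision, rule_name). Decision is 'allow' unless a rule matches."""
    for rule in RULES:
        if re.search(rule.pattern, prompt, flags=re.IGNORECASE):
            return rule.action, rule.name
    return "allow", None


print(enforce("How do I bypass KYC on this exchange?"))   # ('block', 'kyc_bypass')
print(enforce("What is a balanced index fund?"))          # ('allow', None)
```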

Thor v2 — RAG-Free Fitness Intelligence

Thor v2 is a domain-expert fitness AI built on a single fine-tuned Qwen3-8B model trained on 7,118 carefully constructed instruction-response pairs spanning exercise science, nutrition, programming, injury screening, and population-specific guidance. Unlike RAG-based fitness apps that retrieve documents at query time, Thor v2 encodes knowledge directly into model weights during supervised fine-tuning on AMD MI300X hardware using ROCm.

Evidence is referenced through compact citation keys — e.g. [CITE:NSCA_HYPERTROPHY_VOLUME] — that the model emits inline. A lightweight citation resolver validates these keys against a locked registry and surfaces the source document on demand. If the model emits an unknown key, it is rejected at runtime. Hallucinated citations are structurally impossible.

The dataset covers 113 unique citation keys from 9 authoritative organisations — NSCA, ACSM, ISSN, NASM, HHS, USDA, NIH, CDC, and ExRx — with 80 exercise technique entries and 14 population profiles including senior, postpartum, teen, vegan, rehab return, and competitive athlete. Six conversational style variants (casual, research-nerd, anxious, skeptical, verbose, follow-up-first) are baked into training so the model adapts tone naturally without prompt engineering.

Training results: 100% JSON contract pass rate across all eval prompts. Coach gating behavior confirmed — model asks clarifying questions before prescribing when context is missing, rather than giving generic advice. All responses emit valid citation_keys, follow_up_questions, and safety_notes fields. Adapter size: <350MB on top of a frozen 8B base. Built entirely on AMD MI300X (192GB HBM3, ROCm 6.3) using HuggingFace PEFT + TRL.

One model. No retrieval. No vector database. The model knows. The resolver proves.
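
The citation-resolver idea can be sketched in a few lines; the registry entries (beyond the NSCA key quoted above) and function names are hypothetical, not the project's actual code.

```python
# Hedged sketch of the citation resolver: inline [CITE:KEY] markers are validated
# against a locked registry, and unknown keys are rejected at runtime.
# Registry contents and function names are illustrative assumptions.
import re

# Locked registry: only keys present here may appear in model output.
CITATION_REGISTRY = {
    "NSCA_HYPERTROPHY_VOLUME": "NSCA position stand on resistance training volume",
    "ACSM_CARDIO_GUIDELINES": "ACSM guidelines for cardiorespiratory exercise",  # hypothetical key
}

CITE_PATTERN = re.compile(r"\[CITE:([A-Z0-9_]+)\]")


def resolve_citations(response: str) -> list[str]:
    """Return source descriptions for every citation key; raise on unknown keys."""
    sources = []
    for key in CITE_PATTERN.findall(response):
        if key not in CITATION_REGISTRY:
            raise ValueError(f"Unknown citation key rejected: {key}")
        sources.append(CITATION_REGISTRY[key])
    return sources


print(resolve_citations("Aim for 10-20 hard sets per muscle weekly [CITE:NSCA_HYPERTROPHY_VOLUME]."))
```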

TempoGraph: Local Multimodal Video Analysis

TempoGraph is a fully-local, privacy-preserving multimodal video analysis system that turns raw video files into rich structured outputs — entities, behaviors, transcripts, timelines, and interactive knowledge graphs — without sending a single frame to the cloud.

Stage 1 — Frame Selection: Motion-aware sampling with static, moving, and auto camera modes. For moving cameras it estimates homography to separate object motion from camera movement, then identifies keyframes where motion peaks exceed a configurable sigma threshold.

Stage 1.5 — Audio Transcription: Whisper.cpp running on Vulkan transcribes the full audio track to millisecond-accurate segments.

Stage 2 — YOLO Detection: YOLO26 runs on a second GPU over every sampled frame, outputting normalized bounding boxes, class names, track IDs, and confidence scores.

Stage 3 — Depth Estimation: Depth Anything V2 via HuggingFace Transformers adds per-detection mean depth to every bounding box, giving 3D spatial context to 2D detections.

Stage 4 — Frame Scoring: Picks which frames the VLM actually sees. In keyframes mode, only motion-peak frames are forwarded. In scored mode, FrameScorer ranks all YOLO-scanned frames using a weighted combination of motion delta, new YOLO class appearances, tracked object churn, and IoU drop between frames — then fills the VLM budget with the highest-signal frames. Keyframes are always pinned first regardless of mode.

Stage 5 — VLM Captioning: Qwen3.5-VL-9B served by a custom llama.cpp build compiled for AMD ROCm/HIP, running on an AMD RX 9070 XT with a 100k-token context window. Frames are chunked and sent to the model alongside YOLO-derived annotations. Each chunk's summary seeds the next prompt for narrative continuity across the video.

Stage 6 — Aggregation: A final text-only LLM call synthesizes all per-chunk captions and the audio transcript into a structured JSON with entities, visual events, audio events, and multimodal correlations linking what was said to what was seen.
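
A minimal sketch of the scored-mode selection in Stage 4 is below; the weights, field names, and pinning logic are assumptions that mirror the description rather than TempoGraph's actual FrameScorer.

```python
# Hedged sketch of frame scoring: rank sampled frames by a weighted combination of
# motion delta, new detection classes, track churn, and IoU drop, then keep the
# highest-signal frames for the VLM budget. Weights and field names are assumptions.
from dataclasses import dataclass


@dataclass
class FrameStats:
    index: int
    motion_delta: float      # magnitude of inter-frame motion
    new_classes: int         # YOLO classes not seen in the previous frame
    track_churn: int         # tracked objects appearing or disappearing
    iou_drop: float          # 1 - mean IoU of matched boxes vs. previous frame


def score(f: FrameStats, w=(0.4, 0.3, 0.2, 0.1)) -> float:
    return (w[0] * f.motion_delta + w[1] * f.new_classes
            + w[2] * f.track_churn + w[3] * f.iou_drop)


def select_frames(frames: list[FrameStats], budget: int, keyframes: set[int]) -> list[int]:
    """Pin keyframes first, then fill the remaining VLM budget with top-scored frames."""
    chosen = [f.index for f in frames if f.index in keyframes][:budget]
    rest = sorted((f for f in frames if f.index not in keyframes), key=score, reverse=True)
    chosen += [f.index for f in rest[: budget - len(chosen)]]
    return sorted(chosen)


frames = [FrameStats(i, motion_delta=i % 3, new_classes=i % 2, track_churn=0, iou_drop=0.1)
          for i in range(10)]
print(select_frames(frames, budget=4, keyframes={0, 5}))
```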