Qwen-Image-2.0

Qwen-Image-2.0 is Alibaba Cloud's second-generation image foundation model, released on February 10, 2026. It combines image generation and editing into a single model, replacing the separate pipelines required by earlier approaches. The architecture pairs an 8B Qwen3-VL encoder with a 7B diffusion decoder, producing a leaner design than its predecessor (which used 20B parameters) while achieving higher benchmark scores. It supports native 2048x2048 resolution output.

General
Release date10 Feb 2026
DeveloperQwen / Alibaba Cloud
TypeImage generation and editing model
Architecture8B Qwen3-VL encoder + 7B diffusion decoder
GitHubQwenLM/Qwen-Image
Documentationqwenlm.github.io

Core Features

  • Unified generation and editing: a single model handles both text-to-image generation and instruction-based image editing without switching pipelines.
  • Native 2K resolution: generates images at up to 2048x2048 pixels natively.
  • Professional typography: supports up to 1,000-token prompt instructions for text-heavy visuals, generating accurate text in posters, infographics, slides, and comics.
  • Photorealism: produces fine-grained detail at 2K including skin texture, fabric weave, and natural foliage.
  • Efficient architecture: 7B active parameters vs. 20B in Qwen-Image 1.0, with faster inference and higher quality.

Benchmarks

BenchmarkQwen-Image-2.0FLUX.1 (12B)
DPG-Bench88.3283.84
AI Arena (text-to-image)#1
AI Arena (image editing)#1

Tools and Resources


Ecosystem and Integrations

  • Accessible via Alibaba Cloud DashScope API for programmatic generation and editing.
  • Available through Qwen Studio for no-code testing.
  • The encoder backbone (Qwen3-VL) is the same model used for vision-language understanding tasks.

To get started, visit Qwen Studio for a live demo, or use the Qwen API Platform for API access with the DashScope SDK.