May 19, 2026

Google

Veo 3.1 Lite

  • Veo 3.1 Lite T2V: Google Veo 3.1 Lite text-to-video model with person generation control. Supports resolutions up to 1080p. Duration: 4/6/8 seconds.
  • Veo 3.1 Lite I2V: Google Veo 3.1 Lite image-to-video model with person generation control. Supports resolutions up to 1080p. Duration: 4/6/8 seconds.
May 15, 2026

OpenAI

Image Generation

  • gpt-image-2: OpenAI’s state-of-the-art image generation model for fast, high-quality output, supporting resolutions up to 4K (3840x2160).
  • gpt-image-1.5: A balanced image generation model offering solid performance, transparent background support, and resolutions up to 1536x1024.

Image Editing

  • gpt-image-2-edit: OpenAI’s state-of-the-art image editing model, supporting high-resolution outputs including 2K and 4K.
  • gpt-image-1.5-edit: A balanced image editing model featuring transparent background support, input fidelity control, and multi-image editing capabilities (up to 16 images).
Apr 29, 2026

Alibaba

HappyHorse

  • happyhorse-1.0-t2v: HappyHorse text-to-video model generates physically realistic and smoothly animated video content from text prompts. The model focuses on physical realism and motion fluidity, supporting various resolution and aspect ratio combinations with 3-15 seconds duration.
  • happyhorse-1.0-i2v: HappyHorse image-to-video model generates physically realistic and smoothly animated video content from a first frame image. The model can optionally use text prompts for guidance, supporting 720P/1080P resolution and 3-15 seconds duration.
  • happyhorse-1.0-r2v: HappyHorse reference-to-video model generates fluid videos by fusing characters from multiple reference images (1-9 images) through text prompts with character references. Supports 720P/1080P resolution, multiple aspect ratios, and 3-15 seconds duration.
  • happyhorse-1.0-video-edit: HappyHorse video editing model supports style transformation and local replacement by combining input video with reference images (0-5) and text instructions. Input video duration: 3-60 seconds. Output video duration: 3-15 seconds.
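
The numeric limits in the HappyHorse entries above can be checked before a request is sent. The following is a minimal sketch, assuming the limits apply exactly as listed. The table keys reuse the model names from the list, but the field names (`duration`, `reference_images`, `input_video_seconds`) and the `check_request` helper are hypothetical and are not taken from the Modellix API.

```python
# Limits transcribed from the HappyHorse entries above as (min, max) pairs.
# Field names are illustrative assumptions, not documented API parameters.
HAPPYHORSE_LIMITS = {
    "happyhorse-1.0-t2v": {"duration": (3, 15)},
    "happyhorse-1.0-i2v": {"duration": (3, 15)},
    "happyhorse-1.0-r2v": {"duration": (3, 15), "reference_images": (1, 9)},
    "happyhorse-1.0-video-edit": {
        "duration": (3, 15),
        "reference_images": (0, 5),
        "input_video_seconds": (3, 60),
    },
}


def check_request(model: str, **values: int) -> list[str]:
    """Return a list of problems; an empty list means no listed limit was violated."""
    limits = HAPPYHORSE_LIMITS.get(model)
    if limits is None:
        return [f"unknown model: {model}"]
    problems = []
    for field, value in values.items():
        if field not in limits:
            continue  # no limit is listed for this field
        low, high = limits[field]
        if not low <= value <= high:
            problems.append(f"{field}={value} outside {low}-{high}")
    return problems
```

For example, `check_request("happyhorse-1.0-video-edit", input_video_seconds=90)` reports that the input clip exceeds the 3-60 second range listed above.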
Apr 24, 2026

Kling

Kling V3

  • Kling V3 T2I: Kling V3 text-to-image model with improved prompt adherence and 1K/2K output support for higher-fidelity creative generation.
  • Kling V3 I2I: Kling V3 image-to-image model for higher-fidelity editing and restyling with 1K/2K output support.
  • Kling V3 Omni Image: Kling V3 Omni image model supporting single-image and series generation with image references, element references, and up to 4K output.
  • Kling V3 T2V: Kling V3 text-to-video model supporting single-shot and storyboard-based multi-shot generation with 3-15 second output duration.
  • Kling V3 I2V: Kling V3 image-to-video model supporting prompt-driven animation, storyboard workflows, element references, and 3-15 second output duration.
  • Kling V3 Omni Video: Kling V3 Omni video model for advanced video-to-video generation, combining prompts, storyboard segments, image references, element references, and optional video inputs.

Kling Video O1

  • Kling Video O1: Kling Video O1 omni video model for storyboard-driven video-to-video generation, supporting reference videos, optional first/end frames, and element-guided editing.
Apr 17, 2026

Alibaba

Image Generation

  • qwen-image-2.0: Accelerated text-to-image model balancing quality and speed, supporting resolutions up to 2688x1536 and batch generation of 1-6 images.
  • qwen-image-2.0-pro: The most capable Qwen-Image 2.0 model with stronger text rendering, realistic textures, and semantic adherence, supporting batch generation of 1-6 images.
  • qwen-image-max: High-realism text-to-image model with reduced AI artifacts, fixed resolution options, and 1 image output per request.
  • wan2.7-image: Standard text-to-image model with faster generation speed, up to 2K resolution, thinking mode, sequential generation, and custom color themes.
  • wan2.7-image-pro: Professional text-to-image model supporting up to 4K resolution, thinking mode, sequential multi-image generation, and custom color themes.
  • z-image-turbo: Lightweight text-to-image model optimized for speed with Chinese and English text rendering support, outputting 1 PNG image per request.

Image Editing

  • qwen-image-2.0-edit: Accelerated image editing model balancing quality and speed, supporting 1-3 input images and 1-6 outputs with customizable resolution.
  • qwen-image-2.0-pro-edit: Professional image editing model with stronger text rendering, realistic quality, and semantic following, supporting 1-3 input images and 1-6 outputs.
  • qwen-image-edit-max: Advanced image editing model focused on industrial design, geometry reasoning, and character consistency, supporting 1-3 input images and 1-6 outputs.
  • wan2.7-image-edit: Standard image editing model with faster generation speed, supporting multi-image reference, interactive editing, sequential generation, and max 2K output.
  • wan2.7-image-pro-edit: Professional image editing model supporting multi-image reference, interactive bounding-box editing, sequential generation, 1-9 input images, and max 2K output.

Video Generation & Editing

  • wan2.1-vace-plus: Unified video editing model supporting five functions: multi-image reference, video repainting, local editing, video extension, and video outpainting.
  • wan2.2-animate-mix: Video character replacement model that swaps the main character with a reference image while preserving scene, lighting, and color tone.
  • wan2.2-animate-move: Motion transfer model that applies movement and expressions from a reference video to a character in a static image.
  • wan2.6-r2v: Reference-to-video model that generates from reference images or videos with multi-character interaction and role-playing, producing silent video by default.
  • wan2.6-r2v-flash: Faster reference-to-video model with audio or silent output switching, optimized for quick previews and cost-effective generation.
  • wan2.7-r2v: Advanced reference-to-video model using reference images/videos with prompt guidance, supporting multi-subject references, storyboard generation, and custom audio voice cloning.
  • wan2.7-videoedit: Multi-modal video editing model for style modification and edits with text/image/video inputs, with typical processing time of 1-5 minutes.
Apr 15, 2026

ByteDance

Seedream 5.0

  • seedream-5.0-lite: ByteDance Seedream 5.0 Lite text-to-image model with 2K/3K custom resolutions and configurable output format.
  • seedream-5.0-lite-edit: ByteDance Seedream 5.0 Lite Edit image-to-image model supporting single-image editing, multi-image fusion, and configurable output format.

Seedance 2.0

  • seedance-2.0-i2v: Dreamina Seedance 2.0 image-to-video—generate from a text prompt with optional frame, image, and audio references.
  • seedance-2.0-fast-i2v: Faster Seedance 2.0 image-to-video with the same request parameters.
  • seedance-2.0-v2v: Dreamina Seedance 2.0 video-to-video—transform 1–3 reference clips with a text prompt and optional image or audio references.
  • seedance-2.0-fast-v2v: Faster Seedance 2.0 video-to-video with the same request parameters.
Apr 8, 2026

Google

Nano Banana

  • Nano Banana: Nano Banana image generation model. Generates images from text prompts with support for 10 aspect ratios, delivering fast and cost-effective results.
  • Nano Banana Pro: Nano Banana Pro image generation model with higher quality output. Supports aspect ratio selection and output resolutions up to 4K.
  • Nano Banana 2: Nano Banana 2 multimodal image model supporting both text-to-image and image-to-image workflows, with 14 aspect ratios and resolutions from 512 to 4K.
  • Nano Banana Edit: Nano Banana image editing model. Transforms existing images based on prompt instructions with support for 10 aspect ratios.
  • Nano Banana Pro Edit: Nano Banana Pro image editing model with superior detail preservation and enhanced prompt adherence. Supports output resolutions up to 4K.
  • Nano Banana 2 Edit: Nano Banana 2 multimodal model in image-to-image editing mode. Requires a base64 data URI image input with support for multiple aspect ratios and resolutions from 512 to 4K.
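
The Nano Banana 2 Edit entry above requires the input image as a base64 data URI. A minimal sketch of building one with the Python standard library follows. The helper name `to_data_uri` is hypothetical, and the request field that carries the resulting string is not specified here.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In practice the bytes would typically come from reading an image file in binary mode, with `mime_type` set to match the file format.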

Imagen 4.0

  • Imagen 4.0: Google Imagen 4.0 standard text-to-image model delivering high-quality photorealistic images. Supports batch generation (up to 4 images), person generation control, and output resolutions up to 2K.
  • Imagen 4.0 Ultra: Google Imagen 4.0 Ultra text-to-image model with the highest quality output. Optimized for maximum detail and photorealism with batch generation and up to 2K resolution.
  • Imagen 4.0 Fast: Google Imagen 4.0 Fast text-to-image model optimized for speed. Supports batch generation (up to 4 images), multiple aspect ratios, and person generation control.

Veo 3.1

  • Veo 3.1 T2V: Google’s flagship text-to-video model supporting resolutions up to 4K and optional reference images (up to 3) for style or character consistency across generations. Duration: 4/6/8 seconds.
  • Veo 3.1 Fast T2V: A faster variant of Veo 3.1 with the same capabilities, including 4K resolution and reference image support. Duration: 4/6/8 seconds.
  • Veo 3.1 I2V: Google’s flagship image-to-video model supporting resolutions up to 4K with person generation control. Duration: 4/6/8 seconds.
  • Veo 3.1 Fast I2V: A faster variant of Veo 3.1 I2V with the same capabilities, including 4K resolution support. Duration: 4/6/8 seconds.

Veo 3

  • Veo 3 T2V: Google Veo 3.0 stable text-to-video model with resolutions up to 1080p and person generation control. Duration: 4/6/8 seconds.
  • Veo 3 Fast T2V: A faster variant of Veo 3 with the same parameter set, supporting resolutions up to 1080p. Duration: 4/6/8 seconds.
  • Veo 3 I2V: Google Veo 3.0 stable image-to-video model with resolutions up to 1080p and person generation control. Duration: 4/6/8 seconds.
  • Veo 3 Fast I2V: A faster variant of Veo 3 I2V with the same parameter set, supporting resolutions up to 1080p. Duration: 4/6/8 seconds.

Veo 2

  • Veo 2 T2V: Google Veo 2.0 classic text-to-video model with flexible person generation policies (allow all, adult only, or disallow). Duration: 5/6/8 seconds.
  • Veo 2 I2V: Google Veo 2.0 classic image-to-video model with flexible person generation policies (allow adult or disallow). Duration: 5/6/8 seconds.
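
The Veo entries above accept only a few fixed durations, and Veo 2 differs from the later families. A small sketch that maps a requested length to the nearest listed value, assuming the lists above are exhaustive. The family keys and the `nearest_duration` helper are illustrative names, not API identifiers.

```python
# Durations in seconds as listed in the Veo entries above.
# The family keys are illustrative labels, not model IDs.
VEO_DURATIONS = {
    "veo-3.1": (4, 6, 8),
    "veo-3": (4, 6, 8),
    "veo-2": (5, 6, 8),
}


def nearest_duration(family: str, requested: int) -> int:
    """Return the listed duration closest to the request (ties go to the shorter clip)."""
    options = VEO_DURATIONS[family]
    return min(options, key=lambda d: (abs(d - requested), d))
```
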
Feb 25, 2026

Kling

Kling V1

  • Kling V1 T2I: Kuaishou’s foundational text-to-image model offering fast, cost-effective 1K image generation with strong prompt adherence and multiple aspect ratios.
  • Kling V1 I2I: Kuaishou’s first-generation AI image model using a diffusion transformer architecture, capable of generating 1K-resolution images with strong prompt adherence and realistic detail.
  • Kling V1 T2V: Kuaishou’s first-generation text-to-video model generating 5s or 10s clips with camera motion presets (pan, tilt, zoom) and adjustable prompt relevance.
  • Kling V1 I2V: Kuaishou’s first-generation image-to-video model that animates static images into 5s or 10s videos with motion brush support and adjustable prompt relevance.

Kling V1.5

  • Kling V1.5 T2I: An enhanced text-to-image model with improved realism and subject/face reference support for generating consistent character images at 1K resolution.
  • Kling V1.5 I2I: An upgraded image-to-image model with improved realism, better prompt interpretation, and subject/face reference modes for precise character control.
  • Kling V1.5 I2V: The most feature-complete V1.x image-to-video model, adding simple camera motion control alongside motion brush and cfg_scale for precise video generation.

Kling V1.6

  • Kling V1.6 T2V: An improved text-to-video model with significantly better prompt adherence and visual quality over V1.5, supporting dual standard/professional generation modes.
  • Kling V1.6 I2V: An improved image-to-video model with significantly better prompt adherence and visual quality over V1.5, supporting first-and-last frame control for smooth transitions.
  • Kling V1.6 MI2V: Transforms up to 4 reference images into a cohesive video sequence with multi-element fusion, enabling character interaction and complex visual narratives.

Kling V2

  • Kling V2 T2I: A next-generation text-to-image model with significantly improved detail and visual fidelity, supporting both 1K and 2K resolutions for professional output.
  • Kling V2 New T2I: A refined variant of V2 with updated model weights for sharper details, better consistency, and improved prompt-to-image alignment at up to 2K resolution.
  • Kling V2 I2I: A major generational leap in image quality and creativity, featuring enhanced style diversity and significantly improved visual fidelity over V1.5.
  • Kling V2 New I2I: A refined variant of Kling V2 with updated model weights for improved consistency, sharper details, and better prompt-to-image alignment.
  • Kling V2 MI2I: Combines up to 4 subject images with optional scene and style references into a single cohesive output, supporting subject fusion, scene replacement, and style transfer.
  • Kling V2 Master T2V: The V2-generation base text-to-video model producing cinematic-quality clips with superior motion realism and temporal coherence.
  • Kling V2 Master I2V: The V2-generation base image-to-video model delivering cinematic-quality animations with superior temporal coherence and smoother motion transitions.

Kling V2.1

  • Kling V2.1 T2I: The latest and highest-quality text-to-image model in the Kling family, delivering state-of-the-art results at up to 2K resolution.
  • Kling V2.1 I2I: The latest cost-efficient image-to-image model offering studio-grade quality with faster rendering and excellent prompt adherence.
  • Kling V2.1 MI2I: The latest and highest-quality multi-image composition model, delivering superior subject fusion, scene replacement, and style transfer with up to 4 subject images.
  • Kling V2.1 Master T2V: The V2.1-generation text-to-video model with enhanced rendering quality, improved frame consistency, and studio-grade 1080p output.
  • Kling V2.1 I2V: A cost-efficient image-to-video model with advanced frame control and up to 1080p output, suitable for professional content creation.
  • Kling V2.1 Master I2V: The recommended high-quality image-to-video model in the V2.1 series, producing studio-grade 1080p videos with precise start and end frame control.

Kling V2.5

  • Kling V2.5 Turbo T2V: A speed-optimized text-to-video model delivering cinematic 1080p videos with physics-accurate motion at ~30% lower cost than previous versions.
  • Kling V2.5 Turbo I2V: A speed-optimized image-to-video model delivering cinematic 1080p videos with physics-accurate motion at ~30% lower cost than previous versions.

Kling V2.6

  • Kling V2.6 T2V: The first Kling text-to-video model to natively generate synchronized audio and video in one pass, including dialogue, ambient sounds, and lip-synced speech.
  • Kling V2.6 I2V: The first Kling image-to-video model to natively generate synchronized audio and video in a single pass, supporting dialogue, sound effects, and lip-synced speech alongside motion brush.

Kling Avatar & Effects

  • Kling Avatar: Generates realistic talking-head videos from a reference image and audio input, with precise lip synchronization, expressive gestures, and support for multiple languages.
  • Kling Video Effects: Applies 212 preset creative video effects — including dance, transformation, interaction, and animation styles — to one or two person images for instant viral content.

Kling Image Utilities

  • Kling Image Expansion: Intelligently extends images in any direction (up, down, left, right) with prompt-guided content generation, ideal for panorama creation, background extension, and canvas expansion.
  • Kling Image Recognize: Detects and segments image content into 4 categories — object, head (with hair), face (without hair), and clothing — returning segmentation masks synchronously.
  • Kling Image O1: A multimodal image generation model that accepts text, up to 10 reference images, and element inputs to produce 1K/2K images with precise style control and multi-reference feature extraction.

Kolors Virtual Try-On

  • Kolors Virtual Try-On V1: AI-powered virtual clothing try-on built on the Kolors diffusion model, generating realistic fitting results from a person photo and a single garment image (tops, bottoms, or dresses).
  • Kolors Virtual Try-On V1-5: Enhanced virtual try-on model that supports both single garments and top+bottom outfit combinations, delivering higher-quality results with automatic clothing type detection.
Feb 4, 2026

MiniMax

MiniMax Image-01

  • MiniMax Image-01 T2I: MiniMax’s multimodal vision model that blends text-to-image generation with visual reasoning for seamless cross-modal tasks.
  • MiniMax Image-01 I2I: MiniMax’s multimodal vision model that blends text-to-image generation with visual reasoning for seamless cross-modal tasks.
  • MiniMax Image-01-Live I2I: MiniMax’s multimodal vision model that blends text-to-image generation with visual reasoning for seamless cross-modal tasks.

Hailuo

  • Hailuo 2.3 T2V: Hailuo 2.3 not only generates high-quality videos from text or images with exceptional instruction following, but also redefines realism through its state-of-the-art mastery of extreme physics.
  • Hailuo 2.3 I2V: Hailuo 2.3 not only generates high-quality videos from text or images with exceptional instruction following, but also redefines realism through its state-of-the-art mastery of extreme physics.
  • Hailuo 2.3 Fast I2V: Hailuo 2.3 Fast efficiently transforms images into dynamic videos with extreme physics mastery. It delivers exceptional value by generating high-quality, realistic motion at a reduced computational cost.
  • Hailuo 02 T2V: Hailuo 02 masters both text-to-video and image-to-video generation with exceptional instruction following, while setting a new standard in visual realism through its extreme physics simulation.
  • Hailuo 02 I2V: Hailuo 02 masters both text-to-video and image-to-video generation with exceptional instruction following, while setting a new standard in visual realism through its extreme physics simulation.
  • Hailuo 02 FL2V: Hailuo 02’s FL2V function provides unprecedented creative control by generating dynamic videos between a user-defined start and end frame. This feature not only masters extreme physics and complex transitions but also enables the novel capability to deduce a story leading up to a specified final image.

MiniMax T2V-01

  • MiniMax T2V-01: MiniMax T2V-01 is a text-to-video model that uniquely delivers professional-level camera movement control, transforming written prompts into cinematic video clips with dynamic shots.
  • MiniMax T2V-01-Director: T2V-01-Director is a text-to-video AI model that offers precise camera control, allowing users to create professional-looking video clips with cinematic movements through a variety of lens instructions.

MiniMax I2V-01

  • MiniMax I2V-01: MiniMax I2V-01 is a foundational image-to-video model that converts static pictures into high-quality video sequences, delivering smooth animation especially optimized for illustrations and anime styles.
  • MiniMax I2V-01-Director: I2V-01-Director is an image-to-video AI model that offers precise camera control, allowing users to create professional-looking video clips with cinematic movements through a variety of lens instructions.
  • MiniMax I2V-01-Live: I2V-01-Live is an image-to-video model specifically optimized for animating 2D illustrations and cartoon styles, enhancing smoothness and vivid motion to bring static art to life with fluid character movements and natural expressions.

MiniMax S2V-01

  • MiniMax S2V-01: The MiniMax S2V-01 is a specialized subject reference video model designed to solve the industry challenge of character consistency. It can generate dynamic videos where the main character’s identity stays highly consistent across every frame, using just a single photo as a reference and at a computational cost significantly lower than traditional solutions.
Jan 31, 2026

Alibaba

Wan

  • Wan 2.6 T2V: The Wan text-to-video model can generate videos from a single sentence, presenting rich artistic styles and cinematic quality. Wan 2.6 introduces multi-shot narrative capabilities and supports both automatic dubbing and uploading custom audio files.
  • Wan 2.6 I2V Flash: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic quality. Wan 2.6 introduces multi-shot narrative capabilities and supports both automatic dubbing and uploading custom audio files.
  • Wan 2.6 I2V: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic quality. Wan 2.6 introduces multi-shot narrative capabilities and supports both automatic dubbing and uploading custom audio files.
  • Wan 2.5 T2V Preview: The Wan text-to-video model can generate videos from a single sentence, presenting rich artistic styles and cinematic quality. Wan 2.5 supports automatic dubbing and uploading custom audio files.
  • Wan 2.5 I2V Preview: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic quality. Wan 2.5 supports automatic dubbing and uploading custom audio files.
  • Wan 2.2 T2V Plus: The Wan text-to-video model can generate videos from a single sentence, presenting rich artistic styles and cinematic quality. Wan 2.2 features more accurate instruction understanding, stable and smooth motion generation, and richer details.
  • Wan 2.2 I2V Flash: The Wan image-to-video model can generate videos using prompts and image references, presenting rich artistic styles and cinematic-quality visuals. Wan 2.2 Flash features ultimate generation speed, with more accurate instruction understanding and camera control, consistent visual elements, and comprehensively improved stability and success rates.
  • Wan 2.2 I2V Plus: The Wan image-to-video model can generate videos using prompts and image references, presenting rich artistic styles and cinematic-quality visuals. Wan 2.2 Plus features more accurate instruction understanding, controllable camera movements, consistent visual elements, and comprehensively improved stability and success rates, delivering richer generated content.
  • Wan 2.2 KF2V Flash: The Wan First-and-Last-Frame Video Generation Model: simply provide the first and last frame images, and it can generate a smooth, fluid dynamic video based on the prompt.

Wanx

  • Wanx 2.1 T2V Turbo: The Wan text-to-video model can generate videos from a single sentence, featuring rich artistic styles and cinematic quality. Wanx 2.1 Turbo offers high cost-effectiveness.
  • Wanx 2.1 T2V Plus: The Wan text-to-video model can generate videos from a single sentence, featuring rich artistic styles and cinematic quality. Wanx 2.1 Plus offers even more refined visuals.
  • Wanx 2.1 I2V Plus: The Wan image-to-video model can generate videos using prompts and image references, presenting rich artistic styles and cinematic-quality visuals. Wanx 2.1 Plus offers even more refined image quality.
  • Wanx 2.1 I2V Turbo: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic-quality visuals. Wanx 2.1 Turbo offers high cost-effectiveness.
  • Wanx 2.1 KF2V Plus: The Wan First-and-Last-Frame Video Generation Model: simply provide the first and last frame images, and it can generate a smooth, fluid dynamic video based on the prompt.

ByteDance

Seedream

  • Seedream 3.0 T2I: Seedream 3.0 is a Chinese-English bilingual image generation foundation model that supports native high resolution. Its overall capabilities are comparable to GPT-4o, ranking it among the world’s top tier. It offers faster response speed, more accurate small-text generation, improved text typesetting, strong instruction following, better aesthetics and structure, and good fidelity and detail.
  • Seedream 4.0 T2I: A SOTA-level multimodal image creation model based on a leading architecture. It breaks the creative boundaries of traditional text-to-image models and natively supports text, single-image, and multi-image inputs. Users can freely fuse text and images, and in the same model, realize diverse applications like multi-image fusion creation based on subject consistency, image editing, and group image generation.
  • Seedream 4.0 I2I: A SOTA-level multimodal image creation model based on a leading architecture. It breaks the creative boundaries of traditional text-to-image models and natively supports text, single-image, and multi-image inputs. Users can freely fuse text and images, and in the same model, realize diverse applications like multi-image fusion creation based on subject consistency, image editing, and group image generation.
  • Seedream 4.5 T2I: Seedream 4.5 is the latest in-house image generation model developed by ByteDance. Compared with Seedream 4.0, it delivers comprehensive improvements—especially in editing consistency, including better preservation of subject details, lighting, and color tone. It also enhances portrait refinement and small-text rendering. The model’s multi-image composition capabilities have been significantly strengthened.
  • Seedream 4.5 I2I: Seedream 4.5 is the latest in-house image generation model developed by ByteDance. Compared with Seedream 4.0, it delivers comprehensive improvements—especially in editing consistency, including better preservation of subject details, lighting, and color tone. It also enhances portrait refinement and small-text rendering. The model’s multi-image composition capabilities have been significantly strengthened.

SeedEdit

  • SeedEdit 3.0 I2I: SeedEdit 3.0 is an image editing model that supports editing images via text instructions. SeedEdit 3.0 is trained based on the text-to-image model Seedream 3.0, integrated with diverse data fusion methods and specific reward models. Its ability to preserve image subjects, backgrounds, and details has been further improved, especially in scenarios such as portrait editing, background modification, and perspective and lighting changes.

Seedance

  • Seedance 1.0 Lite T2V: ByteDance’s small-parameter version of the video generation model achieves excellent video generation quality while significantly increasing generation speed, balancing both effect and efficiency.
  • Seedance 1.0 Lite I2V: ByteDance’s small-parameter version of the video generation model achieves excellent video generation quality while significantly increasing generation speed, balancing both effect and efficiency.
  • Seedance 1.0 Pro Fast T2V: Seedance 1.0 Pro Fast inherits the core advantages of the Seedance 1.0 Pro model, with 3x faster generation and a 72% lower price. It is a video generation model that achieves an excellent balance among quality, speed, and cost.
  • Seedance 1.0 Pro Fast I2V: Seedance 1.0 Pro Fast inherits the core advantages of the Seedance 1.0 Pro model, with 3x faster generation and a 72% lower price. It is a video generation model that achieves an excellent balance among quality, speed, and cost.
  • Seedance 1.0 Pro T2V: Seedance 1.0 is a video generation foundation model launched by ByteDance. As the large-parameter version of this model series, Seedance 1.0 Pro has unique multi-shot narrative capabilities and performs excellently across all dimensions. It has made breakthroughs in semantic understanding and instruction-following capabilities, and can generate 1080P high-definition videos that are smooth in motion, rich in details, diverse in style, and have cinematic-level aesthetics.
  • Seedance 1.0 Pro I2V: Seedance 1.0 is a video generation foundation model launched by ByteDance. As the large-parameter version of this model series, Seedance 1.0 Pro has unique multi-shot narrative capabilities and performs excellently across all dimensions. It has made breakthroughs in semantic understanding and instruction-following capabilities, and can generate 1080P high-definition videos that are smooth in motion, rich in details, diverse in style, and have cinematic-level aesthetics.
  • Seedance 1.5 Pro T2V: Seedance 1.5 Pro is ByteDance’s new professional-grade audio-visual co-generation model. It builds on multi-shot narrative and HD generation capabilities, supporting integrated audio and video output for a unified creation experience (visuals, human voice, music, and sound effects). The model includes a start/end frame feature, allowing creators to lock the video’s style, composition, and characters by setting the first and last frames.
  • Seedance 1.5 Pro I2V: Seedance 1.5 Pro is ByteDance’s new professional-grade audio-visual co-generation model. It builds on multi-shot narrative and HD generation capabilities, supporting integrated audio and video output for a unified creation experience (visuals, human voice, music, and sound effects). The model includes a start/end frame feature, allowing creators to lock the video’s style, composition, and characters by setting the first and last frames.
Jan 26, 2026

Alibaba

Wanx

  • Wanx 2.1 T2I Turbo: The Wan text-to-image model generates beautiful images from text. Supports multiple styles and generates quickly.
  • Wanx 2.1 T2I Plus: The Wan text-to-image model generates beautiful images from text. Supports multiple styles and generates images with rich details.
  • Wanx 2.1 Image Edit: Performs diverse image edits from simple instructions, suitable for scenarios such as image expansion, watermark removal, style transfer, image restoration, and image enhancement.
  • Wanx 2.0 T2I Turbo: The Wan text-to-image model excels in textured portraits and creative design, offering great value for money.
  • Wanx Style Repaint V1: Redraws input portrait images in various styles, preserving the original facial features while presenting different artistic painting effects.
  • Wanx Sketch to Image Lite: Generates refined doodle-style artwork from hand-drawn sketches and text descriptions.
  • Wanx Background Generation V2: Generates and extends a background from an input foreground image, with natural light-and-shadow blending and detailed, realistic results.

Qwen

  • Qwen Image Plus: Excels at text rendering, particularly for Chinese text, and is currently more cost-effective than qwen-image.
  • Qwen Image: Excels at text rendering, particularly for Chinese text.
  • Qwen Image Edit Plus: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen Image Edit Plus 2025-12-15: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen Image Edit Plus 2025-10-30: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen Image Edit: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen MT Image: Supports translating text from images in 11 languages into Chinese or English, accurately preserving original layout and content information, and provides custom features such as terminology definitions, sensitive word filtering, and image subject detection.

WordArt

  • WordArt Semantic: Creatively deforms the edge contours of input text based on the prompt, enabling more inventive uses of a font, and returns a mask image of the text in white on a black background.
  • WordArt Texture: Applies creative design to input text or text images, adding materials and textures based on the prompt to achieve effects such as a raised 3D look or scene integration.

AI Try-On

  • AI Try-On: A virtual try-on image generation model that generates try-on images based on portrait photos and clothing images.
  • AI Try-On Plus: Improves on AI Try-On in image clarity, clothing texture detail, and logo restoration, at the cost of longer generation time.
  • AI Try-On Parsing V1: Supports segmentation of model images and clothing images, and can be used for pre-processing and post-processing of AI fitting room images.
  • AI Try-On Refiner: Performs a second generation pass on images produced by AI Try-On, outputting polished try-on images with higher fidelity.

Image Utilities

  • Image Outpainting: Extends images freely, supporting rotation and expansion specified either by an expansion coefficient or by pixel count.