Feb 4, 2026

MiniMax

MiniMax Image-01

  • MiniMax Image-01 T2I: MiniMax’s multimodal vision model that blends text-to-image generation with visual reasoning for seamless cross-modal tasks.
  • MiniMax Image-01 I2I: MiniMax’s multimodal vision model that blends text-to-image generation with visual reasoning for seamless cross-modal tasks.
  • MiniMax Image-01-Live I2I: MiniMax’s multimodal vision model that blends text-to-image generation with visual reasoning for seamless cross-modal tasks.

Hailuo

  • Hailuo 2.3 T2V: Hailuo 2.3 not only generates high-quality videos from text or images with exceptional instruction following, but also redefines realism through its state-of-the-art mastery of extreme physics.
  • Hailuo 2.3 I2V: Hailuo 2.3 not only generates high-quality videos from text or images with exceptional instruction following, but also redefines realism through its state-of-the-art mastery of extreme physics.
  • Hailuo 2.3 Fast I2V: Hailuo 2.3 Fast efficiently transforms images into dynamic videos with extreme physics mastery. It delivers exceptional value by generating high-quality, realistic motion at a reduced computational cost.
  • Hailuo 02 T2V: Hailuo 02 masters both text-to-video and image-to-video generation with exceptional instruction following, while setting a new standard in visual realism through its extreme physics simulation.
  • Hailuo 02 I2V: Hailuo 02 masters both text-to-video and image-to-video generation with exceptional instruction following, while setting a new standard in visual realism through its extreme physics simulation.
  • Hailuo 02 FL2V: Hailuo 02’s FL2V function provides unprecedented creative control by generating dynamic videos between a user-defined start and end frame. This feature not only masters extreme physics and complex transitions but also enables the novel capability to deduce a story leading up to a specified final image.

MiniMax T2V-01

  • MiniMax T2V-01: MiniMax T2V-01 is a text-to-video model that uniquely delivers professional-level camera movement control, transforming written prompts into cinematic video clips with dynamic shots.
  • MiniMax T2V-01-Director: T2V-01-Director is a text-to-video AI model that offers precise camera control, allowing users to create professional-looking video clips with cinematic movements through a variety of lens instructions.

MiniMax I2V-01

  • MiniMax I2V-01: MiniMax I2V-01 is a foundational image-to-video model that converts static pictures into high-quality video sequences, delivering smooth animation especially optimized for illustrations and anime styles.
  • MiniMax I2V-01-Director: I2V-01-Director is an image-to-video AI model that offers precise camera control, allowing users to turn a reference image into professional-looking video clips with cinematic movements through a variety of lens instructions.
  • MiniMax I2V-01-Live: I2V-01-Live is an image-to-video model specifically optimized for animating 2D illustrations and cartoon styles, enhancing smoothness and vivid motion to bring static art to life with fluid character movements and natural expressions.

MiniMax S2V-01

  • MiniMax S2V-01: The MiniMax S2V-01 is a specialized subject reference video model designed to solve the industry challenge of character consistency. It can generate dynamic videos where the main character’s identity stays highly consistent across every frame, using just a single photo as a reference and at a computational cost significantly lower than traditional solutions.
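A subject-reference call like the one described above might be assembled as follows. This is only a sketch: the field names (`model`, `prompt`, `subject_reference`) are illustrative assumptions, not the documented MiniMax schema, and no request is actually sent.

```python
import json

def build_s2v_request(prompt: str, reference_image_url: str) -> dict:
    """Assemble a subject-reference video request body.

    All field names here are hypothetical placeholders; consult the
    official MiniMax API reference for the real schema.
    """
    return {
        "model": "S2V-01",
        "prompt": prompt,
        # A single reference photo is what anchors the main character's
        # identity across every generated frame.
        "subject_reference": [
            {"type": "character", "image": reference_image_url},
        ],
    }

body = build_s2v_request(
    "a chef flipping a pancake in a sunlit kitchen",
    "https://example.com/reference-face.jpg",
)
print(json.dumps(body, indent=2))
```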
Jan 31, 2026

Alibaba

Wan

  • Wan 2.6 T2V: The Wan text-to-video model can generate videos from a single sentence, presenting rich artistic styles and cinematic quality. Wan 2.6 introduces multi-shot narrative capabilities and supports both automatic dubbing and uploading custom audio files.
  • Wan 2.6 I2V Flash: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic quality. Wan 2.6 introduces multi-shot narrative capabilities and supports both automatic dubbing and uploading custom audio files.
  • Wan 2.6 I2V: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic quality. Wan 2.6 introduces multi-shot narrative capabilities and supports both automatic dubbing and uploading custom audio files.
  • Wan 2.5 T2V Preview: The Wan text-to-video model can generate videos from a single sentence, presenting rich artistic styles and cinematic quality. Wan 2.5 supports automatic dubbing and uploading custom audio files.
  • Wan 2.5 I2V Preview: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic quality. Wan 2.5 supports automatic dubbing and uploading custom audio files.
  • Wan 2.2 T2V Plus: The Wan text-to-video model can generate videos from a single sentence, presenting rich artistic styles and cinematic quality. Wan 2.2 features more accurate instruction understanding, stable and smooth motion generation, and richer details.
  • Wan 2.2 I2V Flash: The Wan image-to-video model can generate videos using prompts and image references, presenting rich artistic styles and cinematic-quality visuals. Wan 2.2 Flash features ultimate generation speed, with more accurate instruction understanding and camera control, consistent visual elements, and comprehensively improved stability and success rates.
  • Wan 2.2 I2V Plus: The Wan image-to-video model can generate videos using prompts and image references, presenting rich artistic styles and cinematic-quality visuals. Wan 2.2 Plus features more accurate instruction understanding, controllable camera movements, consistent visual elements, and comprehensively improved stability and success rates, delivering richer generated content.
  • Wan 2.2 KF2V Flash: The Wan First-and-Last-Frame Video Generation Model: simply provide the first and last frame images, and it generates a smooth, fluid video guided by the prompt.
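A first-and-last-frame (KF2V) call could be parameterized along these lines. The field names (`first_frame_image`, `last_frame_image`) and the model identifier string are assumptions for illustration, not the documented Wan API; the sketch only builds the request body.

```python
import json

def build_kf2v_request(prompt: str, first_frame_url: str, last_frame_url: str) -> dict:
    """Assemble a first-and-last-frame video request body.

    Field names are hypothetical; consult the official Wan API
    reference for the real schema.
    """
    return {
        "model": "wan2.2-kf2v-flash",  # assumed identifier
        "input": {
            "prompt": prompt,
            # The model interpolates a smooth video between these two frames.
            "first_frame_image": first_frame_url,
            "last_frame_image": last_frame_url,
        },
    }

body = build_kf2v_request(
    "the bud slowly blooms into a red rose",
    "https://example.com/bud.jpg",
    "https://example.com/rose.jpg",
)
print(json.dumps(body, indent=2))
```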

Wanx

  • Wanx 2.1 T2V Turbo: The Wan text-to-video model can generate videos from a single sentence, featuring rich artistic styles and cinematic quality. Wanx 2.1 Turbo offers high cost-effectiveness.
  • Wanx 2.1 T2V Plus: The Wan text-to-video model can generate videos from a single sentence, featuring rich artistic styles and cinematic quality. Wanx 2.1 Plus offers even more refined visuals.
  • Wanx 2.1 I2V Plus: The Wan image-to-video model can generate videos using prompts and image references, presenting rich artistic styles and cinematic-quality visuals. Wanx 2.1 Plus offers even more refined image quality.
  • Wanx 2.1 I2V Turbo: The Wan image-to-video model can generate videos using prompts and image references, featuring rich artistic styles and cinematic-quality visuals. Wanx 2.1 Turbo offers high cost-effectiveness.
  • Wanx 2.1 KF2V Plus: The Wan First-and-Last-Frame Video Generation Model: simply provide the first and last frame images, and it generates a smooth, fluid video guided by the prompt.

ByteDance

Seedream

  • Seedream 3.0 T2I: Seedream 3.0 is a Chinese-English bilingual image generation foundation model that natively supports high resolution. Its overall capabilities are comparable to GPT-4o, placing it among the world’s top tier. It offers faster response times, more accurate small-text generation with better typesetting, strong instruction following, improved aesthetics and structure, and good fidelity and detail.
  • Seedream 4.0 T2I: A SOTA-level multimodal image creation model based on a leading architecture. It breaks the creative boundaries of traditional text-to-image models and natively supports text, single-image, and multi-image inputs. Users can freely fuse text and images, and in the same model, realize diverse applications like multi-image fusion creation based on subject consistency, image editing, and group image generation.
  • Seedream 4.0 I2I: A SOTA-level multimodal image creation model based on a leading architecture. It breaks the creative boundaries of traditional text-to-image models and natively supports text, single-image, and multi-image inputs. Users can freely fuse text and images, and in the same model, realize diverse applications like multi-image fusion creation based on subject consistency, image editing, and group image generation.
  • Seedream 4.5 T2I: Seedream 4.5 is the latest in-house image generation model developed by ByteDance. Compared with Seedream 4.0, it delivers comprehensive improvements—especially in editing consistency, including better preservation of subject details, lighting, and color tone. It also enhances portrait refinement and small-text rendering. The model’s multi-image composition capabilities have been significantly strengthened.
  • Seedream 4.5 I2I: Seedream 4.5 is the latest in-house image generation model developed by ByteDance. Compared with Seedream 4.0, it delivers comprehensive improvements—especially in editing consistency, including better preservation of subject details, lighting, and color tone. It also enhances portrait refinement and small-text rendering. The model’s multi-image composition capabilities have been significantly strengthened.

Seededit

  • Seededit 3.0 I2I: SeedEdit 3.0 is an image editing model that supports editing images via text instructions. SeedEdit 3.0 is trained based on the text-to-image model Seedream 3.0, integrated with diverse data fusion methods and specific reward models. Its ability to preserve image subjects, backgrounds, and details has been further improved, especially in scenarios such as portrait editing, background modification, perspective and light conversion.

Seedance

  • Seedance 1.0 Lite T2V: ByteDance’s small-parameter version of the video generation model achieves excellent video generation quality while significantly increasing generation speed, balancing both effect and efficiency.
  • Seedance 1.0 Lite I2V: ByteDance’s small-parameter version of the video generation model achieves excellent video generation quality while significantly increasing generation speed, balancing both effect and efficiency.
  • Seedance 1.0 Pro Fast T2V: Seedance 1.0 Pro Fast inherits the core advantages of Seedance 1.0 Pro while generating 3x faster at a 72% lower price. It is a video generation model that strikes an excellent balance among quality, speed, and cost.
  • Seedance 1.0 Pro Fast I2V: Seedance 1.0 Pro Fast inherits the core advantages of Seedance 1.0 Pro while generating 3x faster at a 72% lower price. It is a video generation model that strikes an excellent balance among quality, speed, and cost.
  • Seedance 1.0 Pro T2V: Seedance 1.0 is a video generation foundation model launched by ByteDance. As the large-parameter version of this model series, Seedance 1.0 Pro has unique multi-shot narrative capabilities and performs excellently across all dimensions. It has made breakthroughs in semantic understanding and instruction-following capabilities, and can generate 1080P high-definition videos that are smooth in motion, rich in details, diverse in style, and have cinematic-level aesthetics.
  • Seedance 1.0 Pro I2V: Seedance 1.0 is a video generation foundation model launched by ByteDance. As the large-parameter version of this model series, Seedance 1.0 Pro has unique multi-shot narrative capabilities and performs excellently across all dimensions. It has made breakthroughs in semantic understanding and instruction-following capabilities, and can generate 1080P high-definition videos that are smooth in motion, rich in details, diverse in style, and have cinematic-level aesthetics.
  • Seedance 1.5 Pro T2V: Seedance 1.5 Pro is ByteDance’s new professional-grade audio-visual co-generation model. It builds on multi-shot narrative and HD generation capabilities, supporting integrated audio and video output for a unified creation experience (visuals, human voice, music, and sound effects). The model includes a start/end frame feature, allowing creators to lock the video’s style, composition, and characters by setting the first and last frames.
  • Seedance 1.5 Pro I2V: Seedance 1.5 Pro is ByteDance’s new professional-grade audio-visual co-generation model. It builds on multi-shot narrative and HD generation capabilities, supporting integrated audio and video output for a unified creation experience (visuals, human voice, music, and sound effects). The model includes a start/end frame feature, allowing creators to lock the video’s style, composition, and characters by setting the first and last frames.
Jan 26, 2026

Alibaba

Wanx

  • Wanx 2.1 T2I Turbo: The Wan text-to-image model generates beautiful images from text. Supports multiple styles and generates quickly.
  • Wanx 2.1 T2I Plus: The Wan text-to-image model generates beautiful images from text. Supports multiple styles and generates images with rich details.
  • Wanx 2.1 Image Edit: Can achieve diverse image editing through simple instructions, suitable for scenarios such as image expansion, watermark removal, style transfer, image restoration, and image enhancement.
  • Wanx 2.0 T2I Turbo: The Wan text-to-image model excels in textured portraits and creative design, offering great value for money.
  • Wanx Style Repaint V1: Can perform various stylized redraws on input portrait images, allowing the newly generated images to maintain the original facial features while presenting different artistic painting effects.
  • Wanx Sketch to Image Lite: Generates exquisite doodle-style artworks from input hand-drawn sketches and text descriptions.
  • Wanx Background Generation V2: Can expand and generate background information based on input foreground image materials, achieving natural light and shadow fusion effects, as well as delicate and realistic image generation.

Qwen

  • Qwen Image Plus: The qwen-image model excels in text rendering, particularly for Chinese text. Qwen Image Plus is currently more cost-effective than Qwen Image.
  • Qwen Image: The qwen-image model excels in text rendering, particularly for Chinese text.
  • Qwen Image Edit Plus: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen Image Edit Plus 2025-12-15: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen Image Edit Plus 2025-10-30: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen Image Edit: Supports precise bilingual Chinese-English text editing, color adjustment, detail enhancement, style transfer, object addition and removal, and other operations, enabling complex image and text editing.
  • Qwen MT Image: Supports translating text from images in 11 languages into Chinese or English, accurately preserving original layout and content information, and provides custom features such as terminology definitions, sensitive word filtering, and image subject detection.

WordArt

  • WordArt Semantic: Can creatively deform the edge contours of input text based on prompt content, achieving more creative uses of a font, and returns a black-background white mask image containing the text.
  • WordArt Texture: Can perform creative design on input text content or text images, adding materials and textures to the text based on prompt content to achieve effects such as 3D prominence or scene integration.

AI Try-On

  • AI Try-On: A virtual try-on image generation model that generates try-on images based on portrait photos and clothing images.
  • AI Try-On Plus: Compared to AI Try-On, it improves image clarity, clothing texture details, and logo restoration, though generation takes longer.
  • AI Try-On Parsing V1: Supports segmentation of model images and clothing images, and can be used for pre-processing and post-processing of AI fitting room images.
  • AI Try-On Refiner: Performs secondary generation on images created by AI virtual try-on, outputting finely polished try-on images with higher fidelity.

Image Utilities

  • Image Outpainting: Freely extends images, supporting rotation as well as expansion specified either by an expansion coefficient or by explicit pixel counts.
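The two expansion methods (coefficient vs. pixel counts) suggest a mutually exclusive parameterization, which could be validated like this. All parameter names (`scale`, `pixels`, `angle`, and the body keys) are illustrative assumptions, not the documented API.

```python
def build_outpaint_request(image_url, *, scale=None, pixels=None, angle=0):
    """Build an outpainting request body (hypothetical field names).

    Exactly one of `scale` (an expansion coefficient applied to both
    axes) or `pixels` (explicit per-edge pixel counts) must be given,
    mirroring the two expansion methods described above.
    """
    if (scale is None) == (pixels is None):
        raise ValueError("specify exactly one of scale= or pixels=")
    params = {"image_url": image_url, "angle": angle}
    if scale is not None:
        # Coefficient method: scale both axes by the same factor.
        params["x_scale"] = params["y_scale"] = scale
    else:
        # Pixel method: mapping of edge name -> pixel count, e.g. {"top": 128}.
        params["add_pixels"] = dict(pixels)
    return params

req = build_outpaint_request("https://example.com/photo.jpg", scale=1.5)
```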