A state-of-the-art (SOTA) multimodal image generation model built on a leading architecture. Unlike traditional text-to-image models, it natively supports text, single-image, and multi-image inputs. Users can freely combine text and images and, within a single model, perform multi-image fusion with subject consistency, image editing, and group image generation, giving more flexible and controllable image creation.
API Key authentication. Format: Bearer YOUR_API_KEY.
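As a minimal sketch of the format above, the helper below builds the request header for a given key. It assumes the token is sent in the standard HTTP `Authorization` header; the function name `auth_headers` is hypothetical and not part of the API.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token.

    Assumption: the key travels in the standard `Authorization` header.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    return {"Authorization": f"Bearer {key}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client is in use.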
Image description text (the prompt). Supports Chinese and English.
Range: 1 - 600. Example: "Beautiful cherry blossom garden in spring, soft lighting"
Image size in the format 'widthxheight'. Range: total pixels 921,600 to 16,777,216; aspect ratio 1/16 to 16.
Example: "2048x2048"
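The size limits above can be checked on the client before a request is sent. The following is a sketch only: the function names are hypothetical, and it assumes both bounds are inclusive. Because the aspect-ratio range is symmetric (1/16 to 16), it does not matter whether the ratio is read as width/height or height/width.

```python
import re

MIN_PIXELS = 921_600      # lower bound on width * height (from the text above)
MAX_PIXELS = 16_777_216   # upper bound on width * height (from the text above)
MAX_RATIO = 16            # aspect ratio must lie in [1/16, 16]

def parse_size(size: str) -> tuple[int, int]:
    """Split a 'widthxheight' string into integer width and height."""
    match = re.fullmatch(r"([0-9]+)x([0-9]+)", size)
    if match is None:
        raise ValueError(f"size must look like '2048x2048', got {size!r}")
    return int(match.group(1)), int(match.group(2))

def is_valid_size(size: str) -> bool:
    """Check the pixel-count and aspect-ratio limits (assumed inclusive)."""
    width, height = parse_size(size)
    if width == 0 or height == 0:
        return False
    pixels_ok = MIN_PIXELS <= width * height <= MAX_PIXELS
    # Integer form of 1/16 <= width/height <= 16, avoiding float rounding.
    ratio_ok = width * MAX_RATIO >= height and width <= height * MAX_RATIO
    return pixels_ok and ratio_ok
```

For instance, "2048x2048" passes both checks, while "512x512" fails the minimum pixel count and "16384x64" fails the aspect-ratio limit.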
Batch generation mode. 'auto': automatically generates a batch of up to 15 images; 'disabled': generates a single image only.
Values: auto, disabled. Example: "auto"
Maximum number of images for batch generation. (input images + generated images ≤ 15)
Range: 1 <= x <= 15. Example: 5
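The combined limit on input and generated images can be expressed as a small check. This is a sketch under stated assumptions: the function name `check_max_images` is hypothetical, the lower bound is taken as 1, and the upper bound on the value itself is assumed to be 15, as implied by the total limit.

```python
MAX_TOTAL_IMAGES = 15  # input images + generated images must not exceed this

def check_max_images(max_images: int, num_input_images: int = 0) -> None:
    """Raise ValueError if the requested count breaks the documented limit."""
    if num_input_images < 0:
        raise ValueError("num_input_images must not be negative")
    if not 1 <= max_images <= MAX_TOTAL_IMAGES:
        raise ValueError(f"max_images must be between 1 and {MAX_TOTAL_IMAGES}")
    if num_input_images + max_images > MAX_TOTAL_IMAGES:
        allowed = MAX_TOTAL_IMAGES - num_input_images
        raise ValueError(
            f"with {num_input_images} input images, max_images can be at most {allowed}"
        )
```

With 3 input images, for example, a value of 12 is accepted and 13 is rejected.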
Prompt optimization mode. 'standard': high quality (longer time), 'fast': fast optimization (shorter time)
Values: standard, fast. Example: "standard"
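To show how the parameters above fit together, here is a sketch that assembles them into a JSON request body. The field names (`prompt`, `size`, `batch_mode`, `max_images`, `optimize_mode`) are hypothetical, since this reference does not name the fields; the default values are the example values listed above, not confirmed API defaults; and the prompt range of 1 to 600 is assumed to count characters. No endpoint is called.

```python
import json

BATCH_MODES = {"auto", "disabled"}    # allowed values listed above
OPTIMIZE_MODES = {"standard", "fast"} # allowed values listed above

def build_payload(
    prompt: str,
    size: str = "2048x2048",
    batch_mode: str = "auto",
    max_images: int = 5,
    optimize_mode: str = "standard",
) -> dict:
    """Assemble a request body; all field names here are hypothetical."""
    # Assumption: the 1 - 600 range is a character count.
    if not 1 <= len(prompt) <= 600:
        raise ValueError("prompt must be 1 to 600 characters long")
    if batch_mode not in BATCH_MODES:
        raise ValueError(f"batch_mode must be one of {sorted(BATCH_MODES)}")
    if optimize_mode not in OPTIMIZE_MODES:
        raise ValueError(f"optimize_mode must be one of {sorted(OPTIMIZE_MODES)}")
    return {
        "prompt": prompt,
        "size": size,
        "batch_mode": batch_mode,
        "max_images": max_images,
        "optimize_mode": optimize_mode,
    }

def to_json(payload: dict) -> str:
    """Serialize the body, keeping non-ASCII (e.g. Chinese) prompts readable."""
    return json.dumps(payload, ensure_ascii=False)
```

A call such as `to_json(build_payload("Beautiful cherry blossom garden in spring, soft lighting"))` yields a string that can be sent as the request body once the real field names are known.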