A state-of-the-art multimodal image generation model built on a leading architecture. Unlike traditional text-to-image models, it natively accepts text, single-image, and multi-image inputs. Text and images can be combined freely in one request, and the same model covers multi-image fusion with subject consistency, image editing, and group image generation, giving users more flexible and controllable image creation.
API Key authentication. Format: Bearer YOUR_API_KEY.
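As a minimal sketch of the format above, the helper below builds the request header from an API key. The function name `build_auth_headers` is hypothetical, and sending the value in the standard HTTP `Authorization` header is an assumption based on the Bearer scheme; the source states only the `Bearer YOUR_API_KEY` format.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers carrying the API key as a Bearer token.

    Assumption: the key is sent in the standard `Authorization` header.
    """
    key = api_key.strip() if api_key else ""
    if not key:
        raise ValueError("API key must be a non-empty string")
    # Format from the documentation: "Bearer YOUR_API_KEY"
    return {"Authorization": f"Bearer {key}"}


# Replace the placeholder with a real key before sending any request.
headers = build_auth_headers("YOUR_API_KEY")
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client is used to call the API.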
1 - 600
Input image array (1-14 images)
1 - 14 elements
auto, disabled
1 <= x <= 15
standard, fast
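The 1 to 14 image limit on the input image array can be checked on the client before a request is sent. The sketch below is illustrative only: `validate_image_inputs` and the constant names are hypothetical, and representing each image as a string (for example a URL) is an assumption, since the source does not say how images are encoded.

```python
MIN_IMAGES = 1   # lower bound from "1-14 images"
MAX_IMAGES = 14  # upper bound from "1-14 images"


def validate_image_inputs(images: list[str]) -> list[str]:
    """Check that the input image array holds between 1 and 14 entries.

    Assumption: each entry is a string reference to an image.
    """
    count = len(images)
    if not MIN_IMAGES <= count <= MAX_IMAGES:
        raise ValueError(
            f"expected {MIN_IMAGES}-{MAX_IMAGES} input images, got {count}"
        )
    return images
```

Failing fast on the client avoids a round trip for a request that would fall outside the documented range.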