A multimodal image generation model that accepts text, up to 10 reference images, and element inputs to produce 1K/2K images with precise style control and multi-reference feature extraction.
API Key authentication. Format: Bearer YOUR_API_KEY.
Note: Total count of image_list + element_list must not exceed 10 items. This model does NOT support: negative_prompt, image (single), image_reference, image_fidelity, human_fidelity, subject_image_list, scene_image, or style_image parameters. Use image_list for multiple image inputs instead.
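The limits in this note can be checked on the client before a request is sent. The following is a minimal sketch, assuming the request body is a plain dictionary keyed by parameter name; the helper name `check_request` and the constants are hypothetical and not part of the API.

```python
# Parameters the note lists as unsupported by this model.
UNSUPPORTED_PARAMS = {
    "negative_prompt", "image", "image_reference", "image_fidelity",
    "human_fidelity", "subject_image_list", "scene_image", "style_image",
}
# Combined cap on image_list + element_list, taken from the note.
MAX_REFERENCES = 10


def check_request(payload: dict) -> list[str]:
    """Return human-readable problems found in a request body (hypothetical helper)."""
    problems = []

    # Flag any parameter that this model is documented not to support.
    for name in sorted(UNSUPPORTED_PARAMS.intersection(payload)):
        problems.append(f"unsupported parameter: {name}")

    # The cap applies to images and elements combined, not to each list separately.
    total = len(payload.get("image_list", [])) + len(payload.get("element_list", []))
    if total > MAX_REFERENCES:
        problems.append(
            f"image_list + element_list has {total} items; maximum is {MAX_REFERENCES}"
        )
    return problems
```

For example, a body with six images and five elements would be reported as exceeding the combined limit even though each list alone is under 10.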
Image description text; supports Chinese and English. Use <<<image_1>>>, <<<image_2>>>, etc. to reference images from image_list.
Length: 1 - 2500. Example: "A serene landscape with mountains at sunset"
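A prompt that uses the <<<image_N>>> placeholders can be checked against image_list before sending. This is a rough sketch under two assumptions that the text does not state outright: placeholders are numbered from 1 in list order, and the 1 - 2500 limit counts characters. The function names are hypothetical.

```python
import re

# Matches placeholders such as <<<image_1>>> and captures the number.
IMAGE_REF = re.compile(r"<<<image_(\d+)>>>")


def dangling_image_refs(prompt: str, image_list: list) -> list[int]:
    """Return placeholder numbers with no matching image_list entry (assumes 1-based numbering)."""
    numbers = {int(n) for n in IMAGE_REF.findall(prompt)}
    return sorted(n for n in numbers if not 1 <= n <= len(image_list))


def prompt_length_ok(prompt: str) -> bool:
    """Check the documented 1 - 2500 length range (assumed to be measured in characters)."""
    return 1 <= len(prompt) <= 2500
```

A prompt that mentions <<<image_3>>> while image_list holds only two entries would be flagged by `dangling_image_refs`.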
Reference image list. Each element must contain an 'image' field with image URL or Base64 data. Maximum 10 items total (images + elements combined). Supported formats: .jpg, .jpeg, .png. Maximum size per image: 10MB
Maximum items: 10. Example:
[
{ "image": "https://example.com/image1.jpg" },
{
"image": "data:image/jpeg;base64,/9j/4AAQ..."
}
]
Reference element list. Each element must contain an 'element_id' field holding a long integer. Maximum 10 items total (images + elements combined).
Maximum items: 10. Example:
[
{ "element_id": 12345 },
{ "element_id": 67890 }
]
Image resolution. Supports 1K and 2K.
Allowed values: 1k, 2k. Example: "2k"
Image aspect ratio. Supports 'auto' for intelligent aspect ratio selection
Allowed values: 16:9, 9:16, 1:1, 4:3, 3:4, 3:2, 2:3, 21:9, auto. Example: "auto"
Number of images to generate per request
Range: 1 <= x <= 9. Example: 1
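Putting the parameters together, a request could be assembled roughly as follows. This is only a sketch: the text above names image_list and element_list but not the other fields, so the keys "prompt", "resolution", "aspect_ratio", and "n" are assumed names, as is sending the Bearer value in an Authorization header. The count range of 1 to 9 and the reading of 10MB as 10 * 1024 * 1024 bytes are also assumptions, and the function defaults are this sketch's choices rather than documented defaults.

```python
import base64
import os

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MIME_BY_EXTENSION = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}
RESOLUTIONS = {"1k", "2k"}
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3", "21:9", "auto"}


def to_data_uri(filename: str, data: bytes) -> str:
    """Encode image bytes as a Base64 data URI after checking format and size."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in MIME_BY_EXTENSION:
        raise ValueError(f"unsupported image format: {ext or filename}")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 10MB limit")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_BY_EXTENSION[ext]};base64,{encoded}"


def build_request(api_key, prompt, images=(), element_ids=(),
                  resolution="2k", aspect_ratio="auto", count=1):
    """Return (headers, body) for a generation request; several field names are assumed."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect ratio must be one of {sorted(ASPECT_RATIOS)}")
    if not 1 <= count <= 9:  # assumed range, see lead-in
        raise ValueError("count must be between 1 and 9")
    if len(images) + len(element_ids) > 10:
        raise ValueError("image_list + element_list must not exceed 10 items")

    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed header name
        "Content-Type": "application/json",
    }
    body = {
        "prompt": prompt,              # assumed field name
        "resolution": resolution,      # assumed field name
        "aspect_ratio": aspect_ratio,  # assumed field name
        "n": count,                    # assumed field name
    }
    # Only include the reference lists when they are non-empty.
    if images:
        body["image_list"] = [{"image": img} for img in images]
    if element_ids:
        body["element_list"] = [{"element_id": eid} for eid in element_ids]
    return headers, body
```

Each entry in `images` may be either a URL or the string returned by `to_data_uri`, matching the two forms shown in the image_list example.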