Transforms up to 4 reference images into a cohesive video sequence with multi-element fusion, enabling character interaction and complex visual narratives.
API Key authentication. Format: Bearer YOUR_API_KEY.
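As a minimal sketch of the Bearer format above, the following Python snippet builds the request headers. The `Authorization` header name follows the standard Bearer scheme and is an assumption here, as is the JSON content type; the helper name `build_auth_headers` is hypothetical.

```python
def build_auth_headers(api_key: str) -> dict:
    """Return HTTP headers carrying the API key in Bearer format."""
    if not api_key or not api_key.strip():
        raise ValueError("api_key must be a non-empty string")
    return {
        # Assumption: the key is sent in the standard Authorization header.
        "Authorization": f"Bearer {api_key.strip()}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }


# Replace the placeholder with a real key before sending a request.
headers = build_auth_headers("YOUR_API_KEY")
```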
List of reference images (1-4 images). Each image can be a URL or a Base64-encoded string.
Length: 1 - 4 elements
Example:
[
  {
    "image": "https://example.com/character1.jpg"
  },
  {
    "image": "https://example.com/character2.jpg"
  }
]
Video generation prompt. Supports Chinese and English (1-2500 characters).
Length: 1 - 2500 characters
Example: "Two characters walking together in a park, cinematic quality"
Negative prompt describing undesired elements (1-2500 characters).
Length: 1 - 2500 characters
Example: "blurry, low quality, distorted, static"
Generation mode: std (standard, faster) or pro (professional, higher quality but longer processing).
Allowed values: std, pro
Example: "std"
Video duration in seconds (5 or 10).
Allowed values: 5, 10
Example: 5
Video aspect ratio. Supported values: 16:9 (horizontal), 9:16 (vertical), 1:1 (square).
Allowed values: 16:9, 9:16, 1:1
Example: "16:9"
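The constraints listed above can be checked on the client before a request is sent. The Python sketch below is illustrative only: apart from `image`, which appears in the example above, every field name (`image_list`, `prompt`, `negative_prompt`, `mode`, `duration`, `aspect_ratio`) is a hypothetical placeholder, and treating the negative prompt as optional is an assumption. Consult the actual request schema for the real names and required fields.

```python
import base64
import json

ALLOWED_MODES = {"std", "pro"}
ALLOWED_DURATIONS = {5, 10}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_TEXT_LENGTH = 2500


def encode_image(raw: bytes) -> str:
    """Base64-encode raw image bytes, for use in place of an image URL."""
    return base64.b64encode(raw).decode("ascii")


def build_payload(images, prompt, *, mode, duration, aspect_ratio,
                  negative_prompt=None):
    """Validate the documented limits and return a request body as a dict."""
    if not 1 <= len(images) <= 4:
        raise ValueError("expected 1 to 4 reference images")
    if not 1 <= len(prompt) <= MAX_TEXT_LENGTH:
        raise ValueError("prompt must be 1 to 2500 characters")
    if negative_prompt is not None and not 1 <= len(negative_prompt) <= MAX_TEXT_LENGTH:
        raise ValueError("negative prompt must be 1 to 2500 characters")
    if mode not in ALLOWED_MODES:
        raise ValueError("mode must be 'std' or 'pro'")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5 or 10 seconds")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError("aspect ratio must be 16:9, 9:16, or 1:1")

    payload = {
        # Hypothetical field names; only the inner "image" key is documented.
        "image_list": [{"image": image} for image in images],
        "prompt": prompt,
        "mode": mode,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    return payload


payload = build_payload(
    ["https://example.com/character1.jpg", "https://example.com/character2.jpg"],
    "Two characters walking together in a park, cinematic quality",
    mode="std",
    duration=5,
    aspect_ratio="16:9",
    negative_prompt="blurry, low quality, distorted, static",
)
print(json.dumps(payload, indent=2))
```

Validating locally surfaces an out-of-range value before any processing time is spent. A local file can be passed by reading its bytes and running them through `encode_image` instead of supplying a URL.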