POST /wan2.1-vace-plus/async
curl --request POST \
  --url https://api.modellix.ai/api/v1/alibaba/wan2.1-vace-plus/async \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "function": "image_reference",
  "prompt": "A girl walking through a forest",
  "ref_images_url": ["https://example.com/ref1.jpg"]
}
'
{
  "code": 0,
  "message": "success",
  "data": {
    "status": "pending",
    "task_id": "task-abc123",
    "model_id": "model-123",
    "get_result": {
      "method": "GET",
      "url": "https://api.modellix.ai/api/v1/tasks/task-abc123"
    }
  }
}
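
The request and response above can be handled from Python roughly as follows. This is a minimal sketch using only the standard library. The helper names build_submit_request and extract_result_url are illustrative and not part of the API. The shape of the polling response is not documented in this section, so the sketch stops at locating the get_result URL.

```python
import json
import urllib.request

SUBMIT_URL = "https://api.modellix.ai/api/v1/alibaba/wan2.1-vace-plus/async"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_result_url(response_body: str) -> str:
    """Return data.get_result.url from a submit response, or raise on failure."""
    body = json.loads(response_body)
    if body.get("code") != 0:
        raise RuntimeError(f"submit failed: {body.get('message')}")
    return body["data"]["get_result"]["url"]
```

Passing the built request to urllib.request.urlopen would send it; the URL returned by extract_result_url is the one to query later with GET.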

Authorizations

Authorization
string
header
required

API Key authentication. Format: Bearer YOUR_API_KEY.

Body

application/json
function
enum<string>
required

Video editing function to use. Determines which parameters are required and applicable: (1) image_reference: Generate video from reference images; (2) video_repainting: Apply style transfer to existing video using control conditions; (3) video_edit: Edit specific regions of video using mask; (4) video_extension: Extend video at start/end using frame/clip references; (5) video_outpainting: Expand video canvas boundaries

Available options:
image_reference,
video_repainting,
video_edit,
video_extension,
video_outpainting
Example:

"image_reference"

prompt
string
required

Content description in Chinese or English (1-800 characters). Describes the desired video content or editing effect. Required for all functions

Required string length: 1 - 800
Example:

"A girl walking through a forest"

ref_images_url
string<uri>[]

Reference image URLs array. Usage by function: (1) image_reference: 1-3 images required, used as visual references for video generation; (2) video_repainting: max 1 image optional, provides style reference; (3) video_edit: max 1 image optional, provides style reference; (4) other functions: not used. Images must be publicly accessible HTTP/HTTPS URLs

Required array length: 1 - 3 elements
Example:
[
"https://example.com/ref1.jpg",
"https://example.com/ref2.jpg"
]
obj_or_bg
enum<string>[]

Classification for each reference image: 'obj' (object/subject) or 'bg' (background). Only used by the image_reference function. The array length must match the ref_images_url length, and at most one 'bg' element is allowed. Helps the model distinguish between subject references and background style references

Available options:
obj,
bg
Example:
["obj", "bg"]
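
The constraints on ref_images_url and obj_or_bg for the image_reference function can be checked on the client before submitting. The sketch below is one possible check, not the server's own validation. The function name check_image_reference is hypothetical, and it assumes obj_or_bg may be omitted because the field is not marked required.

```python
def check_image_reference(ref_images_url, obj_or_bg=None):
    """Return a list of problems with image_reference inputs (empty if none)."""
    problems = []
    # image_reference needs between 1 and 3 reference images.
    if not 1 <= len(ref_images_url) <= 3:
        problems.append("ref_images_url must contain 1 to 3 URLs")
    if obj_or_bg is not None:
        # One classification per image.
        if len(obj_or_bg) != len(ref_images_url):
            problems.append("obj_or_bg length must match ref_images_url length")
        if any(tag not in ("obj", "bg") for tag in obj_or_bg):
            problems.append("obj_or_bg entries must be 'obj' or 'bg'")
        # Only a single background reference is allowed.
        if obj_or_bg.count("bg") > 1:
            problems.append("at most one 'bg' entry is allowed")
    return problems
```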
video_url
string<uri>

Input video URL. Usage by function: (1) video_repainting: required, the video to be repainted; (2) video_edit: required, the video to be edited; (3) video_outpainting: required, the video to expand; (4) video_extension: optional, provides reference style when extending; (5) image_reference: not used. Must be a publicly accessible HTTP, HTTPS, or OSS URL

Example:

"https://example.com/video.mp4"

control_condition
enum<string>

Control condition type for structure preservation. Usage by function: (1) video_repainting: required, determines how video structure is preserved during repainting; (2) video_extension: required when video_url is present, maintains consistency with reference video; (3) video_edit: optional, provides structural guidance for editing; (4) other functions: not used. Options: 'posebodyface' (pose+body+face detection), 'posebody' (pose+body only), 'depth' (depth map), 'scribble' (edge detection), '' (empty string for no extraction)

Available options:
posebodyface,
posebody,
depth,
scribble,
"" (empty string)
Example:

"depth"

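Because video_url and control_condition are required for some functions and ignored by others, a small pre-flight check can catch mistakes early. The following is a sketch of the rules as described above. The name check_video_inputs is hypothetical, and the server may apply further checks not listed here.

```python
# Functions for which video_url is described as required.
VIDEO_URL_REQUIRED = {"video_repainting", "video_edit", "video_outpainting"}
# Documented control_condition values, including the empty string.
CONTROL_CONDITIONS = {"posebodyface", "posebody", "depth", "scribble", ""}


def check_video_inputs(payload):
    """Return a list of problems with video_url / control_condition usage."""
    problems = []
    function = payload.get("function")
    has_video = "video_url" in payload
    condition = payload.get("control_condition")

    if function in VIDEO_URL_REQUIRED and not has_video:
        problems.append(f"{function} requires video_url")
    if function == "video_repainting" and condition is None:
        problems.append("video_repainting requires control_condition")
    if function == "video_extension" and has_video and condition is None:
        problems.append("video_extension with video_url requires control_condition")
    if condition is not None and condition not in CONTROL_CONDITIONS:
        problems.append(f"unknown control_condition: {condition!r}")
    return problems
```
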
mask_image_url
string<uri>

Mask image URL for video_edit function. Defines the region to edit (white=edit, black=keep). Must provide either mask_image_url OR mask_video_url, not both. When using mask_image_url, also specify mask_frame_id to indicate which frame the mask corresponds to. The mask will be propagated to other frames based on mask_type setting

Example:

"https://example.com/mask.png"

mask_video_url
string<uri>

Mask video URL for video_edit function. Provides frame-by-frame mask for precise control (white=edit, black=keep). Must provide either mask_image_url OR mask_video_url, not both. Mask video must have the same frame count as the input video. Use this for complex editing that requires different masks for different frames

Example:

"https://example.com/mask.mp4"

mask_frame_id
integer
default:1

Frame ID (1-based index) indicating which frame the mask_image_url corresponds to. Only applicable when using mask_image_url in video_edit function. The mask will be propagated from this frame to others based on mask_type. For example, mask_frame_id=1 means the mask corresponds to the first frame of the video. Default is 1 (first frame)

Required range: x >= 1
Example:

1

mask_type
enum<string>
default:tracking

Mask propagation type for video_edit function. Options: (1) 'tracking': mask follows object movement across frames (recommended for moving objects); (2) 'fixed': mask stays in same position across all frames (recommended for static scenes or background edits). Default is 'tracking'

Available options:
tracking,
fixed
Example:

"tracking"

expand_ratio
number
default:0.05

Mask expansion ratio for video_edit function (0.0-1.0). Expands the mask boundary to include surrounding areas. 0.0 = no expansion (use exact mask), 1.0 = maximum expansion. Default is 0.05 (5% expansion). Useful for ensuring complete coverage of editing region and avoiding edge artifacts

Required range: 0 <= x <= 1
Example:

0.1
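
The mask rules for video_edit (exactly one mask source, a 1-based mask_frame_id, and an expand_ratio between 0 and 1) can be enforced while assembling the request body. The sketch below uses a hypothetical helper, build_video_edit_payload. It assumes that mask_frame_id and mask_type are only meaningful with mask_image_url, which is one reading of the descriptions above rather than a documented rule.

```python
def build_video_edit_payload(prompt, video_url, *, mask_image_url=None,
                             mask_video_url=None, mask_frame_id=1,
                             mask_type="tracking", expand_ratio=0.05):
    """Assemble a video_edit request body, enforcing the documented mask rules."""
    # Exactly one mask source: an image or a video, never both and never neither.
    if (mask_image_url is None) == (mask_video_url is None):
        raise ValueError("provide exactly one of mask_image_url or mask_video_url")
    if mask_frame_id < 1:
        raise ValueError("mask_frame_id is 1-based and must be >= 1")
    if mask_type not in ("tracking", "fixed"):
        raise ValueError("mask_type must be 'tracking' or 'fixed'")
    if not 0.0 <= expand_ratio <= 1.0:
        raise ValueError("expand_ratio must be between 0.0 and 1.0")

    payload = {
        "function": "video_edit",
        "prompt": prompt,
        "video_url": video_url,
        "expand_ratio": expand_ratio,
    }
    if mask_image_url is not None:
        # Assumption: frame id and propagation type apply to single-image masks only.
        payload["mask_image_url"] = mask_image_url
        payload["mask_frame_id"] = mask_frame_id
        payload["mask_type"] = mask_type
    else:
        payload["mask_video_url"] = mask_video_url
    return payload
```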

expand_mode
enum<string>
default:hull

Mask expansion mode for video_edit function. Determines how the mask is expanded when expand_ratio > 0. Options: (1) 'hull': convex hull expansion (smooth, rounded boundaries); (2) 'bbox': bounding box expansion (rectangular boundaries); (3) 'original': keep original mask shape while expanding. Default is 'hull'. Use 'hull' for natural objects, 'bbox' for rectangular regions

Available options:
hull,
bbox,
original
Example:

"hull"

first_frame_url
string<uri>

First frame image URL for video_extension function. Specifies the starting frame when extending video forward. At least one of first_frame_url/last_frame_url/first_clip_url/last_clip_url must be provided. Use this to define the exact starting point of the extended video. The model will generate smooth transition from this frame

Example:

"https://example.com/first_frame.jpg"

last_frame_url
string<uri>

Last frame image URL for video_extension function. Specifies the ending frame when extending video backward. At least one of first_frame_url/last_frame_url/first_clip_url/last_clip_url must be provided. Use this to define the exact ending point of the extended video. The model will generate smooth transition to this frame

Example:

"https://example.com/last_frame.jpg"

first_clip_url
string<uri>

First video clip URL for video_extension function. Provides a video segment to use as the starting portion. At least one of first_frame_url/last_frame_url/first_clip_url/last_clip_url must be provided. Use this when you want more control than a single frame can provide. The model will extend naturally from the end of this clip

Example:

"https://example.com/first_clip.mp4"

last_clip_url
string<uri>

Last video clip URL for video_extension function. Provides a video segment to use as the ending portion. At least one of first_frame_url/last_frame_url/first_clip_url/last_clip_url must be provided. Use this when you want more control than a single frame can provide. The model will extend naturally to the beginning of this clip

Example:

"https://example.com/last_clip.mp4"

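The four video_extension inputs share one rule: at least one of them must be present. A minimal sketch of that check follows. The function name extension_anchors is hypothetical, and the sketch does not attempt to decide which combinations of frames and clips the service accepts together, since that is not stated here.

```python
EXTENSION_FIELDS = (
    "first_frame_url",
    "last_frame_url",
    "first_clip_url",
    "last_clip_url",
)


def extension_anchors(payload):
    """Return the extension fields present in payload; raise if there are none."""
    present = [name for name in EXTENSION_FIELDS if payload.get(name)]
    if not present:
        raise ValueError(
            "video_extension needs at least one of: " + ", ".join(EXTENSION_FIELDS)
        )
    return present
```
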
top_scale
number
default:1

Top boundary expansion scale for video_outpainting function (1.0-2.0). 1.0 = no expansion (original height), 2.0 = double the top area. Default is 1.0. Use values > 1.0 to expand the canvas upward. For example, 1.5 means add 50% more canvas space above the original video. The model will generate content to fill the expanded area naturally

Required range: 1 <= x <= 2
Example:

1.5

bottom_scale
number
default:1

Bottom boundary expansion scale for video_outpainting function (1.0-2.0). 1.0 = no expansion (original height), 2.0 = double the bottom area. Default is 1.0. Use values > 1.0 to expand the canvas downward. For example, 1.5 means add 50% more canvas space below the original video. The model will generate content to fill the expanded area naturally

Required range: 1 <= x <= 2
Example:

1.5

left_scale
number
default:1

Left boundary expansion scale for video_outpainting function (1.0-2.0). 1.0 = no expansion (original width), 2.0 = double the left area. Default is 1.0. Use values > 1.0 to expand the canvas leftward. For example, 1.5 means add 50% more canvas space to the left of the original video. The model will generate content to fill the expanded area naturally

Required range: 1 <= x <= 2
Example:

1.5

right_scale
number
default:1

Right boundary expansion scale for video_outpainting function (1.0-2.0). 1.0 = no expansion (original width), 2.0 = double the right area. Default is 1.0. Use values > 1.0 to expand the canvas rightward. For example, 1.5 means add 50% more canvas space to the right of the original video. The model will generate content to fill the expanded area naturally

Required range: 1 <= x <= 2
Example:

1.5
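
The four scale parameters for video_outpainting share the same range and default, so they can be validated together. The sketch below uses a hypothetical helper, build_outpainting_scales, that range-checks each value and leaves out any side still at the default of 1.0. Omitting defaults is a convenience of this sketch, not an API requirement.

```python
def build_outpainting_scales(top=1.0, bottom=1.0, left=1.0, right=1.0):
    """Return the non-default *_scale fields after range-checking each one."""
    scales = {
        "top_scale": top,
        "bottom_scale": bottom,
        "left_scale": left,
        "right_scale": right,
    }
    for name, value in scales.items():
        if not 1.0 <= value <= 2.0:
            raise ValueError(f"{name} must be between 1.0 and 2.0, got {value}")
    # 1.0 means no expansion on that side, so it can be left to the default.
    return {name: value for name, value in scales.items() if value != 1.0}
```

For example, build_outpainting_scales(top=1.5, left=1.25) yields only top_scale and left_scale, which can be merged into a request body alongside function, prompt, and video_url.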

size
enum<string>
default:1280*720

Output video resolution in width*height format. Available options: '1280*720' (16:9 landscape, HD), '720*1280' (9:16 portrait, mobile-friendly), '960*960' (1:1 square, social media), '1088*832' (4:3 landscape), '832*1088' (3:4 portrait). Default is '1280*720'. Choose based on your target platform and use case. Applicable to all functions

Available options:
1280*720,
720*1280,
960*960,
832*1088,
1088*832
Example:

"1280*720"

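Since size is a string with an asterisk separator rather than a pair of numbers, client code often needs to convert it. A small sketch follows; parse_size is a hypothetical helper that accepts only the values listed under Available options.

```python
ALLOWED_SIZES = ("1280*720", "720*1280", "960*960", "832*1088", "1088*832")


def parse_size(size="1280*720"):
    """Split a documented size string into (width, height) integers."""
    if size not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    width, height = (int(part) for part in size.split("*"))
    return width, height
```
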
duration
enum<integer>
default:5

Video duration in seconds. Fixed at 5 seconds and cannot be modified. The model always generates 5-second videos regardless of this parameter value

Available options:
5
Example:

5

prompt_extend
boolean
default:true

Enable intelligent prompt rewriting and enhancement. When true (default), the model will automatically optimize and expand your prompt for better results. When false, uses your prompt exactly as provided. Recommended to keep true unless you need precise control over the exact wording. Applicable to all functions

Example:

true

seed
integer

Random seed for reproducible results (0-2147483647). Using the same seed with identical parameters will produce similar (though not pixel-perfect identical) results. Useful for A/B testing different prompts while keeping other randomness constant. If not specified, a random seed is used each time. Applicable to all functions

Required range: 0 <= x <= 2147483647
Example:

42
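
The A/B testing use mentioned for seed amounts to sending two requests that differ only in prompt. The sketch below builds such a pair. The helper name ab_payloads is hypothetical, and per the description above the two results should be expected to share randomness only approximately.

```python
MAX_SEED = 2147483647


def ab_payloads(base, prompt_a, prompt_b, seed):
    """Return two request bodies that share every field except prompt."""
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    shared = {**base, "seed": seed}
    return {**shared, "prompt": prompt_a}, {**shared, "prompt": prompt_b}
```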

Response

200 - application/json

Task submitted successfully

code
integer
required

Response code, 0 indicates success

Example:

0

message
string
required

Response message

Example:

"success"

data
object
required