# video/ai-shorts

AI Shorts pipeline — generate production-quality short-form video from a topic.

- Category: video
- Source: `workflows/video/ai_shorts.py`
## Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| actor_image_url | object | — | Public URL to a portrait image for the talking-head presenter. When provided, generates a talking-head avatar video with lip sync. When omitted, generates a faceless motion-graphics style short. |
| audience | string | "general" | Target audience persona: ‘general’, ‘teens’, ‘professionals’, ‘technical’, ‘kids’ |
| duration_seconds | integer | 45 | Target duration in seconds (15-120) |
| duration_secs | object | — | Alias for duration_seconds (set by normalize_input) |
| hook | string | "" | Specific hook line (auto-generated if empty) |
| mood | string | "" | Direct mood override (bypasses style mapping) |
| platform | string | "TikTok" | Target platform (affects pacing and framing) |
| platforms | string[] | — | Platform slugs for export variants: ‘tiktok’, ‘instagram’, ‘youtube_shorts’, ‘youtube’, ‘linkedin’. When set, produces resized + metadata-enriched exports per platform in the output. |
| presenter_look | string | "" | AI actor appearance description |
| quality | string | "fast-local" | Quality preset. Default ‘fast-local’ keeps the script step on remote Gemini (cheap + fast) but pushes voiceover, b-roll, and music to local backends — saves ~$0.05–$0.15 per run when local backends are installed, falls through to remote per-stage when they’re not. Other values: ‘local’ (fully local), ‘local-light’ (minimal local, skips video), ‘local-power’ (heavy local), ‘fast-local-avatar’, ‘premium-local’, ‘fast’, ‘free’, ‘budget’, ‘cheap’, ‘standard’, ‘premium’, ‘ultra’. Pass an empty string to bypass the preset and let per-stage model resolution decide each model independently. |
| regenerate | object | — | When set, this run is a regeneration. Workflows may read direction / keep / extra_instructions to modulate prompts; the engine persists parent_run_id and parent_variant_index as run lineage columns. |
| style | string | "educational" | Video style: ‘educational’, ‘promotional’, ‘storytelling’, ‘listicle’ |
| topic | string | required | The subject of the short video |
| variants | integer | 1 | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
| visual_style | string | "" | Override visual aesthetic (e.g. ‘neon cyberpunk’) |
| voice | string | "alloy" | TTS voice style for narration (e.g. ‘narrator’, ‘warm’, ‘energetic-female’) |
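For orientation, the `input:` block of a run-spec that exercises several of these fields might look like the sketch below. The field names and defaults come from the table above; the concrete topic, image URL, and platform choices are purely illustrative.

```yaml
workflow: video/ai-shorts
input:
  topic: "Why the James Webb telescope sees in infrared"   # required
  duration_seconds: 60                   # 15-120; default 45
  style: educational                     # 'educational' | 'promotional' | 'storytelling' | 'listicle'
  platform: TikTok                       # affects pacing and framing
  platforms: [tiktok, youtube_shorts]    # also emit per-platform exports
  quality: fast-local                    # default preset; see the description above
  voice: narrator                        # TTS voice style
  # actor_image_url: "https://example.com/presenter.jpg"   # set for a talking-head short; omit for faceless
```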
## Output Schema

| Field | Type | Default | Description |
|---|---|---|---|
| audio_asset_id | object | — | Fabric asset ID of the TTS audio |
| evaluation | object | — | Engagement evaluation: {overall_score: number, ready_to_publish: boolean} |
| kind | string | "video" | Card kind for consumers — always ‘video’. |
| platform_exports | object | — | Platform-specific video exports with metadata (when platforms is set) |
| script | string | required | The narration script text |
| tags | string[] | — | Hashtags/keywords |
| title | string | required | Generated title for the short |
| video_asset_id | object | — | Fabric asset ID of the generated video |
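Taken together, the fields above suggest a result shaped roughly like the sketch below. This is an assumption drawn only from the field list on this page; the exact envelope, asset-ID format, and the inner shape of `platform_exports` are not specified here and may differ.

```yaml
title: "Why Webb Sees in Infrared"
kind: video
script: "Ever wonder why the biggest telescope ever built can't see visible light? ..."
tags: ["#space", "#jwst", "#science"]
video_asset_id: "asset_01HXAMPLE"   # Fabric asset ID of the generated video
audio_asset_id: "asset_01HXAUDIO"   # Fabric asset ID of the TTS audio
evaluation:
  overall_score: 8.4
  ready_to_publish: true
platform_exports: {}                # populated per platform slug when `platforms` is set
```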
## Task Pipeline

normalize_input → generate_script → generate_keyframes → resolve_character_refs → merge_pre_production → reconcile_continuity → multiview_character_prep → generate_ai_actor → generate_voiceover → generate_all_broll → generate_bgm → merge_generation → generate_talking_heads → lipsync_talking_heads → mix_audio → compose_timeline → prepare_for_post_processing → burn_subtitles → burn_hook_overlay_safe → generate_effect_filters → apply_effects → add_outro_fade → ai-shorts-output → collect_final_output → export_for_platforms → evaluate_output → collect_ai_shorts_output

| Task | Description |
|---|---|
| normalize_input | Normalize simplified input params to the pipeline’s internal keys. |
| generate_script | Generate viral script — delegates to SDK stage. |
| generate_keyframes | Generate 2x2 grid keyframes for visual consistency across b-roll segments. |
| resolve_character_refs | Normalize character_refs (URL/path/asset_id) → resolved URLs. |
| merge_pre_production | Merge parallel pre-production branches (keyframes + character refs). |
| reconcile_continuity | Vision-based continuity reconciliation (opt-in). |
| multiview_character_prep | Generate multi-view character references for stronger visual consistency. |
| generate_ai_actor | Generate AI actor portrait — delegates to SDK stage. |
| generate_voiceover | Generate TTS voiceover — delegates to SDK stage. |
| generate_all_broll | Generate all b-roll clips — delegates to SDK stage. |
| generate_bgm | Generate background music — delegates to SDK stage. |
| merge_generation | Merge parallel generation branches into a single context. |
| generate_talking_heads | Generate talking heads — delegates to SDK stage. |
| lipsync_talking_heads | Lip-sync talking heads — delegates to SDK stage. |
| mix_audio | Mix voiceover + BGM — delegates to SDK stage. |
| compose_timeline | Assemble video timeline — delegates to SDK stage. |
| prepare_for_post_processing | Map keys for post-processing — delegates to SDK stage. |
| burn_subtitles | Burn word-level subtitles — delegates to SDK stage. |
| burn_hook_overlay_safe | Burn hook text overlay — delegates to SDK stage. |
| generate_effect_filters | Use Gemini to suggest FFmpeg filters based on content type. |
| apply_effects | Apply AI-generated FFmpeg filters to the video. |
| add_outro_fade | Add a fade-to-black outro at the end of the video. |
| ai-shorts-output | Video output validation: ai-shorts-output |
| collect_final_output | Collect final output — delegates to SDK stage. |
| export_for_platforms | Conditionally resize + generate metadata for target platforms. |
| evaluate_output | Score the generated short for engagement potential. |
| collect_ai_shorts_output | Extract the AiShortsOutput fields from the pipeline result. |
## Run-spec example

Save the YAML below as `my-run.yaml`, edit the values, and run with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented above the `input:` block — copy any line under `input:` and uncomment to set.

```yaml
workflow: video/ai-shorts

# Optional fields — copy any line(s) under `input:` and uncomment to set:

# Public URL to a portrait image for the talking-head presenter. When provided,
# generates a talking-head avatar video with lip sync. When omitted, generates a
# faceless motion-graphics style short.
# actor_image_url: null

# Target audience persona: 'general', 'teens', 'professionals', 'technical', 'kids'
# audience: general

# Target duration in seconds (15-120)
# [min=15, max=120]
# duration_seconds: 45

# Alias for duration_seconds (set by normalize_input)
# duration_secs: null

# Specific hook line (auto-generated if empty)
# hook: ""

# Direct mood override (bypasses style mapping)
# mood: ""

# Target platform (affects pacing and framing)
# platform: TikTok

# Platform slugs for export variants: 'tiktok', 'instagram', 'youtube_shorts',
# 'youtube', 'linkedin'. When set, produces resized + metadata-enriched exports
# per platform in the output.
# platforms: []

# AI actor appearance description
# presenter_look: ""

# Quality preset. Default 'fast-local' keeps the script step on remote Gemini
# (cheap + fast) but pushes voiceover, b-roll, and music to local backends —
# saves ~$0.05–$0.15 per run when local backends are installed, falls through
# to remote per-stage when they're not. Other values: 'local' (fully local),
# 'local-light' (minimal local, skips video), 'local-power' (heavy local),
# 'fast-local-avatar', 'premium-local', 'fast', 'free', 'budget', 'cheap',
# 'standard', 'premium', 'ultra'. Pass an empty string to bypass the preset and
# let per-stage model resolution decide each model independently.
# quality: fast-local

# Video style: 'educational', 'promotional', 'storytelling', 'listicle'
# style: educational

# Override visual aesthetic (e.g. 'neon cyberpunk')
# visual_style: ""

# TTS voice style for narration (e.g. 'narrator', 'warm', 'energetic-female')
# voice: alloy

input:
  # The subject of the short video
  topic: ""
```

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=video/ai-shorts' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal WorkflowInput fields — `variants` (1–10 fan-out) and `regenerate` (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
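To make those universal fields concrete, a run-spec that fans out and regenerates an earlier run could be sketched as below. `variants` and `regenerate` are documented in the input schema above; the `regenerate` sub-keys shown (direction / keep / extra_instructions) are the hints workflows may read, and the example values are illustrative rather than a verified schema.

```yaml
workflow: video/ai-shorts
variants: 3                  # run the workflow 3 times with different sampling
regenerate:                  # marks this run as a regeneration of a previous one
  direction: "punchier hook, keep the pacing tighter in the middle"
  keep: "the existing visual style and voice"
  extra_instructions: "end on a question to invite comments"
input:
  topic: "Why the James Webb telescope sees in infrared"
```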
## Warnings

- Task `generate_script` has no Pydantic types — contract is opaque to consumers.
- Task `merge_pre_production` has no Pydantic types — contract is opaque to consumers.
- Task `reconcile_continuity` has no Pydantic types — contract is opaque to consumers.
- Task `generate_ai_actor` has no Pydantic types — contract is opaque to consumers.
- Task `generate_voiceover` has no Pydantic types — contract is opaque to consumers.
- Task `generate_all_broll` has no Pydantic types — contract is opaque to consumers.
- Task `generate_bgm` has no Pydantic types — contract is opaque to consumers.
- Task `merge_generation` has no Pydantic types — contract is opaque to consumers.
- Task `generate_talking_heads` has no Pydantic types — contract is opaque to consumers.
- Task `lipsync_talking_heads` has no Pydantic types — contract is opaque to consumers.
- Task `mix_audio` has no Pydantic types — contract is opaque to consumers.
- Task `prepare_for_post_processing` has no Pydantic types — contract is opaque to consumers.
- Task `burn_subtitles` has no Pydantic types — contract is opaque to consumers.
- Task `burn_hook_overlay_safe` has no Pydantic types — contract is opaque to consumers.
- Task `ai-shorts-output` has no Pydantic types — contract is opaque to consumers.
- Task `collect_final_output` has no Pydantic types — contract is opaque to consumers.