# video/ai-shorts

AI Shorts pipeline — generate production-quality short-form video from a topic.

- Category: video
- Source: `workflows/video/ai_shorts.py`
## Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| actor_image_url | object | — | Public URL to a portrait image for the talking-head presenter. When provided, generates a talking-head avatar video with lip sync. When omitted, generates a faceless motion-graphics style short. |
| audience | string | "general" | Target audience persona: ‘general’, ‘teens’, ‘professionals’, ‘technical’, ‘kids’ |
| duration_seconds | integer | 45 | Target duration in seconds (15-120) |
| duration_secs | object | — | Alias for duration_seconds (set by normalize_input) |
| hook | string | "" | Specific hook line (auto-generated if empty) |
| mood | string | "" | Direct mood override (bypasses style mapping) |
| platform | string | "TikTok" | Target platform (affects pacing and framing) |
| platforms | string[] | — | Platform slugs for export variants: ‘tiktok’, ‘instagram’, ‘youtube_shorts’, ‘youtube’, ‘linkedin’. When set, produces resized + metadata-enriched exports per platform in the output. |
| presenter_look | string | "" | AI actor appearance description |
| quality | string | "fast-local" | Quality preset. Default ‘fast-local’ keeps the script step on remote Gemini (cheap + fast) but pushes voiceover, b-roll, and music to local backends — saves ~$0.05–$0.15 per run when local backends are installed, falls through to remote per-stage when they’re not. Other values: ‘local’ (fully local), ‘local-light’ (minimal local, skips video), ‘local-power’ (heavy local), ‘fast-local-avatar’, ‘premium-local’, ‘fast’, ‘free’, ‘budget’, ‘cheap’, ‘standard’, ‘premium’, ‘ultra’. Pass an empty string to bypass the preset and let per-stage model resolution decide each model independently. |
| regenerate | object | — | When set, this run is a regeneration. Workflows may read direction / keep / extra_instructions to modulate prompts; the engine persists parent_run_id and parent_variant_index as run lineage columns. |
| style | string | "educational" | Video style: ‘educational’, ‘promotional’, ‘storytelling’, ‘listicle’ |
| topic | string | required | The subject of the short video |
| variants | integer | 1 | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
| visual_style | string | "" | Override visual aesthetic (e.g. ‘neon cyberpunk’) |
| voice | string | "alloy" | TTS voice style for narration (e.g. ‘narrator’, ‘warm’, ‘energetic-female’) |
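For orientation, the `input:` block of a run-spec that exercises several of these fields might look like the sketch below. The field names and defaults come from the table above; the concrete topic, image URL, and platform choices are purely illustrative.

```yaml
workflow: video/ai-shorts
input:
  topic: "Why the James Webb telescope sees in infrared"   # required
  duration_seconds: 60                   # 15-120; default 45
  style: educational                     # 'educational' | 'promotional' | 'storytelling' | 'listicle'
  platform: TikTok                       # affects pacing and framing
  platforms: [tiktok, youtube_shorts]    # also emit per-platform exports
  quality: fast-local                    # default preset; see the description above
  voice: narrator                        # TTS voice style
  # actor_image_url: "https://example.com/presenter.jpg"   # set for a talking-head short; omit for faceless
```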
## Output Schema

| Field | Type | Default | Description |
|---|---|---|---|
| audio_asset_id | object | — | Fabric asset ID of the TTS audio |
| evaluation | object | — | Engagement evaluation: {overall_score: number, ready_to_publish: boolean} |
| kind | string | "video" | Card kind for consumers — always ‘video’. |
| platform_exports | object | — | Platform-specific video exports with metadata (when platforms is set) |
| script | string | required | The narration script text |
| tags | string[] | — | Hashtags/keywords |
| title | string | required | Generated title for the short |
| video_asset_id | object | — | Fabric asset ID of the generated video |
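Taken together, the fields above suggest a result shaped roughly like the sketch below. This is an assumption drawn only from the field list on this page; the exact envelope, asset-ID format, and the inner shape of `platform_exports` are not specified here and may differ.

```yaml
title: "Why Webb Sees in Infrared"
kind: video
script: "Ever wonder why the biggest telescope ever built can't see visible light? ..."
tags: ["#space", "#jwst", "#science"]
video_asset_id: "asset_01HXAMPLE"   # Fabric asset ID of the generated video
audio_asset_id: "asset_01HXAUDIO"   # Fabric asset ID of the TTS audio
evaluation:
  overall_score: 8.4
  ready_to_publish: true
platform_exports: {}                # populated per platform slug when `platforms` is set
```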
## Task Pipeline

normalize_input → generate_script → generate_keyframes → resolve_character_refs → merge_pre_production → reconcile_continuity → multiview_character_prep → generate_ai_actor → generate_voiceover → generate_all_broll → generate_bgm → merge_generation → generate_talking_heads → lipsync_talking_heads → mix_audio → compose_timeline → prepare_for_post_processing → burn_subtitles → burn_hook_overlay_safe → generate_effect_filters → apply_effects → add_outro_fade → ai-shorts-output → collect_final_output → export_for_platforms → evaluate_output → collect_ai_shorts_output

| Task | Description |
|---|---|
| normalize_input | Normalize simplified input params to the pipeline’s internal keys. |
| generate_script | Generate viral script — delegates to SDK stage. |
| generate_keyframes | Generate 2x2 grid keyframes for visual consistency across b-roll segments. |
| resolve_character_refs | Normalize character_refs (URL/path/asset_id) → resolved URLs. |
| merge_pre_production | Merge parallel pre-production branches (keyframes + character refs). |
| reconcile_continuity | Vision-based continuity reconciliation (opt-in). |
| multiview_character_prep | Generate multi-view character references for stronger visual consistency. |
| generate_ai_actor | Generate AI actor portrait — delegates to SDK stage. |
| generate_voiceover | Generate TTS voiceover — delegates to SDK stage. |
| generate_all_broll | Generate all b-roll clips — delegates to SDK stage. |
| generate_bgm | Generate background music — delegates to SDK stage. |
| merge_generation | Merge parallel generation branches into a single context. |
| generate_talking_heads | Generate talking heads — delegates to SDK stage. |
| lipsync_talking_heads | Lip-sync talking heads — delegates to SDK stage. |
| mix_audio | Mix voiceover + BGM — delegates to SDK stage. |
| compose_timeline | Assemble video timeline — delegates to SDK stage. |
| prepare_for_post_processing | Map keys for post-processing — delegates to SDK stage. |
| burn_subtitles | Burn word-level subtitles — delegates to SDK stage. |
| burn_hook_overlay_safe | Burn hook text overlay — delegates to SDK stage. |
| generate_effect_filters | Use Gemini to suggest FFmpeg filters based on content type. |
| apply_effects | Apply AI-generated FFmpeg filters to the video. |
| add_outro_fade | Add a fade-to-black outro at the end of the video. |
| ai-shorts-output | Video output validation: ai-shorts-output |
| collect_final_output | Collect final output — delegates to SDK stage. |
| export_for_platforms | Conditionally resize + generate metadata for target platforms. |
| evaluate_output | Score the generated short for engagement potential. |
| collect_ai_shorts_output | Extract the AiShortsOutput fields from the pipeline result. |
## Run-spec example

Save the YAML below as `my-run.yaml`, edit the values, and run with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented above the `input:` block — copy any line under `input:` and uncomment to set.

```yaml
workflow: video/ai-shorts

# Optional fields — copy any line(s) under `input:` and uncomment to set:

# Public URL to a portrait image for the talking-head presenter. When provided,
# generates a talking-head avatar video with lip sync. When omitted, generates a
# faceless motion-graphics style short.
# actor_image_url: null

# Target audience persona: 'general', 'teens', 'professionals', 'technical', 'kids'
# audience: general

# Target duration in seconds (15-120)
# [min=15, max=120]
# duration_seconds: 45

# Alias for duration_seconds (set by normalize_input)
# duration_secs: null

# Specific hook line (auto-generated if empty)
# hook: ""

# Direct mood override (bypasses style mapping)
# mood: ""

# Target platform (affects pacing and framing)
# platform: TikTok

# Platform slugs for export variants: 'tiktok', 'instagram', 'youtube_shorts',
# 'youtube', 'linkedin'. When set, produces resized + metadata-enriched exports
# per platform in the output.
# platforms: []

# AI actor appearance description
# presenter_look: ""

# Quality preset. Default 'fast-local' keeps the script step on remote Gemini
# (cheap + fast) but pushes voiceover, b-roll, and music to local backends —
# saves ~$0.05–$0.15 per run when local backends are installed, falls through
# to remote per-stage when they're not. Other values: 'local' (fully local),
# 'local-light' (minimal local, skips video), 'local-power' (heavy local),
# 'fast-local-avatar', 'premium-local', 'fast', 'free', 'budget', 'cheap',
# 'standard', 'premium', 'ultra'. Pass an empty string to bypass the preset and
# let per-stage model resolution decide each model independently.
# quality: fast-local

# Video style: 'educational', 'promotional', 'storytelling', 'listicle'
# style: educational

# Override visual aesthetic (e.g. 'neon cyberpunk')
# visual_style: ""

# TTS voice style for narration (e.g. 'narrator', 'warm', 'energetic-female')
# voice: alloy

input:
  # The subject of the short video
  topic: ""
```

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=video/ai-shorts' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal WorkflowInput fields — `variants` (1–10 fan-out) and `regenerate` (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
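To make those universal fields concrete, a run-spec that fans out and regenerates an earlier run could be sketched as below. `variants` and `regenerate` are documented in the input schema above; the `regenerate` sub-keys shown (direction / keep / extra_instructions) are the hints workflows may read, and the example values are illustrative rather than a verified schema.

```yaml
workflow: video/ai-shorts
variants: 3                  # run the workflow 3 times with different sampling
regenerate:                  # marks this run as a regeneration of a previous one
  direction: "punchier hook, keep the pacing tighter in the middle"
  keep: "the existing visual style and voice"
  extra_instructions: "end on a question to invite comments"
input:
  topic: "Why the James Webb telescope sees in infrared"
```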
## Warnings

- Task `generate_script` has no Pydantic types — contract is opaque to consumers.
- Task `merge_pre_production` has no Pydantic types — contract is opaque to consumers.
- Task `reconcile_continuity` has no Pydantic types — contract is opaque to consumers.
- Task `generate_ai_actor` has no Pydantic types — contract is opaque to consumers.
- Task `generate_voiceover` has no Pydantic types — contract is opaque to consumers.
- Task `generate_all_broll` has no Pydantic types — contract is opaque to consumers.
- Task `generate_bgm` has no Pydantic types — contract is opaque to consumers.
- Task `merge_generation` has no Pydantic types — contract is opaque to consumers.
- Task `generate_talking_heads` has no Pydantic types — contract is opaque to consumers.
- Task `lipsync_talking_heads` has no Pydantic types — contract is opaque to consumers.
- Task `mix_audio` has no Pydantic types — contract is opaque to consumers.
- Task `prepare_for_post_processing` has no Pydantic types — contract is opaque to consumers.
- Task `burn_subtitles` has no Pydantic types — contract is opaque to consumers.
- Task `burn_hook_overlay_safe` has no Pydantic types — contract is opaque to consumers.
- Task `ai-shorts-output` has no Pydantic types — contract is opaque to consumers.
- Task `collect_final_output` has no Pydantic types — contract is opaque to consumers.