video/talking-head
Simple talking-head video: an actor speaks a script with a matched voice.
Category: video
Source: workflows/video/talking_head.py
Input Schema
| Field | Type | Default | Description |
|---|---|---|---|
| actor | object | — | Actor portrait image (PNG/JPG). Auto-generated from presenter_look when omitted. |
| avatar_model | string | "" | Avatar model override. Defaults to Kling Avatar v2. |
| duration_secs | integer | 30 | Target script duration in seconds (only used when generating a script from topic). |
| language | string | "en" | Script / TTS language code. |
| presenter_look | string | "" | Actor appearance description for AI portrait generation. Ignored when actor image is provided. |
| regenerate | object | — | When set, this run is a regeneration. Workflows may read direction / keep / extra_instructions to modulate prompts; the engine persists parent_run_id and parent_variant_index as run lineage columns. |
| script_formula | string | "" | Optional script narrative formula. One of: `reframe`, `youre_doing_it_wrong`, `validation`, `pattern_interrupt`, `listicle`. See fabric_workflow_sdk.stages.script_formulas. Unknown values are ignored with a warning. |
| script_text | string | "" | Explicit narration text. Skips script generation when set. |
| topic | string | "" | Video topic; a script is auto-generated from it. Ignored when script_text is provided. |
| variants | integer | 1 | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
| video_path | string | "" | Path to the rendered talking-head video (populated by render_talking_head). |
| voice_id | string | "" | Explicit TTS voice ID. Takes precedence over voice_sample and gender-based selection. |
| voice_sample | object | — | Audio sample of the desired voice (MP3/WAV). Cloned via ElevenLabs or Chatterbox so the TTS matches the actor. |
| voice_style | string | "" | Descriptive voice style (e.g. ‘warm female’, ‘deep narrator’). Used when voice_id is not set. |
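
The precedence rules scattered through the table above (script_text over topic, a provided actor image over presenter_look, voice_id over the other voice fields) can be summarized in a short sketch. The function name `resolve_sources` and the labels it returns are hypothetical and are not taken from the workflow's code. The table does not state the order between voice_sample and voice_style, so checking voice_sample first is an assumption.

```python
from typing import Any, Mapping


def resolve_sources(inputs: Mapping[str, Any]) -> dict[str, str]:
    """Report which input field would drive each stage (illustrative only)."""
    # script_text skips script generation; otherwise the topic is used.
    script = "script_text" if inputs.get("script_text") else "topic"

    # A provided actor image wins; presenter_look is ignored in that case.
    actor = "actor" if inputs.get("actor") else "presenter_look"

    # voice_id takes precedence over voice_sample and gender-based selection.
    # Assumption: voice_sample is checked before voice_style.
    if inputs.get("voice_id"):
        voice = "voice_id"
    elif inputs.get("voice_sample"):
        voice = "voice_sample"
    elif inputs.get("voice_style"):
        voice = "voice_style"
    else:
        voice = "gender-based selection"

    return {"script": script, "actor": actor, "voice": voice}
```

For example, an input that sets only topic and voice_style would be reported as driven by topic, presenter_look, and voice_style.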
Output Schema
| Field | Type | Default | Description |
|---|---|---|---|
| asset_id | object | — | Fabric asset ID (when server access is available). |
| avatar_model | string | "" | Avatar model that produced the clip. |
| kind | object | — | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| script_text | string | "" | The narration text that was spoken. |
| video_path | string | required | Path to the final talking-head MP4. |
| voice_id | string | "" | Voice ID used for TTS. |
Task Pipeline
`prepare_script → resolve_actor → resolve_voice → merge_actor_voice → generate_tts → render_talking_head → finalize_output`

| Task | Description |
|---|---|
| prepare_script | Use provided script_text or generate a narration from the topic. |
| resolve_actor | Resolve the actor portrait: use the provided image or generate one. |
| resolve_voice | Clone voice from sample, or pass through explicit voice_id. |
| merge_actor_voice | Merge the parallel actor and voice resolution branches. |
| generate_tts | Generate TTS voiceover using the resolved voice. |
| render_talking_head | Generate the talking-head video from actor portrait + voiceover audio. |
| finalize_output | Persist artifact and collect output. |
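
The pipeline is listed linearly, but merge_actor_voice is described as joining parallel actor and voice branches. A minimal sketch of that shape, using the standard-library `graphlib` module, is shown below. The dependency edges are an assumption inferred from the task order and descriptions, not read from the workflow source, and `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks it waits on. These edges are an assumption
# inferred from the pipeline order and the merge_actor_voice description.
DEPENDENCIES = {
    "prepare_script": set(),
    "resolve_actor": {"prepare_script"},
    "resolve_voice": {"prepare_script"},
    "merge_actor_voice": {"resolve_actor", "resolve_voice"},
    "generate_tts": {"merge_actor_voice"},
    "render_talking_head": {"generate_tts"},
    "finalize_output": {"render_talking_head"},
}


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in execution_waves(DEPENDENCIES):
    print(" + ".join(wave))
```

Under these assumed edges, resolve_actor and resolve_voice land in the same wave, and every other task runs alone.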
Run-spec example
Save the YAML below as my-run.yaml, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented above the `input:` block. To set one, copy its line under `input:` and uncomment it.

```yaml
workflow: video/talking-head
# Optional fields. Copy any line(s) under `input:` and uncomment to set:
# Typed reference to a file input.
# actor: null
#
# Avatar model override. Defaults to Kling Avatar v2.
# avatar_model: ""
#
# Target script duration in seconds (only used when generating a script from topic).
# duration_secs: 30
#
# Script / TTS language code.
# language: en
#
# Actor appearance description for AI portrait generation. Ignored when actor image is provided.
# presenter_look: ""
#
# Optional script narrative formula. One of: 'reframe', 'youre_doing_it_wrong', 'validation', 'pattern_interrupt', 'listicle'. See fabric_workflow_sdk.stages.script_formulas. Unknown values are ignored with a warning.
# script_formula: ""
#
# Explicit narration text. Skips script generation when set.
# script_text: ""
#
# Video topic; a script is auto-generated from it. Ignored when script_text is provided.
# topic: ""
#
# Path to the rendered talking-head video (populated by render_talking_head).
# video_path: ""
#
# Explicit TTS voice ID. Takes precedence over voice_sample and gender-based selection.
# voice_id: ""
#
# Typed reference to a file input.
# voice_sample: null
#
# Descriptive voice style (e.g. 'warm female', 'deep narrator'). Used when voice_id is not set.
# voice_style: ""
#
input: {}
```

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire; the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=video/talking-head' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal WorkflowInput fields: variants (1–10 fan-out) and regenerate (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
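
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The helper name `build_run_request` is hypothetical. The URL, headers, and body mirror the curl call in this section. The response format is not documented here, so the sketch only builds the request and leaves sending it to the caller.

```python
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://gofabric.dev/v1/workflows/runs"


def build_run_request(
    spec_yaml: bytes,
    token: str,
    workflow: str = "video/talking-head",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a YAML run-spec."""
    # safe="/" keeps the slash in the workflow name unescaped, as in the curl call.
    query = urlencode({"name": workflow}, safe="/")
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        data=spec_yaml,  # raw file bytes, like curl's --data-binary
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/yaml",
        },
    )
```

To submit, read my-run.yaml in binary mode, pass the bytes and a real token to `build_run_request`, and hand the result to `urllib.request.urlopen`.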
Worked examples
These checked-in run-specs exercise this workflow and are good starting points to copy and tweak:

- examples/run-inputs/talking-head-builder-smoke.yaml
- examples/run-inputs/talking-head-builder-with-variants.yaml
Warnings
- Task `merge_actor_voice` has no Pydantic types; its contract is opaque to consumers.
- Task `generate_tts` has no Pydantic types; its contract is opaque to consumers.