video/talking-head

Simple talking-head video — actor speaks a script with matched voice.

Category: video
Source: workflows/video/talking_head.py

**Input fields**

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `actor` | object | — | Actor portrait image (PNG/JPG). Auto-generated from `presenter_look` when omitted. |
| `avatar_model` | string | `""` | Avatar model override. Defaults to Kling Avatar v2. |
| `duration_secs` | integer | `30` | Target script duration in seconds (only used when generating a script from `topic`). |
| `language` | string | `"en"` | Script / TTS language code. |
| `presenter_look` | string | `""` | Actor appearance description for AI portrait generation. Ignored when an `actor` image is provided. |
| `regenerate` | object | — | When set, this run is a regeneration. Workflows may read `direction` / `keep` / `extra_instructions` to modulate prompts; the engine persists `parent_run_id` and `parent_variant_index` as run-lineage columns. |
| `script_formula` | string | `""` | Optional script narrative formula. One of: `reframe`, `youre_doing_it_wrong`, `validation`, `pattern_interrupt`, `listicle`. See `fabric_workflow_sdk.stages.script_formulas`. Unknown values are ignored with a warning. |
| `script_text` | string | `""` | Explicit narration text. Skips script generation when set. |
| `topic` | string | `""` | Video topic from which a script is auto-generated. Ignored when `script_text` is provided. |
| `variants` | integer | `1` | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
| `video_path` | string | `""` | Path to the rendered talking-head video (populated by `render_talking_head`). |
| `voice_id` | string | `""` | Explicit TTS voice ID. Takes precedence over `voice_sample` and gender-based selection. |
| `voice_sample` | object | — | Audio sample of the desired voice (MP3/WAV). Cloned via ElevenLabs or Chatterbox so the TTS matches the actor. |
| `voice_style` | string | `""` | Descriptive voice style (e.g. "warm female", "deep narrator"). Used when `voice_id` is not set. |
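Several of these fields interact by precedence: `script_text` wins over `topic`, and `voice_id` wins over `voice_sample`, which in turn wins over `voice_style`. A minimal input illustrating one resolution path (all values here are illustrative):

```yaml
input:
  script_text: "Hi, I'm Ava. Here are three quick tips."   # wins over topic
  presenter_look: "friendly tech presenter, studio lighting"
  voice_style: "warm female"   # used only because voice_id is unset
```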
**Output fields**

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `asset_id` | object | — | Fabric asset ID (when server access is available). |
| `avatar_model` | string | `""` | Avatar model that produced the clip. |
| `kind` | object | — | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| `script_text` | string | `""` | The narration text that was spoken. |
| `video_path` | string | required | Path to the final talking-head MP4. |
| `voice_id` | string | `""` | Voice ID used for TTS. |
**Pipeline:** `prepare_script` → `resolve_actor` → `resolve_voice` → `merge_actor_voice` → `generate_tts` → `render_talking_head` → `finalize_output`
| Task | Description |
| --- | --- |
| `prepare_script` | Use the provided `script_text` or generate a narration from the `topic`. |
| `resolve_actor` | Resolve the actor portrait: use the provided image or generate one. |
| `resolve_voice` | Clone a voice from the sample, or pass through an explicit `voice_id`. |
| `merge_actor_voice` | Merge the parallel actor + voice resolution branches. |
| `generate_tts` | Generate the TTS voiceover using the resolved voice. |
| `render_talking_head` | Generate the talking-head video from the actor portrait + voiceover audio. |
| `finalize_output` | Persist the artifact and collect the output. |
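The actor and voice branches resolve in parallel and fan in at `merge_actor_voice`. A minimal sketch of that shape using plain `concurrent.futures` (the real engine's scheduler lives inside the SDK; every function body below is a made-up placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def resolve_actor(spec: dict) -> str:
    # Placeholder: use a provided portrait, else "generate" one from presenter_look.
    return spec.get("actor") or f"portrait:{spec.get('presenter_look', 'default')}"

def resolve_voice(spec: dict) -> str:
    # Placeholder: an explicit voice_id wins over cloning from a sample.
    return spec.get("voice_id") or "cloned-voice"

def merge_actor_voice(actor: str, voice: str) -> dict:
    # Fan-in: TTS and rendering can only start once both branches have finished.
    return {"actor": actor, "voice": voice}

def run_branches(spec: dict) -> dict:
    with ThreadPoolExecutor(max_workers=2) as pool:
        actor_future = pool.submit(resolve_actor, spec)
        voice_future = pool.submit(resolve_voice, spec)
        return merge_actor_voice(actor_future.result(), voice_future.result())
```

The point of the fan-in is ordering, not speed: `generate_tts` and `render_talking_head` must not start until both futures have resolved.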

Save the YAML below as `my-run.yaml`, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented above the `input:` block. Copy any line under `input:` and uncomment it to set a value.

```yaml
workflow: video/talking-head
# Optional fields — copy any line(s) under `input:` and uncomment to set:
# Actor portrait image (PNG/JPG). Auto-generated from presenter_look when omitted.
# actor: null
#
# Avatar model override. Defaults to Kling Avatar v2.
# avatar_model: ""
#
# Target script duration in seconds (only used when generating a script from topic).
# duration_secs: 30
#
# Script / TTS language code.
# language: en
#
# Actor appearance description for AI portrait generation. Ignored when actor image is provided.
# presenter_look: ""
#
# Optional script narrative formula. One of: 'reframe', 'youre_doing_it_wrong', 'validation', 'pattern_interrupt', 'listicle'. See fabric_workflow_sdk.stages.script_formulas. Unknown values are ignored with a warning.
# script_formula: ""
#
# Explicit narration text. Skips script generation when set.
# script_text: ""
#
# Video topic from which a script is auto-generated. Ignored when script_text is provided.
# topic: ""
#
# Path to the rendered talking-head video (populated by render_talking_head).
# video_path: ""
#
# Explicit TTS voice ID. Takes precedence over voice_sample and gender-based selection.
# voice_id: ""
#
# Audio sample of the desired voice (MP3/WAV). Cloned so the TTS matches the actor.
# voice_sample: null
#
# Descriptive voice style (e.g. 'warm female', 'deep narrator'). Used when voice_id is not set.
# voice_style: ""
input: {}
```
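As a concrete starting point, a filled-in run-spec that auto-generates a short script from a topic might look like this (all values are illustrative):

```yaml
workflow: video/talking-head
input:
  topic: "Three habits that make code reviews faster"
  duration_secs: 20
  presenter_look: "friendly tech presenter, studio lighting"
  voice_style: "warm female"
  language: en
```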

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=video/talking-head' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```
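The same request can be assembled from Python with the `requests` library. A sketch mirroring the curl invocation above (the endpoint and `fab_xxx` token are the same placeholders; `build_run_request` is a name made up here):

```python
import requests

def build_run_request(yaml_path: str, token: str) -> requests.PreparedRequest:
    """Build the POST request that submits a run-spec file as the raw body."""
    with open(yaml_path, "rb") as f:
        body = f.read()
    req = requests.Request(
        "POST",
        "https://gofabric.dev/v1/workflows/runs",
        params={"name": "video/talking-head"},
        headers={
            "Authorization": f"Bearer {token}",
            "content-type": "application/yaml",
        },
        data=body,  # the YAML file is sent verbatim, like curl's --data-binary
    )
    return req.prepare()

# To actually submit:
# requests.Session().send(build_run_request("my-run.yaml", "fab_xxx"))
```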

Every workflow also accepts the universal WorkflowInput fields — variants (1–10 fan-out) and regenerate (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
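The universal fields sit at the top level of the run-spec, next to `input:`. A hedged illustration (field names per the tables above; the direction and topic text are made up):

```yaml
workflow: video/talking-head
variants: 3                      # fan-out: three independent executions
regenerate:
  direction: "more energetic delivery, tighter hook"
input:
  topic: "Why release notes deserve a talking head"
```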

These checked-in run-specs exercise this workflow and are good starting points to copy and tweak.

Contract notes:

- Task `merge_actor_voice` has no Pydantic types, so its contract is opaque to consumers.
- Task `generate_tts` has no Pydantic types, so its contract is opaque to consumers.