video/talking-head

Simple talking-head video — actor speaks a script with matched voice.

Category: video
Source: workflows/video/talking_head.py

**Input fields**

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `actor` | object | — | Actor portrait image (PNG/JPG). Auto-generated from `presenter_look` when omitted. |
| `avatar_model` | string | `""` | Avatar model override. Defaults to Kling Avatar v2. |
| `duration_secs` | integer | `30` | Target script duration in seconds (only used when generating a script from `topic`). |
| `language` | string | `"en"` | Script / TTS language code. |
| `presenter_look` | string | `""` | Actor appearance description for AI portrait generation. Ignored when an `actor` image is provided. |
| `regenerate` | object | — | When set, this run is a regeneration. Workflows may read `direction` / `keep` / `extra_instructions` to modulate prompts; the engine persists `parent_run_id` and `parent_variant_index` as run-lineage columns. |
| `script_formula` | string | `""` | Optional script narrative formula. One of: `reframe`, `youre_doing_it_wrong`, `validation`, `pattern_interrupt`, `listicle`. See `fabric_workflow_sdk.stages.script_formulas`. Unknown values are ignored with a warning. |
| `script_text` | string | `""` | Explicit narration text. Skips script generation when set. |
| `topic` | string | `""` | Video topic from which a script is auto-generated. Ignored when `script_text` is provided. |
| `variants` | integer | `1` | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
| `video_path` | string | `""` | Path to the rendered talking-head video (populated by `render_talking_head`). |
| `voice_id` | string | `""` | Explicit TTS voice ID. Takes precedence over `voice_sample` and gender-based selection. |
| `voice_sample` | object | — | Audio sample of the desired voice (MP3/WAV). Cloned via ElevenLabs or Chatterbox so the TTS matches the actor. |
| `voice_style` | string | `""` | Descriptive voice style (e.g. "warm female", "deep narrator"). Used when `voice_id` is not set. |
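Several of these fields interact by precedence: `script_text` wins over `topic`, and `voice_id` wins over `voice_sample`, which in turn wins over `voice_style`. A minimal input illustrating one resolution path (all values here are illustrative):

```yaml
input:
  script_text: "Hi, I'm Ava. Here are three quick tips."   # wins over topic
  presenter_look: "friendly tech presenter, studio lighting"
  voice_style: "warm female"   # used only because voice_id is unset
```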
**Output fields**

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `asset_id` | object | — | Fabric asset ID (when server access is available). |
| `avatar_model` | string | `""` | Avatar model that produced the clip. |
| `kind` | object | — | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| `script_text` | string | `""` | The narration text that was spoken. |
| `video_path` | string | required | Path to the final talking-head MP4. |
| `voice_id` | string | `""` | Voice ID used for TTS. |
**Pipeline:** `prepare_script` → `resolve_actor` → `resolve_voice` → `merge_actor_voice` → `generate_tts` → `render_talking_head` → `finalize_output`
| Task | Description |
| --- | --- |
| `prepare_script` | Use the provided `script_text` or generate a narration from the `topic`. |
| `resolve_actor` | Resolve the actor portrait: use the provided image or generate one. |
| `resolve_voice` | Clone a voice from the sample, or pass through an explicit `voice_id`. |
| `merge_actor_voice` | Merge the parallel actor + voice resolution branches. |
| `generate_tts` | Generate the TTS voiceover using the resolved voice. |
| `render_talking_head` | Generate the talking-head video from the actor portrait + voiceover audio. |
| `finalize_output` | Persist the artifact and collect the output. |
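The actor and voice branches resolve in parallel and fan in at `merge_actor_voice`. A minimal sketch of that shape using plain `concurrent.futures` (the real engine's scheduler lives inside the SDK; every function body below is a made-up placeholder):

```python
from concurrent.futures import ThreadPoolExecutor

def resolve_actor(spec: dict) -> str:
    # Placeholder: use a provided portrait, else "generate" one from presenter_look.
    return spec.get("actor") or f"portrait:{spec.get('presenter_look', 'default')}"

def resolve_voice(spec: dict) -> str:
    # Placeholder: an explicit voice_id wins over cloning from a sample.
    return spec.get("voice_id") or "cloned-voice"

def merge_actor_voice(actor: str, voice: str) -> dict:
    # Fan-in: TTS and rendering can only start once both branches have finished.
    return {"actor": actor, "voice": voice}

def run_branches(spec: dict) -> dict:
    with ThreadPoolExecutor(max_workers=2) as pool:
        actor_future = pool.submit(resolve_actor, spec)
        voice_future = pool.submit(resolve_voice, spec)
        return merge_actor_voice(actor_future.result(), voice_future.result())
```

The point of the fan-in is ordering, not speed: `generate_tts` and `render_talking_head` must not start until both futures have resolved.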

Save the YAML below as `my-run.yaml`, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented above the `input:` block. Copy any line under `input:` and uncomment it to set a value.

```yaml
workflow: video/talking-head
# Optional fields — copy any line(s) under `input:` and uncomment to set:
# Actor portrait image (PNG/JPG). Auto-generated from presenter_look when omitted.
# actor: null
#
# Avatar model override. Defaults to Kling Avatar v2.
# avatar_model: ""
#
# Target script duration in seconds (only used when generating a script from topic).
# duration_secs: 30
#
# Script / TTS language code.
# language: en
#
# Actor appearance description for AI portrait generation. Ignored when actor image is provided.
# presenter_look: ""
#
# Optional script narrative formula. One of: 'reframe', 'youre_doing_it_wrong', 'validation', 'pattern_interrupt', 'listicle'. See fabric_workflow_sdk.stages.script_formulas. Unknown values are ignored with a warning.
# script_formula: ""
#
# Explicit narration text. Skips script generation when set.
# script_text: ""
#
# Video topic from which a script is auto-generated. Ignored when script_text is provided.
# topic: ""
#
# Path to the rendered talking-head video (populated by render_talking_head).
# video_path: ""
#
# Explicit TTS voice ID. Takes precedence over voice_sample and gender-based selection.
# voice_id: ""
#
# Audio sample of the desired voice (MP3/WAV). Cloned so the TTS matches the actor.
# voice_sample: null
#
# Descriptive voice style (e.g. 'warm female', 'deep narrator'). Used when voice_id is not set.
# voice_style: ""
input: {}
```
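As a concrete starting point, a filled-in run-spec that auto-generates a short script from a topic might look like this (all values are illustrative):

```yaml
workflow: video/talking-head
input:
  topic: "Three habits that make code reviews faster"
  duration_secs: 20
  presenter_look: "friendly tech presenter, studio lighting"
  voice_style: "warm female"
  language: en
```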

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=video/talking-head' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```
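The same request can be assembled from Python with the `requests` library. A sketch mirroring the curl invocation above (the endpoint and `fab_xxx` token are the same placeholders; `build_run_request` is a name made up here):

```python
import requests

def build_run_request(yaml_path: str, token: str) -> requests.PreparedRequest:
    """Build the POST request that submits a run-spec file as the raw body."""
    with open(yaml_path, "rb") as f:
        body = f.read()
    req = requests.Request(
        "POST",
        "https://gofabric.dev/v1/workflows/runs",
        params={"name": "video/talking-head"},
        headers={
            "Authorization": f"Bearer {token}",
            "content-type": "application/yaml",
        },
        data=body,  # the YAML file is sent verbatim, like curl's --data-binary
    )
    return req.prepare()

# To actually submit:
# requests.Session().send(build_run_request("my-run.yaml", "fab_xxx"))
```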

Every workflow also accepts the universal WorkflowInput fields — variants (1–10 fan-out) and regenerate (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
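The universal fields sit at the top level of the run-spec, next to `input:`. A hedged illustration (field names per the tables above; the direction and topic text are made up):

```yaml
workflow: video/talking-head
variants: 3                      # fan-out: three independent executions
regenerate:
  direction: "more energetic delivery, tighter hook"
input:
  topic: "Why release notes deserve a talking head"
```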

These checked-in run-specs exercise this workflow and are good starting points to copy and tweak.

Contract notes:

- Task `merge_actor_voice` has no Pydantic types, so its contract is opaque to consumers.
- Task `generate_tts` has no Pydantic types, so its contract is opaque to consumers.