# Voice Workflows
Voice workflows handle voice cloning and standalone audio narration. They are used as building blocks by video pipelines (AI Shorts, Slideshow, Reddit Stories) or run independently.
## Supported TTS Models

| Model | Provider | Quality | Voice Cloning | Notes |
|---|---|---|---|---|
| elevenlabs/eleven_turbo_v2_5 | ElevenLabs | Best | Yes (multi-sample) | Default for standard/premium presets |
| kokoro | Local | High | No | Default for local preset, 6 voices |
| chatterbox | Local | High | Yes (single sample) | Zero-shot voice cloning, no API key needed |
| voxtral | Local (Mac) | High | No | Apple Silicon only (MLX) |
| piper | Local | Decent | No | Lightweight, CPU-friendly |
| fal-ai/kokoro/american-english | FAL | High | No | Remote Kokoro |
## Kokoro Voice Pool

Kokoro includes 6 built-in voices for multi-speaker workflows:

| Voice ID | Gender | Style |
|---|---|---|
| am_michael | Male | Conversational |
| af_heart | Female | Warm, natural |
| am_adam | Male | Deep |
| af_bella | Female | Warm, expressive |
| am_onyx | Male | Narrator |
| af_nova | Female | Narrator |
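For multi-speaker workflows, one natural approach is to cycle through the pool per speaker. The sketch below is illustrative only: the voice IDs come from the table above, but the `assign_voices` helper is hypothetical and not part of Fabric.

```python
from itertools import cycle

# Voice IDs from the Kokoro voice pool table above.
KOKORO_VOICES = {
    "male": ["am_michael", "am_adam", "am_onyx"],
    "female": ["af_heart", "af_bella", "af_nova"],
}

def assign_voices(speakers: list[tuple[str, str]]) -> dict[str, str]:
    """Map each unique (name, gender) speaker to a voice, round-robin per gender."""
    pools = {gender: cycle(ids) for gender, ids in KOKORO_VOICES.items()}
    assigned: dict[str, str] = {}
    for name, gender in speakers:
        if name not in assigned:
            assigned[name] = next(pools[gender])
    return assigned
```

For example, `assign_voices([("Alice", "female"), ("Bob", "male")])` assigns `af_heart` to Alice and `am_michael` to Bob; a third male speaker would get `am_adam`.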
## Voice Cloning

Workflow: `voice/clone`

Clones a voice from one or more audio samples. The cloned voice can then be used in any video or narration workflow via its `voice_id`.

```shell
# Clone a voice from samples
fab-workflow voice/clone \
  --input 'sample_paths=["my_voice.mp3"]' \
  --input voice_name="Brand Voice" \
  --input persist=true

# Then use the cloned voice in video generation
fab-workflow video/ai-shorts \
  --input topic="AI trends" \
  --input voice_id="<voice_id from clone>"
```

```shell
curl -X POST "$FABRIC_URL/v1/workflows/run?name=voice/clone" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "sample_paths": ["https://example.com/my_voice.mp3"],
      "voice_name": "Brand Voice",
      "persist": true
    }
  }'
```

### Pipeline

`clone_voice`

| Parameter | Type | Default | Description |
|---|---|---|---|
| sample_paths | list[str] | required | Audio file paths or URLs for voice samples |
| voice_name | string | "" | Name for the cloned voice |
| persist | bool | false | Whether to persist the voice for reuse |
### Output

```json
{
  "voice_id": "abc123...",
  "voice_name": "Brand Voice",
  "provider": "elevenlabs"
}
```

### Clone Providers

| Provider | API Key Required | Samples | Quality | Notes |
|---|---|---|---|---|
| elevenlabs | Yes | Multi-sample | Best | Persistent, reusable voices |
| chatterbox-local | No | Single sample | High | Local inference, zero-shot |
| fal-chatterbox | FAL_KEY | Single sample | High | Remote, zero-shot |
When `provider=auto`, the workflow tries ElevenLabs first (if an API key is set), then local Chatterbox (if installed), then FAL Chatterbox.
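The fallback chain can be sketched as below. This is not the workflow's actual code: `resolve_clone_provider` is a hypothetical helper, and the `ELEVENLABS_API_KEY` environment variable and `chatterbox` package names are assumptions.

```python
import os
import importlib.util

def resolve_clone_provider(provider: str = "auto") -> str:
    """Sketch of the provider=auto fallback order:
    ElevenLabs -> local Chatterbox -> FAL Chatterbox."""
    if provider != "auto":
        return provider  # explicit provider wins
    if os.environ.get("ELEVENLABS_API_KEY"):      # assumed env var name
        return "elevenlabs"
    if importlib.util.find_spec("chatterbox"):    # assumed package name
        return "chatterbox-local"
    return "fal-chatterbox"                       # final fallback
```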
## Narration

Workflow: `voice/narrate`

Generates audio narration from text or a topic. Two modes: provide `narration_text` directly, or provide a `topic` and the workflow generates a script first.

```shell
# Direct text narration
fab-workflow voice/narrate \
  --input 'narration_text="Your narration text here."' \
  --input voice_style="narrator"

# From a topic (generates script first)
fab-workflow voice/narrate \
  --input topic="The future of AI" \
  --input duration_secs=90 \
  --input mood="calm and thoughtful"

# Local TTS
fab-workflow voice/narrate \
  --input 'narration_text="Hello world."' \
  --input tts_model="kokoro"
```

```shell
# Direct text narration
curl -X POST "$FABRIC_URL/v1/workflows/run?name=voice/narrate" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "narration_text": "Your narration text here.",
      "voice_style": "narrator"
    }
  }'

# From a topic
curl -X POST "$FABRIC_URL/v1/workflows/run?name=voice/narrate" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "topic": "The future of AI",
      "duration_secs": 90,
      "mood": "calm and thoughtful"
    }
  }'
```

### Pipeline

`prepare_narration → generate_narration_script → generate_narration_audio → collect_output`

When `narration_text` is provided, the script generation step is skipped.
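The conditional skip can be sketched as follows; `narrate_steps` is a hypothetical helper, but the step names match the pipeline above.

```python
def narrate_steps(inputs: dict) -> list[str]:
    """Sketch of the narrate pipeline's mode selection:
    script generation runs only when no narration_text is given (topic mode)."""
    steps = ["prepare_narration"]
    if not inputs.get("narration_text"):
        steps.append("generate_narration_script")  # topic mode only
    steps += ["generate_narration_audio", "collect_output"]
    return steps
```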
| Parameter | Type | Default | Description |
|---|---|---|---|
| narration_text | string | — | Text to narrate (provide this OR topic) |
| topic | string | — | Topic to generate narration from (provide this OR narration_text) |
| duration_secs | int | 60 | Target duration in seconds (topic mode) |
| mood | string | "conversational" | Tone for narration |
| language | string | "en" | Language for script and voice |
| voice_style | string | — | Voice style preset (e.g., "narrator", "warm", "energetic-male") |
| voice_id | string | — | Explicit voice ID override (e.g., from a cloned voice) |
| voice_gender | string | — | "male" or "female" |
| tts_model | string | — | TTS model override (e.g., "kokoro", "piper", "elevenlabs/eleven_turbo_v2_5") |
### Output

```json
{
  "audio_path": "/tmp/voiceover.mp3",
  "duration": 87.4,
  "script_text": "The generated narration text...",
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "The future of AI..."}
  ]
}
```
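The timed `transcript` segments can be consumed downstream, for example to build SRT captions for a video pipeline. A minimal sketch, assuming the segment shape shown in the output above (`to_srt` is illustrative, not part of Fabric):

```python
def to_srt(transcript: list[dict]) -> str:
    """Convert [{"start": s, "end": s, "text": ...}, ...] segments to SRT."""
    def ts(sec: float) -> str:
        # SRT timestamps are HH:MM:SS,mmm
        ms = int(round(sec * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = [
        f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}"
        for i, seg in enumerate(transcript, 1)
    ]
    return "\n\n".join(blocks)
```

For the example output above, the first caption block would be `1`, then `00:00:00,000 --> 00:00:02,500`, then the segment text.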