Voice Workflows

Voice workflows handle voice cloning and standalone audio narration. They are used as building blocks by video pipelines (AI Shorts, Slideshow, Reddit Stories) or run independently.

| Model | Provider | Quality | Voice Cloning | Notes |
| --- | --- | --- | --- | --- |
| `elevenlabs/eleven_turbo_v2_5` | ElevenLabs | Best | Yes (multi-sample) | Default for standard/premium presets |
| `kokoro` | Local | High | No | Default for local preset; 6 voices |
| `chatterbox` | Local | High | Yes (single sample) | Zero-shot voice cloning, no API key needed |
| `voxtral` | Local (Mac) | High | No | Apple Silicon only (MLX) |
| `piper` | Local | Decent | No | Lightweight, CPU-friendly |
| `fal-ai/kokoro/american-english` | FAL | High | No | Remote Kokoro |

Kokoro includes 6 built-in voices for multi-speaker workflows:

| Voice ID | Gender | Style |
| --- | --- | --- |
| `am_michael` | Male | Conversational |
| `af_heart` | Female | Warm, natural |
| `am_adam` | Male | Deep |
| `af_bella` | Female | Warm, expressive |
| `am_onyx` | Male | Narrator |
| `af_nova` | Female | Narrator |
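For multi-speaker scripts, you typically map each speaker to one of these voice IDs. A minimal sketch of such a lookup, assuming the table above (the `pick_voice` helper and its selection logic are illustrative, not part of the workflow API):

```python
# Built-in Kokoro voices, as listed in the table above.
KOKORO_VOICES = {
    "am_michael": {"gender": "male", "style": "conversational"},
    "af_heart": {"gender": "female", "style": "warm, natural"},
    "am_adam": {"gender": "male", "style": "deep"},
    "af_bella": {"gender": "female", "style": "warm, expressive"},
    "am_onyx": {"gender": "male", "style": "narrator"},
    "af_nova": {"gender": "female", "style": "narrator"},
}

def pick_voice(gender: str, style_hint: str = "") -> str:
    """Return the first voice ID matching gender, preferring a style match."""
    candidates = [vid for vid, v in KOKORO_VOICES.items() if v["gender"] == gender]
    for vid in candidates:
        if style_hint and style_hint in KOKORO_VOICES[vid]["style"]:
            return vid
    return candidates[0]
```

For example, `pick_voice("female", "narrator")` would select `af_nova`.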

Workflow: voice/clone

Clones a voice from one or more audio samples. The cloned voice can then be used in any video or narration workflow via its voice_id.

```sh
# Clone a voice from samples
fab-workflow voice/clone \
  --input 'sample_paths=["my_voice.mp3"]' \
  --input voice_name="Brand Voice" \
  --input persist=true

# Then use the cloned voice in video generation
fab-workflow video/ai-shorts \
  --input topic="AI trends" \
  --input voice_id="<voice_id from clone>"
```
Pipeline: `clone_voice`
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `sample_paths` | `list[str]` | required | Audio file paths or URLs for voice samples |
| `voice_name` | `string` | `""` | Name for the cloned voice |
| `persist` | `bool` | `false` | Whether to persist the voice for reuse |
Output:

```json
{
  "voice_id": "abc123...",
  "voice_name": "Brand Voice",
  "provider": "elevenlabs"
}
```
| Provider | API Key Required | Samples | Quality | Notes |
| --- | --- | --- | --- | --- |
| `elevenlabs` | Yes | Multi-sample | Best | Persistent, reusable voices |
| `chatterbox-local` | No | Single sample | High | Local inference, zero-shot |
| `fal-chatterbox` | `FAL_KEY` | Single sample | High | Remote, zero-shot |

When provider=auto, the workflow tries ElevenLabs first (if API key set), then local Chatterbox (if installed), then FAL Chatterbox.


Workflow: voice/narrate

Generates audio narration from text or a topic. Two modes: provide narration_text directly, or provide a topic and the workflow generates a script first.

```sh
# Direct text narration
fab-workflow voice/narrate \
  --input 'narration_text="Your narration text here."' \
  --input voice_style="narrator"

# From a topic (generates script first)
fab-workflow voice/narrate \
  --input topic="The future of AI" \
  --input duration_secs=90 \
  --input mood="calm and thoughtful"

# Local TTS
fab-workflow voice/narrate \
  --input 'narration_text="Hello world."' \
  --input tts_model="kokoro"
```
Pipeline: `prepare_narration → generate_narration_script → generate_narration_audio → collect_output`

When narration_text is provided, the script generation step is skipped.
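The mode selection can be sketched as follows (illustrative only; the `narration_steps` helper is not part of the workflow API, and step names follow the pipeline listed above):

```python
def narration_steps(inputs: dict) -> list[str]:
    """Return the pipeline steps that run for a given set of inputs."""
    steps = ["prepare_narration"]
    if not inputs.get("narration_text"):
        # Topic mode: a script must be generated before audio.
        if not inputs.get("topic"):
            raise ValueError("Provide narration_text or topic")
        steps.append("generate_narration_script")
    steps += ["generate_narration_audio", "collect_output"]
    return steps
```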

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `narration_text` | `string` | | Text to narrate (provide this OR `topic`) |
| `topic` | `string` | | Topic to generate narration from (provide this OR `narration_text`) |
| `duration_secs` | `int` | `60` | Target duration in seconds (topic mode) |
| `mood` | `string` | `"conversational"` | Tone for the narration |
| `language` | `string` | `"en"` | Language for script and voice |
| `voice_style` | `string` | | Voice style preset (e.g., `"narrator"`, `"warm"`, `"energetic-male"`) |
| `voice_id` | `string` | | Explicit voice ID override (e.g., from a cloned voice) |
| `voice_gender` | `string` | | Voice gender: `"male"` or `"female"` |
| `tts_model` | `string` | | TTS model override (e.g., `"kokoro"`, `"piper"`, `"elevenlabs/eleven_turbo_v2_5"`) |
Output:

```json
{
  "audio_path": "/tmp/voiceover.mp3",
  "duration": 87.4,
  "script_text": "The generated narration text...",
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "The future of AI..."}
  ]
}
```
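The timed `transcript` segments are useful for downstream subtitle or caption generation. A hypothetical post-processing sketch, assuming the field names shown in the output above (the `to_srt` helper is not part of the workflow):

```python
def to_srt(transcript: list[dict]) -> str:
    """Render transcript segments as SRT subtitle blocks."""
    def fmt(t: float) -> str:
        ms = int(round(t * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = []
    for i, seg in enumerate(transcript, start=1):
        blocks.append(f"{i}\n{fmt(seg['start'])} --> {fmt(seg['end'])}\n{seg['text']}")
    return "\n\n".join(blocks)
```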