# Voice Workflows
Voice workflows handle voice cloning and standalone audio narration. They are used as building blocks by video pipelines (AI Shorts, Slideshow, Reddit Stories) or run independently.
## Supported TTS Models

| Model | Provider | Quality | Voice Cloning | Notes |
|---|---|---|---|---|
| elevenlabs/eleven_turbo_v2_5 | ElevenLabs | Best | Yes (multi-sample) | Default for standard/premium presets |
| kokoro | Local | High | No | Default for local preset, 6 voices |
| chatterbox | Local | High | Yes (single sample) | Zero-shot voice cloning, no API key needed |
| voxtral | Local (Mac) | High | No | Apple Silicon only (MLX) |
| piper | Local | Decent | No | Lightweight, CPU-friendly |
| fal-ai/kokoro/american-english | FAL | High | No | Remote Kokoro |
## Kokoro Voice Pool

Kokoro includes 6 built-in voices for multi-speaker workflows:

| Voice ID | Gender | Style |
|---|---|---|
| am_michael | Male | Conversational |
| af_heart | Female | Warm, natural |
| am_adam | Male | Deep |
| af_bella | Female | Warm, expressive |
| am_onyx | Male | Narrator |
| af_nova | Female | Narrator |
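For multi-speaker workflows, one natural approach is to cycle through the pool per speaker. The sketch below is illustrative only: the voice IDs come from the table above, but the `assign_voices` helper is hypothetical and not part of Fabric.

```python
from itertools import cycle

# Voice IDs from the Kokoro voice pool table above.
KOKORO_VOICES = {
    "male": ["am_michael", "am_adam", "am_onyx"],
    "female": ["af_heart", "af_bella", "af_nova"],
}

def assign_voices(speakers: list[tuple[str, str]]) -> dict[str, str]:
    """Map each unique (name, gender) speaker to a voice, round-robin per gender."""
    pools = {gender: cycle(ids) for gender, ids in KOKORO_VOICES.items()}
    assigned: dict[str, str] = {}
    for name, gender in speakers:
        if name not in assigned:
            assigned[name] = next(pools[gender])
    return assigned
```

For example, `assign_voices([("Alice", "female"), ("Bob", "male")])` assigns `af_heart` to Alice and `am_michael` to Bob; a third male speaker would get `am_adam`.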
## Voice Cloning

Workflow: `voice/clone`

Clones a voice from one or more audio samples. The cloned voice can then be used in any video or narration workflow via its `voice_id`.

```shell
# Clone a voice from samples
fab-workflow voice/clone \
  --input 'sample_paths=["my_voice.mp3"]' \
  --input voice_name="Brand Voice" \
  --input persist=true

# Then use the cloned voice in video generation
fab-workflow video/ai-shorts \
  --input topic="AI trends" \
  --input voice_id="<voice_id from clone>"
```

```shell
curl -X POST "$FABRIC_URL/v1/workflows/run?name=voice/clone" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "sample_paths": ["https://example.com/my_voice.mp3"],
      "voice_name": "Brand Voice",
      "persist": true
    }
  }'
```

### Pipeline

`clone_voice`

| Parameter | Type | Default | Description |
|---|---|---|---|
| sample_paths | list[str] | required | Audio file paths or URLs for voice samples |
| voice_name | string | "" | Name for the cloned voice |
| persist | bool | false | Whether to persist the voice for reuse |
### Output

```json
{
  "voice_id": "abc123...",
  "voice_name": "Brand Voice",
  "provider": "elevenlabs"
}
```

### Clone Providers

| Provider | API Key Required | Samples | Quality | Notes |
|---|---|---|---|---|
| elevenlabs | Yes | Multi-sample | Best | Persistent, reusable voices |
| chatterbox-local | No | Single sample | High | Local inference, zero-shot |
| fal-chatterbox | FAL_KEY | Single sample | High | Remote, zero-shot |
When `provider=auto`, the workflow tries ElevenLabs first (if an API key is set), then local Chatterbox (if installed), then FAL Chatterbox.
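The fallback chain can be sketched as below. This is not the workflow's actual code: `resolve_clone_provider` is a hypothetical helper, and the `ELEVENLABS_API_KEY` environment variable and `chatterbox` package names are assumptions.

```python
import os
import importlib.util

def resolve_clone_provider(provider: str = "auto") -> str:
    """Sketch of the provider=auto fallback order:
    ElevenLabs -> local Chatterbox -> FAL Chatterbox."""
    if provider != "auto":
        return provider  # explicit provider wins
    if os.environ.get("ELEVENLABS_API_KEY"):      # assumed env var name
        return "elevenlabs"
    if importlib.util.find_spec("chatterbox"):    # assumed package name
        return "chatterbox-local"
    return "fal-chatterbox"                       # final fallback
```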
## Narration

Workflow: `voice/narrate`

Generates audio narration from text or a topic. Two modes: provide `narration_text` directly, or provide a `topic` and the workflow generates a script first.

```shell
# Direct text narration
fab-workflow voice/narrate \
  --input 'narration_text="Your narration text here."' \
  --input voice_style="narrator"

# From a topic (generates script first)
fab-workflow voice/narrate \
  --input topic="The future of AI" \
  --input duration_secs=90 \
  --input mood="calm and thoughtful"

# Local TTS
fab-workflow voice/narrate \
  --input 'narration_text="Hello world."' \
  --input tts_model="kokoro"
```

```shell
# Direct text narration
curl -X POST "$FABRIC_URL/v1/workflows/run?name=voice/narrate" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "narration_text": "Your narration text here.",
      "voice_style": "narrator"
    }
  }'

# From a topic
curl -X POST "$FABRIC_URL/v1/workflows/run?name=voice/narrate" \
  -H "Authorization: Bearer $FABRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "topic": "The future of AI",
      "duration_secs": 90,
      "mood": "calm and thoughtful"
    }
  }'
```

### Pipeline

`prepare_narration → generate_narration_script → generate_narration_audio → collect_output`

When `narration_text` is provided, the script generation step is skipped.
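The conditional skip can be sketched as follows; `narrate_steps` is a hypothetical helper, but the step names match the pipeline above.

```python
def narrate_steps(inputs: dict) -> list[str]:
    """Sketch of the narrate pipeline's mode selection:
    script generation runs only when no narration_text is given (topic mode)."""
    steps = ["prepare_narration"]
    if not inputs.get("narration_text"):
        steps.append("generate_narration_script")  # topic mode only
    steps += ["generate_narration_audio", "collect_output"]
    return steps
```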
| Parameter | Type | Default | Description |
|---|---|---|---|
| narration_text | string | — | Text to narrate (provide this OR topic) |
| topic | string | — | Topic to generate narration from (provide this OR narration_text) |
| duration_secs | int | 60 | Target duration in seconds (topic mode) |
| mood | string | "conversational" | Tone for narration |
| language | string | "en" | Language for script and voice |
| voice_style | string | — | Voice style preset (e.g., "narrator", "warm", "energetic-male") |
| voice_id | string | — | Explicit voice ID override (e.g., from a cloned voice) |
| voice_gender | string | — | "male" or "female" |
| tts_model | string | — | TTS model override (e.g., "kokoro", "piper", "elevenlabs/eleven_turbo_v2_5") |
### Output

```json
{
  "audio_path": "/tmp/voiceover.mp3",
  "duration": 87.4,
  "script_text": "The generated narration text...",
  "transcript": [
    {"start": 0.0, "end": 2.5, "text": "The future of AI..."}
  ]
}
```
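The timed `transcript` segments can be consumed downstream, for example to build SRT captions for a video pipeline. A minimal sketch, assuming the segment shape shown in the output above (`to_srt` is illustrative, not part of Fabric):

```python
def to_srt(transcript: list[dict]) -> str:
    """Convert [{"start": s, "end": s, "text": ...}, ...] segments to SRT."""
    def ts(sec: float) -> str:
        # SRT timestamps are HH:MM:SS,mmm
        ms = int(round(sec * 1000))
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    blocks = [
        f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text']}"
        for i, seg in enumerate(transcript, 1)
    ]
    return "\n\n".join(blocks)
```

For the example output above, the first caption block would be `1`, then `00:00:00,000 --> 00:00:02,500`, then the segment text.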