# AI Shorts Pipeline
The AI Shorts pipeline generates production-ready short-form video (TikTok, Reels, Shorts) from a single topic string. It orchestrates script generation, AI actor creation, voiceover, b-roll video, background music, talking heads, lip-sync, subtitles, and final composition — all as a single workflow.
## Pipeline Architecture

```
        generate_script
              │
    generate_keyframes (opt-in)
              │
    ┌─────────┼──────────────────────┐
    │         │          │           │
generate_  generate_  generate_   generate_
ai_actor   voiceover  broll (×3)  bgm
    │         │          │           │
    └─────────┼──────────────────────┘
              │
       merge_generation
              │
    generate_talking_heads
              │
    lipsync_talking_heads
              │
          mix_audio
              │
     transcribe_voiceover
              │
       compose_timeline
              │
        burn_subtitles
              │
       burn_hook_overlay
              │
       effects_pipeline
              │
     collect_final_output
```

The 6-way fork (actor, voiceover, b-roll ×3, music) runs in parallel for maximum throughput. On a fast connection with remote models, a 45-second video completes in 3–5 minutes.
## Quick Start

```sh
# Minimal — topic only
fabric run global/ai-shorts --input topic="Why sleep is a superpower"
```
```sh
# Full control
fabric run global/ai-shorts \
  --input topic="AI is replacing junior developers" \
  --input mood="dramatic" \
  --input platform="YouTube Shorts" \
  --input duration_secs=60 \
  --input quality=premium \
  --input use_keyframe_grid=true
```

```python
from fabric_platform import FabricClient

fabric = FabricClient()
run = fabric.run_workflow("global/ai-shorts", input={
    "topic": "Why sleep is a superpower",
    "mood": "high-energy and conversational",
    "platform": "TikTok",
    "duration_secs": 45,
    "quality": "premium",
})
result = fabric.wait_for_run(run["id"])
print(result["output"]["final_video_path"])
```

```ts
import { FabricClient } from "@fabric-platform/sdk";

const fabric = new FabricClient();
const run = await fabric.workflows.runs.submitRun({
  workflowSlug: "global/ai-shorts",
  input: {
    topic: "Why sleep is a superpower",
    mood: "high-energy and conversational",
    platform: "TikTok",
    duration_secs: 45,
    quality: "premium",
  },
});
const result = await fabric.workflows.runs.waitForRun(run.id);
```

## Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `topic` | string | *required* | The subject of the video |
| `hook` | string | `""` | Specific hook line (auto-generated if empty) |
| `mood` | string | `"high-energy and conversational"` | Emotional tone for visuals and narration |
| `platform` | string | `"TikTok"` | Target platform (affects pacing and framing) |
| `duration_secs` | int | `45` | Target video duration in seconds |
| `presenter_look` | string | `"confident young creator..."` | AI actor appearance description |
| `visual_style` | string | `""` | Override visual aesthetic (e.g. "neon cyberpunk") |
| `quality` | string | `""` | Quality preset: `cheap`, `premium`, `ultra`, `local`, `local-power`, or `local-light` |
| `use_keyframe_grid` | bool | `false` | Enable 2x2 grid keyframe generation |
| `gender` | string | auto-detected | Voice gender for TTS (`male` or `female`) |
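As an illustration of how the table's defaults compose with caller overrides, here is a hypothetical helper (not part of the SDK) that builds a complete input payload; the defaults mirror the table above:

```python
# Documented defaults from the parameter table above.
DEFAULTS = {
    "hook": "",
    "mood": "high-energy and conversational",
    "platform": "TikTok",
    "duration_secs": 45,
    "visual_style": "",
    "quality": "",
    "use_keyframe_grid": False,
}


def build_input(topic: str, **overrides) -> dict:
    """Merge caller overrides over the defaults; topic is required."""
    if not topic:
        raise ValueError("topic is required")
    payload = {**DEFAULTS, "topic": topic}
    payload.update(overrides)
    return payload
```

The resulting dict can be passed as the `input` argument to `run_workflow`.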
## Quality Presets

Quality presets control which models are used at each pipeline stage:
| Preset | Script | TTS | Avatar | B-Roll | Lip-sync | Music |
|---|---|---|---|---|---|---|
| `cheap` | Gemini Flash | Kokoro | Kling v2 | Veo 3.1 | FAL | Stable Audio |
| `premium` | Gemini Flash | ElevenLabs | Kling v2 | Kling v2.5 | VEED | Stable Audio |
| `ultra` | Gemini Flash | ElevenLabs | OmniHuman | Kling v3 (i2v) | built-in | Stable Audio |
| `local` | Qwen3 8B | Kokoro | Wav2Lip | WAN 1.3B | Wav2Lip | MusicGen |
| `local-power` | Qwen3 latest | Kokoro | Wav2Lip | WAN 1.3B | Wav2Lip | MusicGen |
| `local-light` | Gemma3 4B | Piper | skip | skip | skip | skip |
Individual model keys can override any preset value. See Model Configuration for details.
## Pipeline Stages

### 1. Script Generation
Section titled “1. Script Generation”An LLM generates a structured script with:
- Hook text — the attention-grabbing opening line
- Full narration — the complete voiceover script (110-160 words)
- Segments — 5–7 alternating segments of type `actor_talking` or `broll`, each with timing, narration text, and a visual prompt
The script also generates a continuity brief — a text prefix encoding the video’s unified color palette, film stock, and atmospheric quality. This prefix is prepended to every downstream visual generation prompt. See Shot Design.
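The prepend step itself is simple: every visual prompt gets the brief as a prefix so all shots share one look. A minimal sketch (the function name is illustrative, not the pipeline's actual API):

```python
def apply_continuity(brief: str, segment_prompts: list[str]) -> list[str]:
    """Prefix each downstream visual prompt with the continuity brief."""
    return [f"{brief.strip()} {p.strip()}" for p in segment_prompts]
```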
### 2. Keyframe Grid (Optional)

When enabled, a 2x2 grid of keyframe images is generated from the b-roll segment descriptions. All 4 panels are generated in a single image, forcing visual consistency. The grid is cropped into individual keyframes that serve as reference images for image-to-video generation.
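The crop step reduces to computing four equal quadrants of the grid image. A sketch of the box math (the boxes are in the `(left, top, right, bottom)` form that e.g. Pillow's `Image.crop` accepts):

```python
def grid_boxes(width: int, height: int) -> list[tuple[int, int, int, int]]:
    """Crop boxes for a 2x2 grid, in reading order:
    top-left, top-right, bottom-left, bottom-right."""
    cw, ch = width // 2, height // 2
    return [
        (0, 0, cw, ch),
        (cw, 0, width, ch),
        (0, ch, cw, height),
        (cw, ch, width, height),
    ]
```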
### 3. Parallel Generation

Six tasks run concurrently:
- AI Actor — Generates a portrait image matching the `presenter_look` description via Imagen 4
- Voiceover — Text-to-speech of the full narration (ElevenLabs, Kokoro, or Piper)
- B-Roll (×3) — Up to 3 b-roll video clips generated with cinema-grade prompts. Routes to local models (WAN, LTX), FAL (Veo, Kling), or a Ken Burns fallback
- Background Music — Mood-matched music generation (Stable Audio or MusicGen)
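The fork's benefit is that total wall time is bounded by the slowest task, not the sum of all six. A conceptual sketch using `asyncio.gather` (the task bodies are stand-ins for model calls, not the pipeline's real implementation):

```python
import asyncio


async def stage(name: str, seconds: float) -> str:
    """Stand-in for one generation task (actor, voiceover, b-roll, music)."""
    await asyncio.sleep(seconds)
    return name


async def parallel_generation() -> list[str]:
    # All six tasks are awaited together; gather preserves submission order.
    return await asyncio.gather(
        stage("ai_actor", 0.01),
        stage("voiceover", 0.01),
        stage("broll_0", 0.01),
        stage("broll_1", 0.01),
        stage("broll_2", 0.01),
        stage("bgm", 0.01),
    )
```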
### 4. Talking Heads

The AI actor portrait + voiceover audio segments are combined into talking-head video clips using avatar models (Kling Avatar, OmniHuman, or SadTalker/Wav2Lip locally).
### 5. Lip-Sync

For models that don't include built-in lip-sync, a separate lip-sync pass aligns mouth movements to audio (VEED, MuseTalk, LatentSync, or Wav2Lip).
### 6. Post-Production

- Audio Mix — Voiceover + background music mixed with configurable volume levels
- Transcription — Word-level transcription via Faster Whisper for subtitle timing
- Composition — Timeline assembly interleaving talking-head and b-roll segments
- Subtitles — Burned into video with proper timing
- Hook Overlay — Animated text overlay for the hook line (first 3 seconds)
- Effects — Optional video effects pipeline
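As an illustration of the audio-mix step, here is a sketch that builds a standard ffmpeg `amix` invocation; the volume levels are example values, not the pipeline's actual defaults, and the function is not part of the pipeline's API:

```python
def build_mix_command(voice: str, music: str, out: str,
                      voice_vol: float = 1.0, music_vol: float = 0.2) -> list[str]:
    """ffmpeg command mixing voiceover over quieter background music."""
    filt = (
        f"[0:a]volume={voice_vol}[v];"
        f"[1:a]volume={music_vol}[m];"
        "[v][m]amix=inputs=2:duration=first[out]"
    )
    return [
        "ffmpeg", "-y", "-i", voice, "-i", music,
        "-filter_complex", filt, "-map", "[out]", out,
    ]
```

`duration=first` pins the mix length to the voiceover track, so trailing music is trimmed rather than padding the video.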
## Output

The pipeline produces a single `.mp4` file at 1080x1920 (9:16 vertical) with:
- Burned-in subtitles
- Hook text overlay
- Mixed audio (voice + music)
- All segments composited in timeline order
```json
{
  "final_video_path": "/tmp/fabric_final_abc123.mp4",
  "script": { "topic": "...", "segments": [...] },
  "voiceover_path": "/tmp/fabric_vo_xyz.mp3",
  "actor_image_path": "/tmp/fabric_img_def.png",
  "broll_path_0": "/tmp/fabric_broll_0.mp4",
  "broll_path_1": "/tmp/fabric_broll_1.mp4",
  "broll_path_2": "/tmp/fabric_broll_2.mp4"
}
```

## Running Locally
The pipeline works fully offline with local models. No API keys required.
```sh
# Install local dependencies
pip install "mlx-video @ git+https://github.com/Blaizzy/mlx-video.git"  # Mac
pip install diffusers torch transformers accelerate                     # Any platform
ollama pull qwen3:8b                                                    # Script generation
```

```sh
# Run with local profile
fabric run global/ai-shorts \
  --input topic="The future of AI" \
  --input quality=local
```

See Local Video & Image Models for setup details.