Long-Form Video Pipeline

The Long-Form Video pipeline generates YouTube-ready videos from 5 to 120+ minutes. It produces chapter-based output with visual consistency, background music, subtitles, thumbnails, and YouTube metadata with chapter timestamps.

Powered by Seedance 2.0 at premium/ultra quality tiers for cinema-grade video generation with native audio and lip-sync.

```
                  route_script_input
        ┌───────────────┼───────────────┐
        │               │               │
     [topic]       [synopsis]     [full_script]
     research      research       parse into acts
     → structure   → expand       → score
     → generate    → score
        │               │               │
        └───────────────┼───────────────┘
                        │
  estimate_cost (--dry-run stops here)
  generate_anchor_keyframes (Seedance refs)
  generate_scene_backgrounds (green screen)
  produce_chapters (parallel × 3)
    ├─ voiceover (locked voice)
    ├─ b-roll OR green screen composite
    └─ assembly
  apply_scene_layers (VFX layer editing)
  generate_chapter_transitions (Seedance i2v)
  concatenate_all
  add_background_music (speech ducking)
  transcribe_and_subtitle
  video_output_gate (ffprobe validation)
  generate_thumbnail + youtube_metadata
  collect_long_form_output
```
```sh
# Topic mode — research + script generation
fab video/long-form "The Rise and Fall of Theranos" --structure documentary --duration 20min

# Synopsis mode — expand a story idea
fab video/long-form --story "A programmer discovers a security flaw in the world's largest bank" \
  --structure hero_journey --duration 15min

# Full script mode — just produce the video
fab video/long-form --story @my_script.txt --quality premium

# Topic + story angle
fab video/long-form "Chernobyl disaster" \
  --story "From the perspective of a firefighter first on scene" \
  --structure documentary --duration 25min

# Dry run — estimate cost without producing
fab video/long-form "SpaceX" --dry-run --quality premium --duration 10min
```
| Mode | Trigger | What happens |
|---|---|---|
| Topic | Only topic provided | Full pipeline: deep research, structure selection, script generation |
| Synopsis | story < 500 words | Research supplements the story idea; the LLM expands it into a full script |
| Full Script | story > 500 words | Script parsed into acts; video produced directly (no research) |

When both topic and story are provided, the topic sets the research subject and the story supplies the narrative angle.

| Structure | Description | Best for |
|---|---|---|
| documentary | Rise-fall-aftermath with cold open | Biographical, institutional stories |
| hero_journey | Monomyth transformation arc | Personal stories, startup journeys |
| true_crime | Mystery pacing with layered revelations | Crime, investigation, conspiracy |
| listicle | Ranked items with escalating interest | Top-N lists, comparisons |
| problem_solution | Pain, failed approaches, solution | Tutorials, how-to content |
| explainer | Layered complexity with aha moments | Science, economics, complex topics |
| auto (default) | LLM selects best fit for your topic | When unsure |
| Parameter | Values | Default | Description |
|---|---|---|---|
| aspect_ratio | 16:9, 9:16 | 16:9 | Landscape (YouTube) or portrait (Shorts) |
| broll_density | low, medium, high, all_broll | medium | Presenter vs b-roll ratio |
| visual_style | any string | auto | Visual aesthetic direction |
| presenter_look | any string | none | Actor description (requires --talking-heads) |
| include_talking_heads | flag | false | Add AI presenter segments |

B-roll density controls how much screen time goes to the presenter vs b-roll footage:

  • low — 80% presenter, 20% b-roll (talking head style)
  • medium — 50/50 balanced
  • high — 80% b-roll, 20% presenter
  • all_broll — pure b-roll narration (documentary style, no presenter)
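As a rough model, each density maps to a screen-time split like the one below. This is an illustrative sketch; the actual scheduler allocates per scene:

```python
# Approximate b-roll fraction implied by each density setting (illustrative).
BROLL_SPLIT = {
    "low": 0.20,        # 20% b-roll, 80% presenter
    "medium": 0.50,     # balanced
    "high": 0.80,       # 80% b-roll, 20% presenter
    "all_broll": 1.00,  # no presenter at all
}

def screen_time(duration_min: float, density: str) -> tuple[float, float]:
    """Return (presenter_minutes, broll_minutes) for a target duration."""
    broll = duration_min * BROLL_SPLIT[density]
    return duration_min - broll, broll
```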

At premium and ultra quality tiers, the pipeline uses ByteDance’s Seedance 2.0 for video generation. Key advantages:

  • Multi-reference input — up to 12 reference images for visual consistency across chapters
  • Native audio generation — synchronized audio + video in a single generation
  • Phoneme-perfect lip-sync in 8+ languages
  • Video extension — smooth transitions between chapters
  • Second-by-second timeline prompts — frame-level control within each clip

Anchor keyframes are generated before chapter production and passed as references to every Seedance call, maintaining visual DNA across a 20+ minute video.

| Tier | B-Roll Model | Avatar | Lipsync | Est. Cost (10 min) |
|---|---|---|---|---|
| free | skip (stock) | skip | skip | $0 |
| budget | Veo 3.1 Fast | Kling Avatar | FAL Lipsync | ~$2 |
| standard | Veo 3.1 Fast | Kling Avatar | FAL Lipsync | ~$5 |
| premium | Seedance 2.0 | Seedance 2.0 | native | ~$8 |
| ultra | Seedance 2.0 | Seedance 2.0 | native | ~$12 |

Use --dry-run to see a cost breakdown before committing:

```sh
fab video/long-form "SpaceX" --quality premium --duration 10min --dry-run
```

Output shows per-component costs: TTS, b-roll clips, transitions, BGM, thumbnail.
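A breakdown of that shape can be reproduced with a small helper. The component names and prices below are illustrative only, not the actual price sheet:

```python
def estimate_cost(components: dict[str, float]) -> float:
    """Print a per-component breakdown in the style of --dry-run output
    and return the total. Line items here are hypothetical examples."""
    total = sum(components.values())
    for name, usd in components.items():
        print(f"  {name:<12} ${usd:.2f}")
    print(f"  {'total':<12} ${total:.2f}")
    return total
```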

The pipeline generates YouTube-ready metadata:

  • Chapter timestamps — derived from act boundaries, formatted for YouTube descriptions
  • SEO title — under 60 characters, optimized for search
  • Description — summary + chapter timestamps + related video suggestions
  • Tags — 15-20 SEO-optimized tags
  • Thumbnail — 1280x720 with hook text overlay
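The chapter-timestamp formatting can be sketched as follows. This is a hypothetical helper; the only hard requirements it encodes are YouTube's, which expect the first chapter at 0:00 and M:SS or H:MM:SS stamps:

```python
def format_chapters(chapters: list[tuple[str, float]]) -> str:
    """Format (title, start_seconds) pairs as YouTube chapter lines."""
    lines = []
    for title, start in chapters:
        h, rem = divmod(int(start), 3600)
        m, s = divmod(rem, 60)
        stamp = f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
        lines.append(f"{stamp} {title}")
    return "\n".join(lines)
```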

The voice ID is locked after the first chapter generates successfully. All subsequent chapters use the exact same voice, preventing drift across a multi-chapter video.

You can also provide your own voice via:

  • --voice <voice_id> — use a specific TTS voice ID
  • --voice-sample <path> — clone from an audio sample (ElevenLabs or FAL Chatterbox)

All LLM prompts are stored as external template files in prompts/long_form/. Override any template by placing a file at ~/.fabric/prompts/long_form/{name}.txt.

Available templates: script_generation, synopsis_expansion, script_parsing, retention_scoring, structure_detection, youtube_metadata, seedance_cinematic.
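The override lookup amounts to a simple search path. A sketch, assuming only the two directories documented above (the `PROMPT_DIRS` and `load_template` names are illustrative):

```python
from pathlib import Path

PROMPT_DIRS = [
    Path.home() / ".fabric/prompts/long_form",  # user overrides win
    Path("prompts/long_form"),                  # built-in templates
]

def load_template(name: str) -> str:
    """Return the first {name}.txt found along the override path."""
    for base in PROMPT_DIRS:
        candidate = base / f"{name}.txt"
        if candidate.exists():
            return candidate.read_text()
    raise FileNotFoundError(f"no template named {name!r}")
```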

| Field | Type | Default | Description |
|---|---|---|---|
| topic | string | | Video topic (triggers research mode) |
| story | string | | Full script or synopsis |
| structure | enum | auto | Narrative archetype |
| duration_target | string | 15min | Target duration |
| quality | enum | standard | Cost tier (free/budget/standard/premium/ultra) |
| platform | string | YouTube | Target platform |
| mood | string | auto | Emotional tone |
| audience | string | general | Target audience |
| research_depth | int | 5 | Research depth 1-10 |
| aspect_ratio | enum | 16:9 | Video aspect ratio |
| include_talking_heads | bool | false | Include AI presenter |
| presenter_look | string | | Actor description |
| visual_style | string | | Visual direction |
| broll_density | enum | medium | B-roll vs presenter ratio |
| voice_style | enum | | Voice preset |
| voice_id | string | | Explicit TTS voice ID |
| voice_sample_path | string | | Audio sample for cloning |
| subtitle_style | string | | Subtitle styling |
| intro_style | enum | cold_open | Intro sequence type |
| outro_strategy | enum | subscribe_tease | CTA strategy |
| greenscreen_footage_url | string | | Green screen footage URL for AI background replacement |
| scene_layers | array | | Visual elements to add: [{url, role, placement, description}] |
| dry_run | bool | false | Estimate cost only |
| Field | Type | Description |
|---|---|---|
| video_path | string | Path to final assembled video |
| script | object | Full structured script with acts |
| chapter_paths | string[] | Per-chapter video paths |
| thumbnail_path | string | Generated thumbnail |
| youtube_metadata | object | Title, description, chapters, tags |
| word_count | int | Total script words |
| estimated_duration_min | float | Estimated duration |
| cost_estimate | object | Per-component cost breakdown |

The long-form pipeline supports two VFX modes that activate via input parameters:

Green screen compositing — provide greenscreen_footage_url and the pipeline automatically generates per-act backgrounds from the script’s visual cues, then composites the footage onto each background with camera motion preservation. Falls back to ffmpeg chroma-key when Seedance is unavailable.

```sh
fab video/long-form "History of Cinema" \
  --quality premium \
  --greenscreen-footage-url "https://cdn.example.com/presenter.mp4"
```
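The ffmpeg fallback can be approximated with the `chromakey` filter. The sketch below only builds the command line; the key color, tolerances, and paths are illustrative and should be tuned per footage:

```python
def chromakey_cmd(fg: str, bg: str, out: str,
                  key: str = "0x00FF00", similarity: float = 0.15) -> list[str]:
    """Build an ffmpeg command that keys green out of the presenter
    footage and overlays it on a generated background."""
    graph = (
        f"[1:v]chromakey={key}:{similarity}:0.05[fg];"  # color:similarity:blend
        f"[0:v][fg]overlay=format=auto[v]"
    )
    return ["ffmpeg", "-i", bg, "-i", fg,
            "-filter_complex", graph,
            "-map", "[v]", "-map", "1:a?",  # keep presenter audio if present
            out]
```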

Layer-by-layer scene editing — provide scene_layers to add elements (vehicles, weather, objects) into every chapter. Layers are applied sequentially after chapter production for highest quality.

```sh
fab video/long-form "Future Cities" \
  --quality premium \
  --scene-layers '[{"url": "https://example.com/drone.jpg", "role": "vehicle", "placement": "sky", "description": "delivery drone"}]'
```
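Each layer spec carries the four keys shown in the input table. A small validation sketch (a hypothetical helper, not the pipeline's own schema check):

```python
REQUIRED_KEYS = {"url", "role", "placement", "description"}

def validate_scene_layers(layers: list[dict]) -> list[dict]:
    """Check each layer spec has the expected keys before submission.

    Layers are applied in list order, so put large base elements before
    small detail layers.
    """
    for i, layer in enumerate(layers):
        missing = REQUIRED_KEYS - layer.keys()
        if missing:
            raise ValueError(f"layer {i} missing keys: {sorted(missing)}")
    return layers
```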

See the VFX Compositing guide for full documentation, SDK examples, and cost breakdown.