Long-Form Video Pipeline

The Long-Form Video pipeline generates YouTube-ready videos from 5 to 120+ minutes. It produces chapter-based output with visual consistency, background music, subtitles, thumbnails, and YouTube metadata with chapter timestamps.

Powered by Seedance 2.0 at premium/ultra quality tiers for cinema-grade video generation with native audio and lip-sync.

```
                  route_script_input
        ┌───────────────┼───────────────┐
        │               │               │
     [topic]       [synopsis]     [full_script]
     research      research       parse into acts
     → structure   → expand       → score
     → generate    → score
        │               │               │
        └───────────────┼───────────────┘
                        │
  estimate_cost (--dry-run stops here)
  generate_anchor_keyframes (Seedance refs)
  generate_scene_backgrounds (green screen)
  produce_chapters (parallel × 3)
    ├─ voiceover (locked voice)
    ├─ b-roll OR green screen composite
    └─ assembly
  apply_scene_layers (VFX layer editing)
  generate_chapter_transitions (Seedance i2v)
  concatenate_all
  add_background_music (speech ducking)
  transcribe_and_subtitle
  video_output_gate (ffprobe validation)
  generate_thumbnail + youtube_metadata
  collect_long_form_output
```
```sh
# Topic mode — research + script generation
fab video/long-form "The Rise and Fall of Theranos" --structure documentary --duration 20min

# Synopsis mode — expand a story idea
fab video/long-form --story "A programmer discovers a security flaw in the world's largest bank" \
  --structure hero_journey --duration 15min

# Full script mode — just produce the video
fab video/long-form --story @my_script.txt --quality premium

# Topic + story angle
fab video/long-form "Chernobyl disaster" \
  --story "From the perspective of a firefighter first on scene" \
  --structure documentary --duration 25min

# Dry run — estimate cost without producing
fab video/long-form "SpaceX" --dry-run --quality premium --duration 10min
```
| Mode | Trigger | What happens |
|---|---|---|
| Topic | Only topic provided | Full pipeline: deep research, structure selection, script generation |
| Synopsis | story < 500 words | Research supplements the story idea; the LLM expands it into a full script |
| Full Script | story > 500 words | Script parsed into acts; video produced directly (no research) |

When both topic and story are provided, the topic sets the research subject and the story supplies the narrative angle.

| Structure | Description | Best for |
|---|---|---|
| documentary | Rise-fall-aftermath with cold open | Biographical, institutional stories |
| hero_journey | Monomyth transformation arc | Personal stories, startup journeys |
| true_crime | Mystery pacing with layered revelations | Crime, investigation, conspiracy |
| listicle | Ranked items with escalating interest | Top-N lists, comparisons |
| problem_solution | Pain, failed approaches, solution | Tutorials, how-to content |
| explainer | Layered complexity with aha moments | Science, economics, complex topics |
| auto (default) | LLM selects best fit for your topic | When unsure |
| Parameter | Values | Default | Description |
|---|---|---|---|
| aspect_ratio | 16:9, 9:16 | 16:9 | Landscape (YouTube) or portrait (Shorts) |
| broll_density | low, medium, high, all_broll | medium | Presenter vs b-roll ratio |
| visual_style | any string | auto | Visual aesthetic direction |
| presenter_look | any string | none | Actor description (requires --talking-heads) |
| include_talking_heads | flag | false | Add AI presenter segments |

B-roll density controls how much screen time goes to the presenter vs b-roll footage:

  • low — 80% presenter, 20% b-roll (talking head style)
  • medium — 50/50 balanced
  • high — 80% b-roll, 20% presenter
  • all_broll — pure b-roll narration (documentary style, no presenter)
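As a rough model, each density maps to a screen-time split like the one below. This is an illustrative sketch; the actual scheduler allocates per scene:

```python
# Approximate b-roll fraction implied by each density setting (illustrative).
BROLL_SPLIT = {
    "low": 0.20,        # 20% b-roll, 80% presenter
    "medium": 0.50,     # balanced
    "high": 0.80,       # 80% b-roll, 20% presenter
    "all_broll": 1.00,  # no presenter at all
}

def screen_time(duration_min: float, density: str) -> tuple[float, float]:
    """Return (presenter_minutes, broll_minutes) for a target duration."""
    broll = duration_min * BROLL_SPLIT[density]
    return duration_min - broll, broll
```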

At premium and ultra quality tiers, the pipeline uses ByteDance’s Seedance 2.0 for video generation. Key advantages:

  • Multi-reference input — up to 12 reference images for visual consistency across chapters
  • Native audio generation — synchronized audio + video in a single generation
  • Phoneme-perfect lip-sync in 8+ languages
  • Video extension — smooth transitions between chapters
  • Second-by-second timeline prompts — frame-level control within each clip

Anchor keyframes are generated before chapter production and passed as references to every Seedance call, maintaining visual DNA across a 20+ minute video.

| Tier | B-Roll Model | Avatar | Lipsync | Est. Cost (10 min) |
|---|---|---|---|---|
| free | skip (stock) | skip | skip | $0 |
| budget | Veo 3.1 Fast | Kling Avatar | FAL Lipsync | ~$2 |
| standard | Veo 3.1 Fast | Kling Avatar | FAL Lipsync | ~$5 |
| premium | Seedance 2.0 | Seedance 2.0 | native | ~$8 |
| ultra | Seedance 2.0 | Seedance 2.0 | native | ~$12 |

Use --dry-run to see a cost breakdown before committing:

```sh
fab video/long-form "SpaceX" --quality premium --duration 10min --dry-run
```

Output shows per-component costs: TTS, b-roll clips, transitions, BGM, thumbnail.
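A breakdown of that shape can be reproduced with a small helper. The component names and prices below are illustrative only, not the actual price sheet:

```python
def estimate_cost(components: dict[str, float]) -> float:
    """Print a per-component breakdown in the style of --dry-run output
    and return the total. Line items here are hypothetical examples."""
    total = sum(components.values())
    for name, usd in components.items():
        print(f"  {name:<12} ${usd:.2f}")
    print(f"  {'total':<12} ${total:.2f}")
    return total
```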

The pipeline generates YouTube-ready metadata:

  • Chapter timestamps — derived from act boundaries, formatted for YouTube descriptions
  • SEO title — under 60 characters, optimized for search
  • Description — summary + chapter timestamps + related video suggestions
  • Tags — 15-20 SEO-optimized tags
  • Thumbnail — 1280x720 with hook text overlay
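The chapter-timestamp formatting can be sketched as follows. This is a hypothetical helper; the only hard requirements it encodes are YouTube's, which expect the first chapter at 0:00 and M:SS or H:MM:SS stamps:

```python
def format_chapters(chapters: list[tuple[str, float]]) -> str:
    """Format (title, start_seconds) pairs as YouTube chapter lines."""
    lines = []
    for title, start in chapters:
        h, rem = divmod(int(start), 3600)
        m, s = divmod(rem, 60)
        stamp = f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
        lines.append(f"{stamp} {title}")
    return "\n".join(lines)
```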

The voice ID is locked after the first chapter generates successfully. All subsequent chapters use the exact same voice, preventing drift across a multi-chapter video.

You can also provide your own voice via:

  • --voice <voice_id> — use a specific TTS voice ID
  • --voice-sample <path> — clone from an audio sample (ElevenLabs or FAL Chatterbox)

All LLM prompts are stored as external template files in prompts/long_form/. Override any template by placing a file at ~/.fabric/prompts/long_form/{name}.txt.

Available templates: script_generation, synopsis_expansion, script_parsing, retention_scoring, structure_detection, youtube_metadata, seedance_cinematic.
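The override lookup amounts to a simple search path. A sketch, assuming only the two directories documented above (the `PROMPT_DIRS` and `load_template` names are illustrative):

```python
from pathlib import Path

PROMPT_DIRS = [
    Path.home() / ".fabric/prompts/long_form",  # user overrides win
    Path("prompts/long_form"),                  # built-in templates
]

def load_template(name: str) -> str:
    """Return the first {name}.txt found along the override path."""
    for base in PROMPT_DIRS:
        candidate = base / f"{name}.txt"
        if candidate.exists():
            return candidate.read_text()
    raise FileNotFoundError(f"no template named {name!r}")
```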

| Field | Type | Default | Description |
|---|---|---|---|
| topic | string | | Video topic (triggers research mode) |
| story | string | | Full script or synopsis |
| structure | enum | auto | Narrative archetype |
| duration_target | string | 15min | Target duration |
| quality | enum | standard | Cost tier (free/budget/standard/premium/ultra) |
| platform | string | YouTube | Target platform |
| mood | string | auto | Emotional tone |
| audience | string | general | Target audience |
| research_depth | int | 5 | Research depth 1-10 |
| aspect_ratio | enum | 16:9 | Video aspect ratio |
| include_talking_heads | bool | false | Include AI presenter |
| presenter_look | string | | Actor description |
| visual_style | string | | Visual direction |
| broll_density | enum | medium | B-roll vs presenter ratio |
| voice_style | enum | | Voice preset |
| voice_id | string | | Explicit TTS voice ID |
| voice_sample_path | string | | Audio sample for cloning |
| subtitle_style | string | | Subtitle styling |
| intro_style | enum | cold_open | Intro sequence type |
| outro_strategy | enum | subscribe_tease | CTA strategy |
| greenscreen_footage_url | string | | Green screen footage URL for AI background replacement |
| scene_layers | array | | Visual elements to add: [{url, role, placement, description}] |
| dry_run | bool | false | Estimate cost only |
| Field | Type | Description |
|---|---|---|
| video_path | string | Path to final assembled video |
| script | object | Full structured script with acts |
| chapter_paths | string[] | Per-chapter video paths |
| thumbnail_path | string | Generated thumbnail |
| youtube_metadata | object | Title, description, chapters, tags |
| word_count | int | Total script words |
| estimated_duration_min | float | Estimated duration |
| cost_estimate | object | Per-component cost breakdown |

The long-form pipeline supports two VFX modes that activate via input parameters:

Green screen compositing — provide greenscreen_footage_url and the pipeline automatically generates per-act backgrounds from the script’s visual cues, then composites the footage onto each background with camera motion preservation. Falls back to ffmpeg chroma-key when Seedance is unavailable.

```sh
fab video/long-form "History of Cinema" \
  --quality premium \
  --greenscreen-footage-url "https://cdn.example.com/presenter.mp4"
```
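The ffmpeg fallback can be approximated with the `chromakey` filter. The sketch below only builds the command line; the key color, tolerances, and paths are illustrative and should be tuned per footage:

```python
def chromakey_cmd(fg: str, bg: str, out: str,
                  key: str = "0x00FF00", similarity: float = 0.15) -> list[str]:
    """Build an ffmpeg command that keys green out of the presenter
    footage and overlays it on a generated background."""
    graph = (
        f"[1:v]chromakey={key}:{similarity}:0.05[fg];"  # color:similarity:blend
        f"[0:v][fg]overlay=format=auto[v]"
    )
    return ["ffmpeg", "-i", bg, "-i", fg,
            "-filter_complex", graph,
            "-map", "[v]", "-map", "1:a?",  # keep presenter audio if present
            out]
```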

Layer-by-layer scene editing — provide scene_layers to add elements (vehicles, weather, objects) into every chapter. Layers are applied sequentially after chapter production for highest quality.

```sh
fab video/long-form "Future Cities" \
  --quality premium \
  --scene-layers '[{"url": "https://example.com/drone.jpg", "role": "vehicle", "placement": "sky", "description": "delivery drone"}]'
```
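Each layer spec carries the four keys shown in the input table. A small validation sketch (a hypothetical helper, not the pipeline's own schema check):

```python
REQUIRED_KEYS = {"url", "role", "placement", "description"}

def validate_scene_layers(layers: list[dict]) -> list[dict]:
    """Check each layer spec has the expected keys before submission.

    Layers are applied in list order, so put large base elements before
    small detail layers.
    """
    for i, layer in enumerate(layers):
        missing = REQUIRED_KEYS - layer.keys()
        if missing:
            raise ValueError(f"layer {i} missing keys: {sorted(missing)}")
    return layers
```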

See the VFX Compositing guide for full documentation, SDK examples, and cost breakdown.