
Shot Design & Visual Consistency

Fabric uses a cinematography vocabulary inspired by Seedance 2.0 Shot Design to produce cinema-grade video from AI models. Instead of vague adjectives like “cinematic” or “high quality”, prompts use specific physical parameters — focal length in millimeters, 3-layer lighting descriptions, and named film stocks — that video models can execute precisely.

This system lives in workflows/video/_shot_design.py and is reusable by any workflow that generates video prompts.

Most AI video prompts look like this:

A person walking through a city, cinematic, high quality, 4K, masterpiece

This produces generic, “AI-looking” video. The model has no physical parameters to work with — “cinematic” means nothing specific.

Fabric enriches every b-roll prompt with a 7-element formula of physical description:

Close-up of hands typing on a mechanical keyboard,
fingers drumming between keystrokes, subtle shift in posture.
85mm portrait lens, slow deliberate push-in.
Monitor glow casting blue-white light on face in dark room,
volumetric dust motes drifting through the beam —
intimate yet focused mood.
Shot on Kodak Portra 400, fine organic film grain,
individual hair strands catching backlight.
Faint hum of cooling fans in background atmosphere.

The difference in output quality is dramatic. Physical parameters give the model concrete visual targets rather than abstract aesthetic goals.

Every b-roll prompt follows this structure:

| Element | Example |
| --- | --- |
| 1. Subject + appearance | “Person in white linen sitting cross-legged” |
| 2. Action + physics | “fingers resting loosely on knees with subtle breathing movement” |
| 3. Scene/environment | “on wooden deck at sunrise, mist rising from lake” |
| 4. Lighting (3-layer) | Source + behavior + color tone (see below) |
| 5. Camera | Focal length in mm + movement phrase |
| 6. Texture | Film stock or render anchor + organic imperfections |
| 7. Micro-action + ambient detail | Subtle human movements + environmental sounds/details |
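
As a rough illustration of the formula, the sketch below joins the seven elements into one prompt string in table order. The `ShotElements` dataclass, its field names, and the lighting value are hypothetical and are not taken from `_shot_design.py`; only the element order and the quoted examples come from this page.

```python
from dataclasses import dataclass, fields


@dataclass
class ShotElements:
    """Hypothetical container; field order follows the 7-element formula."""
    subject: str
    action: str
    scene: str
    lighting: str
    camera: str
    texture: str
    micro_detail: str

    def to_prompt(self) -> str:
        values = [getattr(self, f.name).strip().rstrip(".") for f in fields(self)]
        # Elements 1-3 read as one sentence; elements 4-7 get a sentence each.
        opening = ", ".join(v for v in values[:3] if v)
        sentences = [opening] + [v for v in values[3:] if v]
        return ". ".join(s for s in sentences if s) + "."


shot = ShotElements(
    subject="Person in white linen sitting cross-legged",
    action="fingers resting loosely on knees with subtle breathing movement",
    scene="on wooden deck at sunrise, mist rising from lake",
    # Illustrative lighting value (source, behavior, color tone), not from the docs.
    lighting="low sun from screen-right, mist diffusing the light, soft golden tone",
    camera="85mm portrait lens, slow push-in",
    texture="Shot on Kodak Portra 400, fine organic film grain",
    micro_detail="hair caught by breeze, leaves trembling in wind",
)
prompt = shot.to_prompt()
print(prompt)
```

Keeping each element in its own field makes it easy to see which part of the formula a weak prompt is missing.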

Every b-roll segment has a narrative type that maps to a default cinematography preset:

| Type | Focal Length | Camera Movement | Lighting | Film Stock |
| --- | --- | --- | --- | --- |
| hook | 24mm wide | dolly push-in | dramatic sidelight, volumetric shafts, teal/amber | Kodak Vision3 500T |
| problem | 35mm standard | handheld drift, micro-shake | harsh overhead, deep shadows, cool blue-grey | ARRI ALEXA Mini LF |
| insight | 85mm portrait | slow push-in | soft window light, warm rim separation, golden-hour | Kodak Portra 400 |
| solution | 50mm standard | slow orbital (45° arc) | balanced 3-point, god rays, warm amber | 35mm film grain |
| demo | 35mm standard | smooth tracking | clean softbox, rim light, neutral tone | Octane rendering |
| proof | 85mm portrait | static + rack focus | natural window light, atmospheric haze, golden | Fuji Pro 400H |
| cta | 24mm wide | slow crane up | dramatic backlight, lens flare, high-contrast warm | Cinestill 800T |
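
One way to picture the preset table is as a dictionary keyed by segment type. The sketch below is a hypothetical mirror of two rows, not the real `SEGMENT_SHOT_LIBRARY`: apart from `focal_length`, the key names are assumptions, and so is the fallback to `insight` for unknown types.

```python
# Hypothetical mirror of two rows of the preset table. Only the
# "focal_length" key is shown in the docs; other key names are assumed.
PRESET_SKETCH = {
    "hook": {
        "focal_length": "24mm wide-angle lens",
        "camera_movement": "dolly push-in",
        "lighting": "dramatic sidelight, volumetric shafts, teal/amber",
        "film_stock": "Kodak Vision3 500T",
    },
    "insight": {
        "focal_length": "85mm portrait lens",
        "camera_movement": "slow push-in",
        "lighting": "soft window light, warm rim separation, golden-hour",
        "film_stock": "Kodak Portra 400",
    },
}


def lookup_preset(segment_type: str, default: str = "insight") -> dict:
    """Return a copy of the preset; the fallback type is an assumption."""
    key = segment_type.strip().lower()
    return dict(PRESET_SKETCH.get(key, PRESET_SKETCH[default]))


def camera_phrase(preset: dict) -> str:
    """Combine focal length and movement into the camera element."""
    return f'{preset["focal_length"]}, {preset["camera_movement"]}'


print(camera_phrase(lookup_preset("hook")))
```

Returning a copy keeps callers from mutating the shared preset when they tweak a single shot.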

Each preset defines lighting as three components:

  1. Source — where the light comes from (e.g., “dramatic sidelight from screen-left”)
  2. Behavior — how the light interacts with the scene (e.g., “volumetric light shafts cutting through atmospheric haze”)
  3. Color Tone — the palette it creates (e.g., “high-contrast teal shadows with warm amber highlights”)

This maps directly to how cinematographers describe their lighting setups.
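
The three layers can be modeled as a small record that renders to a single lighting phrase. This is only a sketch: the `Lighting` type and its `describe()` method are hypothetical names, and the values are the examples quoted in the list above.

```python
from typing import NamedTuple


class Lighting(NamedTuple):
    """Hypothetical 3-layer lighting record: source, behavior, color tone."""
    source: str
    behavior: str
    color_tone: str

    def describe(self) -> str:
        # Layers are emitted in a fixed order so prompts stay comparable.
        return ", ".join(self)


hook_lighting = Lighting(
    source="dramatic sidelight from screen-left",
    behavior="volumetric light shafts cutting through atmospheric haze",
    color_tone="high-contrast teal shadows with warm amber highlights",
)
print(hook_lighting.describe())
```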

The system prefers light sources that exist within the scene (diegetic) over external “studio” descriptions. Diegetic sources feel more authentic and give the model concrete physical objects to render.

| Instead of | Use |
| --- | --- |
| “cinematic lighting” | “grill flames casting warm flicker on skin” |
| “dramatic side light” | “laptop screen glow illuminating face in dark room” |
| “neon lighting” | “neon sign reflections on wet pavement” |
| “warm lighting” | “candle flame dancing shadows on wall” |

Prompts specify two contrasting moods for richer atmosphere, rather than a single adjective:

  • “moody yet vibrant”
  • “intimate but energetic”
  • “gritty yet beautiful”
  • “chaotic yet focused”
  • “raw yet polished”

This tension creates more visually interesting output than flat single-mood descriptions.

The 7th element adds subtle movements and environmental details that prevent the static, lifeless look of AI-generated video:

Human micro-actions:

  • “fingers drumming on desk”
  • “shifting weight from one foot to the other”
  • “steam rising from coffee cup as hand reaches for it”
  • “hair caught by breeze”

Environmental ambient details:

  • “dust motes drifting through light beam”
  • “condensation sliding down glass”
  • “leaves trembling in wind”
  • “distant city sounds bleeding in”

Every preset includes intentional imperfections that prevent the “too-clean AI look”:

  • Film grain (fine, subtle, or heavy depending on stock)
  • Dust motes drifting through light shafts
  • Lens flare on highlights
  • Bokeh circles in out-of-focus areas
  • Halation on bright edges (especially Cinestill 800T)
  • Natural optical vignetting

The assemble_cinematic_prompt() function enriches a raw b-roll description in 5 steps:

Raw input: "Person looking stressed at their desk"
1. Strip banned filler words (masterpiece, 4K, 8K, ultra HD, etc.)
2. Determine segment type (hook/problem/insight/solution/demo/proof/cta)
3. Check if LLM already included camera/lighting/material terms
4. If generic → layer in shot preset (focal length + camera + lighting + film stock)
5. Prepend continuity brief
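
The order of the five steps can be sketched as a small orchestrator. This is not the real `assemble_cinematic_prompt()`: the name `assemble_sketch`, its parameters, and the injected helper callables are hypothetical, and step 2 is reduced to taking the segment type as given.

```python
from typing import Callable


def assemble_sketch(
    broll_desc: str,
    segment_type: str,
    continuity_brief: str,
    strip: Callable[[str], str],
    is_specific: Callable[[str], bool],
    preset_for: Callable[[str], str],
) -> str:
    """Hypothetical outline of the five steps; helpers are passed in."""
    text = strip(broll_desc)                            # 1. strip filler
    preset_text = preset_for(segment_type)              # 2. type is given here
    if not is_specific(text) and preset_text:           # 3. already cinematic?
        text = f"{text.rstrip('.')}. {preset_text}"     # 4. layer in preset
    return f"{continuity_brief} {text}".strip()         # 5. prepend brief


# Toy stand-ins for the helpers, for illustration only.
result = assemble_sketch(
    "Person looking stressed at their desk, 4K",
    "problem",
    "Unified palette: desaturated cool + neutral midtones.",
    strip=lambda s: s.replace(", 4K", ""),
    is_specific=lambda s: "mm" in s,
    preset_for=lambda t: {
        "problem": "35mm standard lens, handheld drift, harsh overhead light."
    }.get(t, ""),
)
print(result)
```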

These words actively degrade AI video quality and are automatically stripped:

masterpiece, 4K, 8K, ultra HD, best quality, highly detailed, photorealistic, ultra clear, ultra realistic, hyper realistic, super detailed
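
A minimal sketch of how such stripping could work, assuming case-insensitive whole-word matching and comma-separated prompts. The function name `strip_filler_sketch` is hypothetical, and the real `strip_filler()` may behave differently.

```python
import re

BANNED_FILLER = [
    "masterpiece", "4K", "8K", "ultra HD", "best quality", "highly detailed",
    "photorealistic", "ultra clear", "ultra realistic", "hyper realistic",
    "super detailed",
]

# Longest phrases first so multi-word fillers win over any shorter overlap.
_ALTERNATIVES = "|".join(
    re.escape(word) for word in sorted(BANNED_FILLER, key=len, reverse=True)
)
_FILLER_RE = re.compile(r"\b(?:" + _ALTERNATIVES + r")\b", re.IGNORECASE)


def strip_filler_sketch(prompt: str) -> str:
    """Remove banned filler terms and tidy the leftover punctuation."""
    cleaned = _FILLER_RE.sub("", prompt)
    cleaned = re.sub(r"\s*,(?:\s*,)+", ",", cleaned)  # collapse ", ," runs
    cleaned = re.sub(r"\s{2,}", " ", cleaned)         # collapse extra spaces
    return cleaned.strip(" ,")


print(strip_filler_sketch(
    "A person walking through a city, cinematic, high quality, 4K, masterpiece"
))
```

With the list as given, “cinematic” and “high quality” survive because neither appears among the banned terms.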

validate_broll_prompt() checks that a prompt contains all three categories:

  • Camera direction — focal length or camera movement term
  • Lighting — light source, behavior, or color tone term
  • Texture/material — film stock, bokeh, or organic imperfection term

If any category is missing, the assembler fills it in from the shot preset.
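
The check and the fill-in step can be sketched with one keyword pattern per category. The patterns, the preset wording, and the names `validate_sketch` and `fill_missing` are assumptions for illustration; only the `(passed, missing)` return shape follows the usage shown on this page.

```python
import re

# Hypothetical keyword patterns per category; the real term lists may differ.
CATEGORY_PATTERNS = {
    "camera": re.compile(
        r"\b\d{2,3}mm\b|\b(?:push-in|dolly|tracking|handheld|orbital|crane|rack focus)\b",
        re.IGNORECASE,
    ),
    "lighting": re.compile(
        r"\b(?:light|lighting|glow|shadows?|haze|backlight|sidelight)\b",
        re.IGNORECASE,
    ),
    "texture": re.compile(
        r"\b(?:grain|bokeh|halation|vignetting|Kodak|Fuji|Cinestill|ARRI)\b",
        re.IGNORECASE,
    ),
}


def validate_sketch(prompt: str) -> tuple[bool, list[str]]:
    """Return (passed, missing categories) for a prompt."""
    missing = [n for n, pat in CATEGORY_PATTERNS.items() if not pat.search(prompt)]
    return (not missing, missing)


def fill_missing(prompt: str, preset: dict[str, str]) -> str:
    """Append preset text only for the categories the prompt lacks."""
    _, missing = validate_sketch(prompt)
    additions = [preset[name] for name in missing if name in preset]
    if not additions:
        return prompt
    return " ".join([prompt.rstrip().rstrip(".") + "."] + additions)


# Illustrative preset text for the "problem" segment type.
problem_preset = {
    "camera": "35mm standard lens, handheld drift with micro-shake.",
    "lighting": "Harsh overhead light, deep shadows, cool blue-grey tone.",
    "texture": "Shot on ARRI ALEXA Mini LF.",
}
raw = "Person looking stressed at their desk"
print(validate_sketch(raw))
print(fill_missing(raw, problem_preset))
```

Filling only the missing categories leaves any camera, lighting, or texture terms the LLM already wrote untouched.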

The continuity brief is a short prefix (one to three sentences) prepended to every visual prompt in a video to keep aesthetics consistent across all segments.

  1. Mood mapping: the video’s mood keyword (e.g., “high-energy”, “moody”, “documentary”) maps to a film stock and base palette:

| Mood | Film Stock | Base Palette |
| --- | --- | --- |
| high-energy, energetic | Kodak Vision3 500T | warm amber + teal shadow accents |
| conversational, casual | Kodak Portra 400 | warm neutral + soft golden highlights |
| serious, dark | ARRI ALEXA Mini LF | desaturated cool + neutral midtones |
| moody, night | Cinestill 800T | cool blue-green + neon-warm highlights |
| cinematic, dramatic | Kodak Vision3 500T | rich contrast + warm highlights + cool shadows |
| inspirational | Kodak Portra 400 | golden warmth + soft diffused highlights |
| educational, documentary | ARRI ALEXA Mini LF | naturalistic palette + muted tones |

  2. Dominant segment type: the most frequent b-roll segment type determines the color tone from its shot preset.

  3. Output: a string like:

Unified palette: warm amber base with teal shadow accents.
Consistent Kodak Vision3 500T grain across all shots.
Atmospheric haze present in every frame.

This brief is stored as script["_continuity_brief"] and prepended to:

  • Every b-roll generation prompt
  • Every talking-head generation prompt
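
The sketch below shows one way the brief could be derived and attached. It is not the real `generate_continuity_brief()`: the names `brief_sketch`, `MOOD_TABLE`, and `DEFAULT_MOOD`, the fallback mood, and the wording of the color-tone sentence are assumptions. Only the mood rows and the `_continuity_brief` key come from this page.

```python
from collections import Counter

# Hypothetical lookup built from three rows of the mood table.
MOOD_TABLE = {
    "high-energy": ("Kodak Vision3 500T", "warm amber + teal shadow accents"),
    "moody": ("Cinestill 800T", "cool blue-green + neon-warm highlights"),
    "documentary": ("ARRI ALEXA Mini LF", "naturalistic palette + muted tones"),
}
DEFAULT_MOOD = "documentary"  # assumed fallback for unknown moods


def dominant_segment_type(segment_types: list[str]) -> str:
    """Most frequent type; on ties Counter keeps the first one seen."""
    if not segment_types:
        return ""
    return Counter(segment_types).most_common(1)[0][0]


def brief_sketch(mood: str, segment_types: list[str], color_tones: dict[str, str]) -> str:
    stock, palette = MOOD_TABLE.get(mood.strip().lower(), MOOD_TABLE[DEFAULT_MOOD])
    tone = color_tones.get(dominant_segment_type(segment_types), "")
    lines = [
        f"Unified palette: {palette}.",
        f"Consistent {stock} grain across all shots.",
    ]
    if tone:
        lines.append(f"Dominant color tone: {tone}.")  # assumed wording
    return " ".join(lines)


script = {}  # only the "_continuity_brief" key is taken from the docs
script["_continuity_brief"] = brief_sketch(
    "High-Energy",
    ["hook", "problem", "hook"],
    {"hook": "high-contrast teal shadows with warm amber highlights"},
)
broll_prompt = (
    f'{script["_continuity_brief"]} '
    "Solar panels stretching across a desert landscape."
)
print(broll_prompt)
```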

The continuity brief provides textual consistency (color grading, grain, atmosphere). The keyframe grid adds visual consistency (pixel-level lighting geometry, character details). Both systems work together — the grid prompt itself includes the continuity brief.

The module is importable from any workflow:

from workflows.video._shot_design import (
    assemble_cinematic_prompt,
    validate_broll_prompt,
    strip_filler,
    get_shot_preset,
    generate_continuity_brief,
    SEGMENT_SHOT_LIBRARY,
)

# Enrich a plain description
prompt = assemble_cinematic_prompt(
    narration="The future of renewable energy",
    broll_desc="Solar panels stretching across a desert landscape",
    segment_type="solution",
    segment_index=2,
    total_segments=5,
    continuity_brief="Unified palette: golden warmth...",
)

# Check prompt quality
passed, missing = validate_broll_prompt(prompt)
if not passed:
    print(f"Missing: {missing}")

# Get raw presets
preset = get_shot_preset("hook")
print(preset["focal_length"])  # "24mm wide-angle lens"