Keyframe Grid Generation

The keyframe grid generates 4 scenes in a single 2x2 image, then crops them into individual keyframes. Because all panels share one generation pass, the model maintains consistent lighting, color, and style across them. These keyframes then serve as visual anchors for downstream image-to-video generation.

This technique is inspired by the TechHalla/Atlabs 2x2 Grid Method, adapted here for automated video pipelines.

Why a Grid?

The AI Shorts pipeline generates b-roll segments independently — each as a separate API call. Even with a shared continuity brief (text prefix for consistent palette/grain), each generation starts from scratch. This can produce subtle visual drift between segments.

The grid solves this by generating multiple scenes simultaneously:

┌───────────────────┬───────────────────┐
│                   │                   │
│   Hook scene      │  Problem scene    │
│   (segment 0)     │  (segment 1)      │
│                   │                   │
├───────────────────┼───────────────────┤
│                   │                   │
│   Solution scene  │  CTA scene        │
│   (segment 2)     │  (segment 3)      │
│                   │                   │
└───────────────────┴───────────────────┘

Because all 4 panels are generated in the same diffusion context, they share the same color temperature, lighting geometry, and film texture.

How It Works

Segment selection — Up to 4 b-roll segments are selected by narrative priority: hook > solution > cta > problem > insight > demo > proof.
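
The priority rule can be sketched in a few lines. This is a minimal illustration, not the SDK's select_grid_segments: the segment shape (dicts with index and type keys), the helper name pick_segments, and the choice to return the picks in script order are all assumptions.

```python
# Priority order taken from the text above; everything else is hypothetical.
PRIORITY = ["hook", "solution", "cta", "problem", "insight", "demo", "proof"]


def pick_segments(segments, max_panels=4):
    """Pick up to max_panels segments by narrative priority."""
    rank = {name: i for i, name in enumerate(PRIORITY)}
    # sorted() is stable, so segments of the same type keep script order.
    # Unknown segment types sort after every known type.
    ranked = sorted(segments, key=lambda s: rank.get(s["type"], len(PRIORITY)))
    chosen = ranked[:max_panels]
    # Assumption: panels are laid out in script order, not priority order.
    return sorted(chosen, key=lambda s: s["index"])


types = ["hook", "insight", "problem", "demo", "solution", "cta"]
segments = [{"index": i, "type": t} for i, t in enumerate(types)]
picked = pick_segments(segments)
print([s["type"] for s in picked])
```

With six candidate segments, the lower-priority insight and demo segments are dropped and the remaining four keep their script order.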

Grid prompt construction — A single prompt describes all 4 panels, with the continuity brief prepended:

Unified palette: warm amber with teal shadows. Consistent Kodak Vision3 500T grain.
Generate a single 2x2 grid image containing exactly 4 panels arranged in a
square layout, separated by thin white borders. Each panel shows a different
scene from the same visual universe — identical color palette, lighting
temperature, and film grain texture across all panels.

Top-left panel: Dramatic wide shot of a city skyline at golden hour...
Top-right panel: Close-up of hands typing on a keyboard...
Bottom-left panel: Person standing confidently in front of a whiteboard...
Bottom-right panel: Aerial view pulling back to reveal the full landscape...
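
As a rough sketch of how a prompt like this could be assembled: the function name sketch_grid_prompt and the abbreviated instruction wording are illustrative assumptions, and the result is not necessarily what build_grid_prompt produces.

```python
# Panel labels in row-major order, matching the example prompt above.
POSITIONS = ["Top-left", "Top-right", "Bottom-left", "Bottom-right"]


def sketch_grid_prompt(scene_descriptions, continuity_brief):
    """Join the continuity brief, layout instructions, and one line per panel."""
    if len(scene_descriptions) != len(POSITIONS):
        raise ValueError("a 2x2 grid needs exactly 4 scene descriptions")
    # Abbreviated version of the layout instructions shown above.
    instructions = (
        "Generate a single 2x2 grid image containing exactly 4 panels arranged "
        "in a square layout, separated by thin white borders."
    )
    panels = [
        f"{pos} panel: {desc}"
        for pos, desc in zip(POSITIONS, scene_descriptions)
    ]
    # The brief goes first so it conditions every panel description after it.
    return "\n".join([continuity_brief, instructions, "", *panels])


prompt = sketch_grid_prompt(
    ["City skyline", "Hands typing", "Whiteboard", "Aerial view"],
    "Unified palette: warm amber with teal shadows.",
)
print(prompt)
```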

Image generation — The grid is generated as a single 1:1 aspect ratio image using the configured image model (Imagen, SDXL Turbo, Flux Schnell, etc.).
Cropping — The grid image is split into 4 individual keyframe PNG files via PIL.
Downstream use — When image-to-video models are used for b-roll (e.g., Kling v3 i2v), the pre-generated keyframe replaces the per-segment still image. This means all b-roll videos start from visually consistent reference frames.
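
A minimal sketch of the cropping step, assuming equal-sized panels in row-major order and that the thin white borders are not trimmed. The names panel_boxes and crop_grid are hypothetical and do not describe the SDK's crop_grid_to_keyframes; only Image.open, Image.crop, and Image.save are actual Pillow calls.

```python
def panel_boxes(width, height, rows=2, cols=2):
    """Return (left, upper, right, lower) crop boxes in row-major order."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((
                c * width // cols,
                r * height // rows,
                (c + 1) * width // cols,
                (r + 1) * height // rows,
            ))
    return boxes


def crop_grid(grid_path, out_prefix, rows=2, cols=2):
    """Split a grid image into one PNG per panel and return the file paths."""
    # Imported lazily so panel_boxes can be used without Pillow installed.
    from PIL import Image

    paths = []
    with Image.open(grid_path) as grid:
        boxes = panel_boxes(grid.width, grid.height, rows, cols)
        for i, box in enumerate(boxes):
            path = f"{out_prefix}_{i}.png"
            grid.crop(box).save(path)
            paths.append(path)
    return paths


print(panel_boxes(1024, 1024))
```

If the model draws visible borders, a small inset on each box would be one way to keep them out of the keyframes.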

Enabling Keyframe Grid

Option 1: Explicit opt-in

fabric run global/ai-shorts \
  --input topic="AI productivity" \
  --input use_keyframe_grid=true

Option 2: Quality profile

Any quality profile whose keyframe_grid model is not skip enables the grid automatically:

Profile	Grid Model	Enabled?
(default)	`imagen-4.0-fast-generate-001`	Yes
`premium`	`imagen-4.0-fast-generate-001`	Yes
`local`	`sdxl-turbo`	Yes
`local-power`	`flux-schnell`	Yes
`cheap`	`skip`	No
`local-light`	`skip`	No
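
The table reduces to a simple lookup. The sketch below hard-codes it for illustration only; the names GRID_MODELS and grid_enabled are hypothetical, and the real pipeline presumably reads these values from profile configuration.

```python
# Values copied from the table above; "default" stands for the (default) row.
GRID_MODELS = {
    "default": "imagen-4.0-fast-generate-001",
    "premium": "imagen-4.0-fast-generate-001",
    "local": "sdxl-turbo",
    "local-power": "flux-schnell",
    "cheap": "skip",
    "local-light": "skip",
}


def grid_enabled(profile="default"):
    """A profile enables the grid when its grid model is anything but 'skip'."""
    return GRID_MODELS[profile] != "skip"


for name in GRID_MODELS:
    print(name, grid_enabled(name))
```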

Option 3: Override model per-run

fabric run global/ai-shorts \
  --input topic="AI productivity" \
  --input keyframe_grid_model="flux-schnell"

Edge Cases

Scenario	Behavior
1 b-roll segment	Grid skipped — not enough panels to benefit from shared context
2 b-roll segments	Side-by-side (1x2) layout at 16:9, cropped into 2 keyframes
3 b-roll segments	2x2 grid with 4th panel as an establishing shot derived from the topic
4 b-roll segments	Standard 2x2 grid
5+ b-roll segments	Multiple grids in batches of 4, all sharing the same continuity brief
Grid generation fails	Returns empty map — downstream falls back to per-segment still generation
No i2v model	Keyframes are generated but unused by video gen; available as storyboard artifacts
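
The segment-count rows of this table can be expressed as a small planning function. This is a sketch under stated assumptions: layout_for and plan_batches are hypothetical names, and applying the small-count rules to a short final batch (for example the fifth segment of five) is a guess, since the table does not say how remainders are handled.

```python
def layout_for(n):
    """Layout for one batch of n segments (n <= 4), following the table."""
    if n <= 1:
        return None  # grid skipped: too few panels to benefit
    if n == 2:
        return {"layout": "1x2", "aspect": "16:9", "filler_panels": 0}
    # 3 segments get one filler panel (the establishing shot); 4 get none.
    return {"layout": "2x2", "aspect": "1:1", "filler_panels": 4 - n}


def plan_batches(num_segments, batch_size=4):
    """Split segments into batches of batch_size and plan one grid per batch."""
    sizes = [
        min(batch_size, num_segments - start)
        for start in range(0, num_segments, batch_size)
    ]
    # Assumption: a short final batch follows the same per-count rules.
    return [layout_for(n) for n in sizes]


print(plan_batches(6))
```

Under these assumptions, six segments yield one full 2x2 grid followed by a side-by-side grid for the last two.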

Continuity Brief + Grid: How They Complement

Aspect	Continuity Brief (text)	Keyframe Grid (visual)
What it does	Prepends palette/film-stock/atmosphere text to every prompt	Generates reference images from a single shared generation pass
Consistency type	Color grading, grain texture, atmospheric feel	Pixel-level lighting geometry, character details, environment
When it helps	Always — works with any model	Most valuable with image-to-video models
Cost	Zero — pure logic, no API call	One image generation per 4 segments
Latency	None	~10-30s depending on model

Both are applied together. The grid prompt includes the continuity brief, and downstream video prompts still get the brief prepended regardless.

SDK API

The grid module is importable for use in custom workflows:

from fabric_workflow_sdk._keyframe_grid import (
    generate_keyframe_grid,
    select_grid_segments,
    build_grid_prompt,
    crop_grid_to_keyframes,
)

# Full orchestration (await must run inside an async function)
keyframe_map = await generate_keyframe_grid(
    input_dict,
    segments=script["segments"],
    continuity_brief=script["_continuity_brief"],
    topic="AI productivity",
)
# keyframe_map maps segment index to keyframe path, e.g. {0: "/tmp/kf_0.png", 3: "/tmp/kf_1.png", ...}

# Or use individual functions (segments and continuity_brief as passed above)
selected = select_grid_segments(segments, max_panels=4)
prompt = build_grid_prompt(selected, continuity_brief, topic="AI")
# Generate grid image with your preferred method...
keyframes = crop_grid_to_keyframes("/tmp/grid.png", num_panels=4)
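
Because a failed grid returns an empty map, callers can treat the map as optional. The helper below is a hypothetical sketch of that fallback, not part of the SDK: reference_image_for and the make_still callback are assumed names, and the map is assumed to be keyed by segment index.

```python
def reference_image_for(segment_index, keyframe_map, make_still):
    """Prefer the grid keyframe; otherwise generate a per-segment still."""
    path = keyframe_map.get(segment_index)
    if path is not None:
        return path
    # No keyframe for this segment (grid skipped, failed, or segment not
    # selected): fall back to the caller-supplied still generator.
    return make_still(segment_index)


def fake_still(index):
    # Stand-in for a real per-segment image generation call.
    return f"/tmp/still_{index}.png"


print(reference_image_for(0, {0: "/tmp/kf_0.png"}, fake_still))
print(reference_image_for(2, {}, fake_still))
```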