
Keyframe Grid Generation

The keyframe grid generates 4 scenes in a single 2x2 image, then crops them into individual keyframes. Because all panels share one generation pass, the model maintains consistent lighting, color, and style across them. These keyframes then serve as visual anchors for downstream image-to-video generation.

This technique is inspired by the TechHalla/Atlabs 2x2 Grid Method, adapted for automated video pipelines.

The AI Shorts pipeline generates b-roll segments independently — each as a separate API call. Even with a shared continuity brief (text prefix for consistent palette/grain), each generation starts from scratch. This can produce subtle visual drift between segments.

The grid solves this by generating multiple scenes simultaneously:

┌───────────────────┬───────────────────┐
│                   │                   │
│    Hook scene     │   Problem scene   │
│    (segment 0)    │    (segment 1)    │
│                   │                   │
├───────────────────┼───────────────────┤
│                   │                   │
│  Solution scene   │     CTA scene     │
│    (segment 2)    │    (segment 3)    │
│                   │                   │
└───────────────────┴───────────────────┘

All 4 panels are forced to share the same color temperature, lighting geometry, and film texture because they exist in the same diffusion context.

  1. Segment selection — Up to 4 b-roll segments are selected by narrative priority: hook > solution > cta > problem > insight > demo > proof.

  2. Grid prompt construction — A single prompt describes all 4 panels, with the continuity brief prepended:

    Unified palette: warm amber with teal shadows. Consistent Kodak Vision3 500T grain.
    Generate a single 2x2 grid image containing exactly 4 panels arranged in a
    square layout, separated by thin white borders. Each panel shows a different
    scene from the same visual universe — identical color palette, lighting
    temperature, and film grain texture across all panels.
    Top-left panel: Dramatic wide shot of a city skyline at golden hour...
    Top-right panel: Close-up of hands typing on a keyboard...
    Bottom-left panel: Person standing confidently in front of a whiteboard...
    Bottom-right panel: Aerial view pulling back to reveal the full landscape...
  3. Image generation — The grid is generated as a single 1:1 aspect ratio image using the configured image model (Imagen, SDXL Turbo, Flux Schnell, etc.).

  4. Cropping — The grid image is split into 4 individual keyframe PNG files via PIL.

  5. Downstream use — When image-to-video models are used for b-roll (e.g., Kling v3 i2v), the pre-generated keyframe replaces the per-segment still image. This means all b-roll videos start from visually consistent reference frames.
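
Steps 1 and 2 can be sketched roughly as follows. This is an illustrative sketch, not the SDK's implementation: the `role` and `visual` fields, the function names, and the choice to restore script order after selection are all assumptions made for the example.

```python
# Hypothetical sketch of segment selection and grid prompt construction.
# Field names ("role", "visual") and function names are assumptions.
PRIORITY = ["hook", "solution", "cta", "problem", "insight", "demo", "proof"]
POSITIONS = ["Top-left", "Top-right", "Bottom-left", "Bottom-right"]
GRID_INSTRUCTION = (
    "Generate a single 2x2 grid image containing exactly 4 panels arranged in a "
    "square layout, separated by thin white borders."
)


def pick_segments(segments, max_panels=4):
    """Keep the highest-priority segments, then restore script order.

    Returns a list of (original_index, segment) pairs.
    """
    def rank(segment):
        role = segment.get("role")
        # Unknown roles sort after every known role.
        return PRIORITY.index(role) if role in PRIORITY else len(PRIORITY)

    indexed = list(enumerate(segments))
    # sorted() is stable, so equal-priority segments keep their script order.
    chosen = sorted(indexed, key=lambda pair: rank(pair[1]))[:max_panels]
    # Assumption: panels are laid out in script order, as in the diagram.
    return sorted(chosen, key=lambda pair: pair[0])


def grid_prompt(chosen, continuity_brief):
    """Prepend the continuity brief, then describe one panel per line."""
    lines = [continuity_brief, GRID_INSTRUCTION]
    for position, (_, segment) in zip(POSITIONS, chosen):
        lines.append(f"{position} panel: {segment['visual']}")
    return "\n".join(lines)
```

With five candidate segments, the lowest-priority one (here an `insight` segment) would be dropped and the remaining four described in script order.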

To enable the grid explicitly:

fabric run global/ai-shorts \
  --input topic="AI productivity" \
  --input use_keyframe_grid=true

Profiles whose keyframe_grid model is anything other than skip enable the grid automatically:

| Profile | Grid Model | Enabled? |
| --- | --- | --- |
| (default) | imagen-4.0-fast-generate-001 | Yes |
| premium | imagen-4.0-fast-generate-001 | Yes |
| local | sdxl-turbo | Yes |
| local-power | flux-schnell | Yes |
| cheap | skip | No |
| local-light | skip | No |

To override the grid model for a single run:

fabric run global/ai-shorts \
  --input topic="AI productivity" \
  --input keyframe_grid_model="flux-schnell"
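
The profile and override behavior could be modeled along these lines. This is a sketch under assumptions: the mapping mirrors the profile table, but the function name, the `"default"` key, the rule that an explicit `keyframe_grid_model` wins over the profile, and the treatment of unknown profiles as `skip` are guesses rather than documented behavior.

```python
# Hypothetical resolution of the grid model from profile and override.
GRID_MODEL_BY_PROFILE = {
    "default": "imagen-4.0-fast-generate-001",
    "premium": "imagen-4.0-fast-generate-001",
    "local": "sdxl-turbo",
    "local-power": "flux-schnell",
    "cheap": "skip",
    "local-light": "skip",
}


def resolve_grid_model(profile="default", override=None):
    """Return the model to use for the grid, or None when the grid is skipped."""
    # Assumption: an explicit override takes precedence over the profile.
    model = override if override else GRID_MODEL_BY_PROFILE.get(profile, "skip")
    return None if model == "skip" else model
```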

| Scenario | Behavior |
| --- | --- |
| 1 b-roll segment | Grid skipped: not enough panels to benefit from shared context |
| 2 b-roll segments | Side-by-side (1x2) layout at 16:9, cropped into 2 keyframes |
| 3 b-roll segments | 2x2 grid with 4th panel as an establishing shot derived from the topic |
| 4 b-roll segments | Standard 2x2 grid |
| 5+ b-roll segments | Multiple grids in batches of 4, all sharing the same continuity brief |
| Grid generation fails | Returns an empty map; downstream falls back to per-segment still generation |
| No i2v model | Keyframes are generated but unused by video gen; available as storyboard artifacts |
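
The segment-count rules above can be expressed as a small layout planner. This is a minimal sketch, not the pipeline's code: `GridPlan`, `plan_batch`, and `plan_grids` are hypothetical names, and applying the same small-count rules to a leftover batch (so a single leftover segment gets no grid) is an assumption the table does not confirm.

```python
from dataclasses import dataclass


@dataclass
class GridPlan:
    rows: int
    cols: int
    aspect_ratio: str
    segment_indices: list
    filler_panels: int = 0  # e.g. an establishing shot for a 3-segment grid


def plan_batch(indices):
    """Choose a layout for one batch of at most 4 segment indices."""
    count = len(indices)
    if count < 2:
        return None  # a single panel gains nothing from shared context
    if count == 2:
        return GridPlan(1, 2, "16:9", indices)
    # 3 or 4 segments: a 2x2 grid, padded with one filler panel if needed.
    return GridPlan(2, 2, "1:1", indices, filler_panels=4 - count)


def plan_grids(num_segments, batch_size=4):
    """Split segments into batches of 4 and plan one grid per batch."""
    plans = []
    for start in range(0, num_segments, batch_size):
        batch = list(range(start, min(start + batch_size, num_segments)))
        plan = plan_batch(batch)
        if plan is not None:
            plans.append(plan)
    return plans
```

For example, six segments would yield one 2x2 grid for the first four and one 1x2 grid for the remaining two under these assumptions.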

Continuity Brief + Grid: How They Complement

| | Continuity Brief (text) | Keyframe Grid (visual) |
| --- | --- | --- |
| What it does | Prepends palette/film-stock/atmosphere text to every prompt | Generates reference images from a single shared generation pass |
| Consistency type | Color grading, grain texture, atmospheric feel | Pixel-level lighting geometry, character details, environment |
| When it helps | Always (works with any model) | Most valuable with image-to-video models |
| Cost | Zero (pure logic, no API call) | One image generation per 4 segments |
| Latency | None | ~10-30s depending on model |

Both are applied together. The grid prompt includes the continuity brief, and downstream video prompts still get the brief prepended regardless.
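
One way to picture how the two mechanisms combine downstream is the sketch below. It is illustrative only: `build_video_job` and the `make_still` callback are hypothetical, and the real pipeline may structure this step differently. The point is that the brief is always prepended, while the keyframe is used only when the map has an entry for that segment.

```python
def build_video_job(segment_index, segment_prompt, continuity_brief,
                    keyframe_map, make_still):
    """Assemble the inputs for one image-to-video generation.

    keyframe_map maps segment index -> keyframe path; it may be empty
    when grid generation was skipped or failed.
    make_still is a hypothetical callback that generates a per-segment still.
    """
    # The continuity brief is prepended whether or not a keyframe exists.
    prompt = f"{continuity_brief} {segment_prompt}".strip()

    start_frame = keyframe_map.get(segment_index)
    if start_frame is None:
        # Fallback: no keyframe for this segment, generate a still instead.
        start_frame = make_still(prompt)

    return {"prompt": prompt, "start_frame": start_frame}
```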

The grid module is importable for use in custom workflows:

from fabric_workflow_sdk._keyframe_grid import (
    generate_keyframe_grid,
    select_grid_segments,
    build_grid_prompt,
    crop_grid_to_keyframes,
)

# Full orchestration (await must run inside an async function)
keyframe_map = await generate_keyframe_grid(
    input_dict,
    segments=script["segments"],
    continuity_brief=script["_continuity_brief"],
    topic="AI productivity",
)
# keyframe_map = {0: "/tmp/kf_0.png", 3: "/tmp/kf_1.png", ...}

# Or use individual functions
selected = select_grid_segments(script["segments"], max_panels=4)
prompt = build_grid_prompt(selected, script["_continuity_brief"], topic="AI")
# Generate grid image with your preferred method...
keyframes = crop_grid_to_keyframes("/tmp/grid.png", num_panels=4)
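
For readers generating the grid image themselves, the cropping step can be approximated with Pillow as below. This is a sketch of the general technique, not the body of `crop_grid_to_keyframes`: `panel_boxes`, `crop_panels`, the output naming, and the optional `border` inset (to trim the thin white borders requested in the prompt) are assumptions.

```python
def panel_boxes(width, height, rows=2, cols=2, border=0):
    """Return (left, upper, right, lower) crop boxes in row-major order.

    border trims that many pixels from every side of each panel.
    """
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = col * width // cols + border
            upper = row * height // rows + border
            right = (col + 1) * width // cols - border
            lower = (row + 1) * height // rows - border
            boxes.append((left, upper, right, lower))
    return boxes


def crop_panels(grid_path, out_prefix, rows=2, cols=2, border=0):
    """Crop a grid image into one PNG per panel and return the file paths."""
    from PIL import Image  # Pillow, imported lazily so panel_boxes has no dependency

    paths = []
    with Image.open(grid_path) as grid:
        boxes = panel_boxes(grid.width, grid.height, rows, cols, border)
        for index, box in enumerate(boxes):
            out_path = f"{out_prefix}_{index}.png"
            grid.crop(box).save(out_path)
            paths.append(out_path)
    return paths
```

Passing `rows=1, cols=2` covers the side-by-side layout used for two segments.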