Model Configuration

Fabric uses a centralized model configuration system that determines which AI model handles each operation. This applies across all workflows, not just video generation.

Resolution Order

Models are resolved with a 7-level priority chain. The first match wins:

1. Per-run input override: --input image_model="gemini-2.5-flash-image"
2. Environment variable: FABRIC_IMAGE_MODEL=gemini-2.5-flash-image
3. Project config: ./models.yaml (walks up from cwd)
4. Global config: ~/.fabric/models.yaml
5. Explicit quality= profile: --input quality=cheap (caller-set)
6. Implicit prefer-local probe: when no quality is set, Fabric probes for installed local backends and returns a local model if one exists for the operation. Disabled with FABRIC_PREFER_LOCAL=0. See Local-first by default.
7. Built-in defaults
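
The chain above can be read as a first-match-wins loop. The following is a minimal sketch of that idea, not the SDK's actual resolver: `resolve_model`, its parameters, and the one-entry `BUILTIN_DEFAULTS` table are hypothetical names chosen for illustration, and config files are assumed to be already parsed into plain dicts.

```python
from typing import Callable, Mapping, Optional

# Hypothetical stand-in for the built-in defaults table (illustration only).
BUILTIN_DEFAULTS = {"image": "gemini-3.1-flash-image-preview"}


def resolve_model(
    operation: str,
    run_input: Mapping[str, str],
    env: Mapping[str, str],
    project_cfg: Mapping[str, str],
    global_cfg: Mapping[str, str],
    profiles: Mapping[str, Mapping[str, str]],
    probe_local: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Walk the seven levels in order and return the first non-empty match."""
    quality = run_input.get("quality")
    prefer_local = env.get("FABRIC_PREFER_LOCAL", "1") != "0"
    candidates = [
        lambda: run_input.get(f"{operation}_model"),           # 1. per-run input
        lambda: env.get(f"FABRIC_{operation.upper()}_MODEL"),  # 2. environment variable
        lambda: project_cfg.get(operation),                    # 3. project config
        lambda: global_cfg.get(operation),                     # 4. global config
        # 5. explicit quality profile, only when the caller set one
        lambda: profiles.get(quality, {}).get(operation) if quality else None,
        # 6. implicit local probe, only when no quality is set and not opted out
        lambda: probe_local(operation) if prefer_local and not quality else None,
        lambda: BUILTIN_DEFAULTS.get(operation),               # 7. built-in default
    ]
    for candidate in candidates:
        model = candidate()
        if model:
            return model
    return None
```

Each level is a lazy callable, so the local probe (which may be comparatively expensive) only runs when nothing above it matched.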

Local-first by default

Fabric’s product policy is “always prefer local when possible” — the fewer remote AI calls per run, the lower the spend. Three layers cooperate to make this work without callers having to opt in:

Layer	What	Where to override
SDK resolver	When no `quality` is set, `get_model` consults `_local_availability.probe_local_backends()` and returns a local model when one exists for the operation.	`FABRIC_PREFER_LOCAL=0` to opt out (defaults to on)
Router strategy	The `fabric serve` provider router defaults its routing strategy to `LocalFirst`, which gives `Tier::Basic` (local) providers a flat scoring bonus over remote ones.	`FABRIC_ROUTING_STRATEGY=cheapest_qualified` (or `fastest`, `best_quality`, `balanced`)
Diffusers provider	A router-visible provider proxies image and video gen to the in-tree Python model server (`fabric_workflow_sdk.model_server`, default `:8199`) so `LocalFirst` actually has something to favour for image/video modalities. On by default.	`--diffusers-disabled` or `FABRIC_DIFFUSERS_DISABLED=1`
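
To illustrate what a "flat scoring bonus" means for the router strategy, here is a small Python sketch. The real router is part of `fabric serve` and its scoring is not documented here, so `Provider`, `LOCAL_BONUS`, and `base_score` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumed value; the actual bonus used by LocalFirst is not stated in this document.
LOCAL_BONUS = 100.0


@dataclass(frozen=True)
class Provider:
    name: str
    is_local: bool     # stands in for Tier::Basic
    base_score: float  # whatever the router would score before the bonus


def local_first_score(provider: Provider) -> float:
    """Add the same fixed bonus to every local provider."""
    return provider.base_score + (LOCAL_BONUS if provider.is_local else 0.0)


def pick(providers: Iterable[Provider]) -> Provider:
    """Choose the provider with the highest bonus-adjusted score."""
    return max(providers, key=local_first_score)
```

Because the bonus is flat rather than proportional, a local provider wins whenever its base score is within `LOCAL_BONUS` of the best remote one, which is why the strategy only helps if a local provider is actually registered for the modality.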

Runtime fallback

Resolution-time fallback only handles “backend not installed before we tried.” Real production failures look fine at boot and only break under load — Diffusers OOM on a particular prompt, model server crashes mid-request, provider returns 5xx.

generate_image and generate_video are wrapped with a runtime walker that, on any exception from the primary attempt, walks _FALLBACK_CHAINS[operation] (local → remote ordering) and retries with each subsequent reachable model until one succeeds. Pinned-model callers (generate_image(input, prompt, model="X")) get a single attempt — production benchmarks that compare specific models depend on no silent retry.

When fallback engages you’ll see this in the logs:

fabric.runtime_fallback: image: model sdxl-turbo failed (model server returned 500), trying next in chain
fabric.runtime_fallback: image: fallback engaged sdxl-turbo → flux-schnell after error: model server returned 500

If every model in a chain fails, an AllAttemptsFailed error surfaces with the full attempt history.
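
The walker described above can be sketched as follows. This is an illustrative approximation only: `run_with_fallback` and its parameters are hypothetical, the reachability check is omitted, and the `AllAttemptsFailed` class here is a local stand-in rather than the SDK's own exception.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllAttemptsFailed(Exception):
    """Stand-in error that carries the full (model, exception) history."""

    def __init__(self, attempts: Sequence[Tuple[str, Exception]]):
        self.attempts: List[Tuple[str, Exception]] = list(attempts)
        summary = "; ".join(f"{model}: {exc}" for model, exc in self.attempts)
        super().__init__(f"all attempts failed ({summary})")


def run_with_fallback(
    primary: str,
    chain: Sequence[str],
    call: Callable[[str], T],
    pinned: bool = False,
) -> T:
    """Try the primary model, then each remaining chain entry in order."""
    if pinned:
        # Pinned callers get exactly one attempt; errors propagate unchanged.
        return call(primary)
    attempts: List[Tuple[str, Exception]] = []
    for model in [primary] + [m for m in chain if m != primary]:
        try:
            return call(model)
        except Exception as exc:  # any failure moves on to the next model
            attempts.append((model, exc))
    raise AllAttemptsFailed(attempts)
```

Keeping the pinned path free of retries is what lets benchmark runs attribute a result to exactly one model.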

Where the model lists live

Two embedded JSON files drive the local-first behaviour. Editing them is a JSON edit + rebuild — no code changes needed.

File	What	Loaded by
`sdks/fabric-workflow-sdk/fabric_workflow_sdk/models.json` → `fallback_chains` section	Per-operation chain (`broll`, `video`, `avatar`, `tts`, `lipsync`, `image`, `image_fast`). Each list is local-first → remote-fallback.	`_models._FALLBACK_CHAINS` (Python) — bundled with the SDK package
`crates/fabric-providers/src/diffusers_models.json`	Image + video models the Diffusers router provider advertises. Mirrors the SDK’s `_LOCAL_IMAGE_MAP` / `_LOCAL_VIDEO_MAP` keys.	`default_image_models()` / `default_video_models()` in `fabric_providers::diffusers` (Rust) — embedded via `include_str!`

When you add a new local model that needs both router-side capability advertising and SDK-side fallback, mirror the entry in both files.
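
A quick consistency check can catch a local model that was added to one file but not the other. The sketch below assumes simplified JSON shapes (a `fallback_chains` mapping on the SDK side, and per-modality lists on the Diffusers side); the real files may be structured differently, and `unmirrored_local_models` is a hypothetical helper, not part of Fabric.

```python
import json
from typing import List, Set

# Assumed, simplified file contents for illustration only.
SDK_MODELS_JSON = (
    '{"fallback_chains": {"image": '
    '["sdxl-turbo", "flux-schnell", "gemini-3.1-flash-image-preview"]}}'
)
DIFFUSERS_MODELS_JSON = '{"image": ["sdxl-turbo"], "video": ["wan:1.3b"]}'
LOCAL_MODELS = {"sdxl-turbo", "flux-schnell", "wan:1.3b"}


def unmirrored_local_models(
    sdk_json: str, diffusers_json: str, local_models: Set[str]
) -> List[str]:
    """Return local models present in SDK chains but not advertised by Diffusers."""
    chains = json.loads(sdk_json)["fallback_chains"]
    advertised = {m for models in json.loads(diffusers_json).values() for m in models}
    in_chains = {m for chain in chains.values() for m in chain}
    return sorted((in_chains & local_models) - advertised)
```

With the sample data above, `flux-schnell` appears in the SDK chain but not in the Diffusers list, so it is the one entry reported.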

Operations

Each pipeline stage maps to a named operation:

Operation	Default Model	Description
`text`	`gemini-2.5-flash`	Script generation, LLM calls
`image`	`gemini-3.1-flash-image-preview`	Image generation
`image_fast`	`imagen-4.0-fast-generate-001`	Fast image generation (thumbnails, stills)
`video`	`veo-2.0-generate-001`	Video generation
`broll`	`fal-ai/veo3.1/fast`	B-roll video generation
`broll_i2v`	`fal-ai/kling-video/v3/pro/image-to-video`	B-roll image-to-video
`keyframe_grid`	`imagen-4.0-fast-generate-001`	Keyframe grid generation
`tts`	`elevenlabs/eleven_multilingual_v2`	Text-to-speech
`avatar`	`fal-ai/kling-video/ai-avatar/v2/standard`	AI avatar / talking head
`lipsync`	`veed/lipsync`	Lip synchronization
`music`	`fal-ai/stable-audio`	Background music
`transcription`	`faster-whisper/large-v3`	Audio transcription
`thumbnail`	`gemini-3.1-flash-image-preview`	Thumbnail generation

Config File

Create models.yaml in your project root or at ~/.fabric/models.yaml:

# Override individual operations
text: gemini-2.5-flash
image: gemini-3.1-flash-image-preview
tts: elevenlabs/eleven_multilingual_v2
broll: fal-ai/veo3.1/fast
keyframe_grid: imagen-4.0-fast-generate-001

# Define custom quality profiles
profiles:
  my-custom:
    tts: fal-ai/kokoro/american-english
    broll: wan:1.3b
    keyframe_grid: sdxl-turbo

Project-level config overrides global config. Top-level keys override profile keys.
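
The two precedence rules can be sketched with plain dicts standing in for the parsed YAML. `effective_models` is a hypothetical helper, and the way same-named profiles from the two files combine (project replaces global wholesale) is an assumption, since the text above does not specify it.

```python
from typing import Dict, Mapping, Optional


def effective_models(
    global_cfg: Mapping[str, object],
    project_cfg: Mapping[str, object],
    quality: Optional[str] = None,
) -> Dict[str, str]:
    """Merge two parsed models.yaml files into one operation -> model mapping."""
    # Assumption: a profile defined in the project file replaces the global one.
    profiles = {**global_cfg.get("profiles", {}), **project_cfg.get("profiles", {})}
    # Start from the selected profile, if any ...
    result: Dict[str, str] = dict(profiles.get(quality, {})) if quality else {}
    # ... then let top-level keys win, with the project file beating the global one.
    top_level = {**global_cfg, **project_cfg}
    result.update({op: m for op, m in top_level.items() if op != "profiles"})
    return result
```

For example, a global file that sets `tts: kokoro` and defines `my-custom`, combined with a project file that sets `tts: elevenlabs/eleven_multilingual_v2`, resolves `tts` to the project value while `broll` still comes from the profile.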

Quality Profiles

Profiles bundle model selections for common use cases:

Budget-friendly with acceptable quality. Uses faster/cheaper model variants.

Operation	Model
`tts`	`fal-ai/kokoro/american-english`
`lipsync`	`fal-ai/lipsync`
`keyframe_grid`	`skip`

All other operations use defaults.

Local Profiles

Full local pipeline with good quality. Requires 8+ GB VRAM.

Operation	Model
`text`	`qwen3:8b` (Ollama)
`image`	`sdxl-turbo`
`video`	`wan:1.3b`
`broll`	`wan:1.3b`
`tts`	`kokoro`
`avatar`	`wav2lip`
`lipsync`	`wav2lip`
`music`	`musicgen-small`
`keyframe_grid`	`sdxl-turbo`
`broll_i2v`	`skip`

Best local quality. Requires more compute.

Operation	Model
`text`	`qwen3:latest` (Ollama)
`image`	`flux-schnell`
`video`	`wan:1.3b`
`broll`	`wan:1.3b`
`tts`	`voxtral`
`avatar`	`wav2lip`
`lipsync`	`wav2lip`
`music`	`musicgen-small`
`keyframe_grid`	`flux-schnell`

Minimal resource usage. Skips heavy operations.

Operation	Model
`text`	`gemma3:4b` (Ollama)
`image`	`sdxl-turbo`
`video`	`skip`
`broll`	`skip`
`tts`	`piper`
`avatar`	`skip`
`lipsync`	`skip`
`music`	`skip`
`keyframe_grid`	`skip`

Skip Behavior

Setting any operation to "skip" disables it entirely. The pipeline gracefully handles skipped operations — for example, skipping broll means no b-roll videos are generated, and the final composition uses only talking-head segments.
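
In a custom stage, honouring `skip` usually comes down to a guard before the generation call. The following is a minimal sketch under that assumption; `run_stage` and the `generate` callback are hypothetical names, not SDK functions.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

SKIP = "skip"


def run_stage(model: str, generate: Callable[[str], T]) -> Optional[T]:
    """Run the stage unless its resolved model is the skip sentinel."""
    if model == SKIP:
        # Nothing is generated; downstream code should treat None as "no asset".
        return None
    return generate(model)
```

A caller would then compose the final output from whichever stages returned a value, for instance using only talking-head segments when the b-roll stage returns `None`.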

Environment Variables

Every operation maps to a FABRIC_<OPERATION>_MODEL environment variable:

export FABRIC_TEXT_MODEL=gemini-2.5-flash
export FABRIC_IMAGE_MODEL=sdxl-turbo
export FABRIC_TTS_MODEL=kokoro
export FABRIC_BROLL_MODEL=wan:1.3b
export FABRIC_KEYFRAME_GRID_MODEL=imagen-4.0-fast-generate-001
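
The naming rule is mechanical: upper-case the operation name and wrap it in `FABRIC_` and `_MODEL`. A small sketch of that mapping follows, with `env_var_for` and `env_pin` as hypothetical helper names.

```python
import os
from typing import Mapping, Optional


def env_var_for(operation: str) -> str:
    """Build the variable name for an operation, keeping underscores as-is."""
    return f"FABRIC_{operation.upper()}_MODEL"


def env_pin(operation: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the pinned model for an operation, or None when unset or empty."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var_for(operation)) or None
```

Treating an empty value as "not set" is a choice made for this example; the SDK's handling of empty variables is not described above.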

Pinning hot paths to remote

The implicit prefer-local probe is the right default for most operations — it keeps spend close to zero whenever a usable local backend is installed. But when one specific operation has a much faster remote alternative (image generation on MPS being the canonical example: ~7 s/thumbnail with local SDXL vs ~1–2 s with Gemini), pinning just that operation gets you remote speed without giving up local defaults for everything else.

# Pin image-heavy ops to Gemini; text / tts / vision still follow local-first.
export FABRIC_THUMBNAIL_MODEL=gemini-3.1-flash-image-preview
export FABRIC_IMAGE_MODEL=gemini-3.1-flash-image-preview

The FABRIC_<OP>_MODEL pin sits at priority 2 in the resolution chain, so it beats both the prefer-local probe and any quality profile. Per-run --input image_model=... still wins over the env pin, so callers can override on a single run if needed.

Inspecting what’s active

If you’re staring at a slow run and want to know which model would actually be picked for which workflow, run:

just model-status

That generates a self-contained HTML page (target/model-status.html) and opens it in your browser. The page statically scans every workflow + SDK stage for get_model(input, "X") calls and renders three tables:

Operation resolution — built-in default per op, the effective model after running the SDK’s full resolution chain (FABRIC_<OP>_MODEL pin → local-availability probe → built-in default), the local candidate the probe sees, and a local/remote/skip badge.
Workflows → operations used — which ops each workflow transitively reaches and the resolved model per op. Filterable by name or operation, so you can quickly answer “is image/generate going to hit Gemini or local SDXL right now?”
Active env pins — every FABRIC_*_MODEL and the resolved FABRIC_PREFER_LOCAL state, with a count of probe-reachable local backends.

The page reflects the environment of whoever ran just model-status, so source the same .env your server uses (the recipe does this for you when one exists at the repo root).

Editing assignments interactively

To change assignments rather than just inspect them, run:

just model-status-tui

A Textual app that shows the same operations table inline and lets you edit each row in place. Press e (or Enter) on the highlighted op to open an edit modal — a quick-pick UI with autocomplete:

Curated suggestions per op — populated from the op’s built-in default, the local-probe candidate, every named profile that defines this op (cheap, local, premium, …), the SDK’s fallback chain, and sibling-modality defaults (e.g. image_fast is suggested for thumbnail). Each row shows the model id, its local/remote/skip class, and the source label so you can see why a given model is being suggested.
Type-to-filter — the search box at the top of the modal does substring matching across the model id, source label, and class. Typing local shows only the local-class suggestions; typing gemini narrows to Gemini variants. Enter on the search field commits the top match; arrows move into the list for keyboard browsing.
Custom… fallback — when the right model isn’t in the list, pick “Custom…” to drop into a free-form input.
Two save buttons, no radio — the write target is an explicit choice between two clearly-labeled buttons:
- Save → .env (pin) — writes FABRIC_<OP>_MODEL=<value> to .env. Server-local, reversible, and updated in place if the line already exists. Picked up by the next server restart.
- Save → models.json (default) — mutates sdks/fabric-workflow-sdk/fabric_workflow_sdk/models.json so the change ships to every SDK consumer after rebuild. Treat with more care.
Pressing Enter on a suggestion (or in the search field) accepts the model choice and moves focus to the .env button — the safer / reversible target. Press Enter again to commit, or Shift+Tab once + Enter to commit to models.json instead. There’s no way to write to a target you didn’t explicitly pick.

c clears an active pin without opening the modal; r reloads from disk if you’ve edited .env or models.json outside the TUI; q quits. Both write paths are atomic (temp-file + rename for JSON, in-place rewrite for .env).
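
For readers unfamiliar with the two write patterns mentioned above, here is a generic sketch of each: a temp-file-plus-rename JSON write and an update-or-append edit of `.env` text. Both functions are hypothetical illustrations and are not taken from the TUI's source.

```python
import json
import os
import tempfile


def write_json_atomic(path: str, data: object) -> None:
    """Write JSON to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.write("\n")
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def upsert_env_line(text: str, key: str, value: str) -> str:
    """Replace an existing KEY=... line in .env text, or append one if absent."""
    lines = text.splitlines()
    new_line = f"{key}={value}"
    for index, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[index] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

The temp file is created next to the target so that the final rename stays on one filesystem, which is what makes the swap a single step.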

Applying a profile in one shot

Press p to open the presets screen — a profile picker with a live diff preview. Highlighting a profile shows you exactly which operations would change and which would already match, so you can preview the impact before committing. Each profile entry also displays a one-line local / remote / skip count so the differences between, say, local and fast-local are visible at a glance.

─ apply 'local' ─ 10/17 would change ─
  avatar         fal-ai/kling-video/ai-avatar/v2/standard → hallo2
  broll          fal-ai/veo3.1/fast                       → wan:1.3b
  image          gemini-3.1-flash-image-preview           → sdxl-turbo
  thumbnail      gemini-3.1-flash-image-preview           → sdxl-turbo
  …

The first entry on the list is Clear all FABRIC_*_MODEL pins — useful when you’ve experimented your way into a thicket of overrides and want to fall back to defaults + the local probe. The preview shows you what each currently-pinned op would revert to.

The same choice between an .env pin and a models.json default applies here. Apply writes everything in one atomic batch (one append per pin for .env, one JSON rewrite for defaults), and the main table reloads to reflect the new state. Profiles are additive: they only touch the ops they explicitly define, so applying cheap does not reset your text pin, for example, if cheap does not mention text.
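
The diff preview and the additive behaviour can both be expressed as simple dict operations. This is a sketch under the assumption that current assignments and profiles are flat operation-to-model mappings; `preview_profile` and `apply_profile` are hypothetical names.

```python
from typing import Dict, Mapping, Optional, Tuple


def preview_profile(
    current: Mapping[str, str], profile: Mapping[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {op: (old, new)} for only the ops the profile would change."""
    return {
        op: (current.get(op), model)
        for op, model in profile.items()
        if current.get(op) != model
    }


def apply_profile(current: Mapping[str, str], profile: Mapping[str, str]) -> Dict[str, str]:
    """Overlay the profile on current assignments; unmentioned ops are kept."""
    return {**current, **profile}
```

Ops the profile does not mention never appear in the preview and pass through `apply_profile` untouched, which is the additive property described above.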

When you want a model change for a single workflow run rather than server-wide, prefer the per-run override:

fabric run image/generate \
  --input topic="..." \
  --input thumbnail_model=gemini-3.1-flash-image-preview

Per-Run Overrides

Override any model for a single run:

fabric run video/ai-shorts \
  --input topic="AI productivity" \
  --input quality=local \
  --input tts_model=elevenlabs/eleven_turbo_v2_5 \
  --input broll_model="fal-ai/veo3.1/fast"

Per-run overrides take highest priority and override both quality profiles and config files.

Using in Custom Workflows

from fabric_workflow_sdk import get_model

# Resolves through the full priority chain
model = get_model(input, "text")           # "gemini-2.5-flash" unless an override or local backend applies
model = get_model(input, "broll")          # depends on quality profile
model = get_model(input, "keyframe_grid")  # "skip" or an image model

# With fallback for unknown operations
model = get_model(input, "my_custom_op", fallback="gemini-2.5-flash")