
Model Configuration

Fabric uses a centralized model configuration system that determines which AI model handles each operation. This applies across all workflows, not just video generation.

Models are resolved with a 7-level priority chain. The first match wins:

  1. Per-run input override: --input image_model="gemini-2.5-flash-image"
  2. Environment variable: FABRIC_IMAGE_MODEL=gemini-2.5-flash-image
  3. Project config: ./models.yaml (walks up from cwd)
  4. Global config: ~/.fabric/models.yaml
  5. Explicit quality= profile: --input quality=cheap (caller-set)
  6. Implicit prefer-local probe — when no other quality is set, Fabric probes for installed local backends and returns a local model if one exists for the operation. Disabled with FABRIC_PREFER_LOCAL=0. See Local-first by default.
  7. Built-in defaults
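The chain above can be sketched as a single first-match function. This is a minimal illustration, not the SDK's actual resolver: `resolve_model`, its parameters, and the `probe_local` callback are hypothetical names, and the config files are assumed to be already loaded into flat dictionaries.

```python
from typing import Callable, Mapping, Optional


def resolve_model(
    operation: str,
    run_input: Mapping[str, str],
    env: Mapping[str, str],
    project_cfg: Mapping[str, str],
    global_cfg: Mapping[str, str],
    profiles: Mapping[str, Mapping[str, str]],
    probe_local: Callable[[str], Optional[str]],
    defaults: Mapping[str, str],
) -> str:
    """Return the first match in the 7-level chain (hypothetical helper)."""
    # 1. Per-run input override, e.g. --input image_model=...
    model = run_input.get(f"{operation}_model")
    if model:
        return model
    # 2. Environment variable FABRIC_<OPERATION>_MODEL
    model = env.get(f"FABRIC_{operation.upper()}_MODEL")
    if model:
        return model
    # 3. Project config, then 4. global config
    for cfg in (project_cfg, global_cfg):
        model = cfg.get(operation)
        if model:
            return model
    quality = run_input.get("quality")
    if quality:
        # 5. Explicit quality profile set by the caller
        model = profiles.get(quality, {}).get(operation)
        if model:
            return model
    elif env.get("FABRIC_PREFER_LOCAL", "1") != "0":
        # 6. Implicit prefer-local probe, only when no quality is set
        model = probe_local(operation)
        if model:
            return model
    # 7. Built-in default
    return defaults[operation]
```

Passing the environment and the probe in as arguments keeps the sketch testable without touching the real process environment or any installed backend.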

Fabric’s product policy is “always prefer local when possible” — the fewer remote AI calls per run, the lower the spend. Three layers cooperate to make this work without callers having to opt in:

| Layer | What | Where to override |
| --- | --- | --- |
| SDK resolver | When no quality is set, get_model consults _local_availability.probe_local_backends() and returns a local model when one exists for the operation. | FABRIC_PREFER_LOCAL=0 to opt out (defaults to on) |
| Router strategy | The fabric serve provider router defaults its routing strategy to LocalFirst, which gives Tier::Basic (local) providers a flat scoring bonus over remote ones. | FABRIC_ROUTING_STRATEGY=cheapest_qualified (or fastest, best_quality, balanced) |
| Diffusers provider | A router-visible provider proxies image and video gen to the in-tree Python model server (fabric_workflow_sdk.model_server, default :8199) so LocalFirst actually has something to favour for image/video modalities. On by default. | --diffusers-disabled or FABRIC_DIFFUSERS_DISABLED=1 |
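To make the "flat scoring bonus" idea concrete, here is a small sketch of how a local-first strategy could rank providers. The real router lives in Rust and its scoring is not shown here; `Provider`, `LOCAL_BONUS`, and the score values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical magnitude: large enough that any local provider outranks
# any remote one with a base score below it.
LOCAL_BONUS = 100.0


@dataclass(frozen=True)
class Provider:
    name: str
    is_local: bool      # stands in for Tier::Basic in the source
    base_score: float   # whatever the strategy would score without the bonus


def local_first_score(provider: Provider) -> float:
    """Add a flat bonus to local providers, leaving remote scores unchanged."""
    return provider.base_score + (LOCAL_BONUS if provider.is_local else 0.0)


def pick_provider(providers: Iterable[Provider]) -> Provider:
    """Choose the highest-scoring provider under the local-first rule."""
    return max(providers, key=local_first_score)
```

Because the bonus is flat rather than multiplicative, the relative order among local providers (and among remote ones) is unaffected.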

Resolution-time fallback only handles “backend not installed before we tried.” Real production failures look fine at boot and only break under load — Diffusers OOM on a particular prompt, model server crashes mid-request, provider returns 5xx.

generate_image and generate_video are wrapped with a runtime walker that, on any exception from the primary attempt, walks _FALLBACK_CHAINS[operation] (ordered local → remote) and retries with each subsequent reachable model until one succeeds. Pinned-model callers (generate_image(input, prompt, model="X")) get a single attempt, because production benchmarks that compare specific models depend on there being no silent retry.

When fallback engages you’ll see this in the logs:

fabric.runtime_fallback: image: model sdxl-turbo failed (model server returned 500), trying next in chain
fabric.runtime_fallback: image: fallback engaged sdxl-turbo → flux-schnell after error: model server returned 500

If every model in a chain fails, an AllAttemptsFailed error surfaces with the full attempt history.
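The walker described above might look roughly like the following. This is a sketch under assumptions: `run_with_fallback` and its parameters are hypothetical, the `AllAttemptsFailed` class here is a stand-in for the SDK's own error, and reachability checks and logging are omitted.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllAttemptsFailed(Exception):
    """Stand-in error carrying the full (model, exception) attempt history."""

    def __init__(self, attempts: Sequence[Tuple[str, Exception]]) -> None:
        self.attempts = list(attempts)
        summary = "; ".join(f"{model}: {exc}" for model, exc in self.attempts)
        super().__init__(f"all attempts failed ({summary})")


def run_with_fallback(
    generate: Callable[[str], T],
    primary: str,
    chain: Sequence[str],
    pinned: bool = False,
) -> T:
    """Try the primary model, then each later model in the chain."""
    if pinned:
        # Pinned callers get exactly one attempt; errors propagate unchanged.
        return generate(primary)
    # "Subsequent" models: everything after the primary in the chain,
    # or the whole chain if the primary is not part of it.
    start = chain.index(primary) + 1 if primary in chain else 0
    candidates = [primary, *chain[start:]]
    attempts: List[Tuple[str, Exception]] = []
    for model in candidates:
        try:
            return generate(model)
        except Exception as exc:  # any failure moves on to the next model
            attempts.append((model, exc))
    raise AllAttemptsFailed(attempts)
```

Keeping the pinned path as a plain call means a benchmark sees the original exception rather than a wrapped one.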

Two embedded JSON files drive the local-first behaviour. Editing them is a JSON edit + rebuild — no code changes needed.

| File | What | Loaded by |
| --- | --- | --- |
| sdks/fabric-workflow-sdk/fabric_workflow_sdk/models.json, fallback_chains section | Per-operation chain (broll, video, avatar, tts, lipsync, image, image_fast). Each list is local-first → remote-fallback. | _models._FALLBACK_CHAINS (Python), bundled with the SDK package |
| crates/fabric-providers/src/diffusers_models.json | Image + video models the Diffusers router provider advertises. Mirrors the SDK's _LOCAL_IMAGE_MAP / _LOCAL_VIDEO_MAP keys. | default_image_models() / default_video_models() in fabric_providers::diffusers (Rust), embedded via include_str! |

When you add a new local model that needs both router-side capability advertising and SDK-side fallback, mirror the entry in both files.
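A quick consistency check can catch a model that was added to one file but not the other. The sketch below deliberately takes plain collections of model ids rather than parsing either JSON file, since the file schemas are not described here; `mirror_gaps` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def mirror_gaps(
    sdk_local_keys: Iterable[str],
    router_models: Iterable[str],
) -> Dict[str, List[str]]:
    """Report model ids present on one side but missing from the other."""
    sdk, router = set(sdk_local_keys), set(router_models)
    return {
        "missing_from_router": sorted(sdk - router),
        "missing_from_sdk": sorted(router - sdk),
    }
```

Both lists being empty means the two sides advertise the same local models.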

Each pipeline stage maps to a named operation:

| Operation | Default Model | Description |
| --- | --- | --- |
| text | gemini-2.5-flash | Script generation, LLM calls |
| image | gemini-3.1-flash-image-preview | Image generation |
| image_fast | imagen-4.0-fast-generate-001 | Fast image generation (thumbnails, stills) |
| video | veo-2.0-generate-001 | Video generation |
| broll | fal-ai/veo3.1/fast | B-roll video generation |
| broll_i2v | fal-ai/kling-video/v3/pro/image-to-video | B-roll image-to-video |
| keyframe_grid | imagen-4.0-fast-generate-001 | Keyframe grid generation |
| tts | elevenlabs/eleven_multilingual_v2 | Text-to-speech |
| avatar | fal-ai/kling-video/ai-avatar/v2/standard | AI avatar / talking head |
| lipsync | veed/lipsync | Lip synchronization |
| music | fal-ai/stable-audio | Background music |
| transcription | faster-whisper/large-v3 | Audio transcription |
| thumbnail | gemini-3.1-flash-image-preview | Thumbnail generation |

Create models.yaml in your project root or at ~/.fabric/models.yaml:

# Override individual operations
text: gemini-2.5-flash
image: gemini-3.1-flash-image-preview
tts: elevenlabs/eleven_multilingual_v2
broll: fal-ai/veo3.1/fast
keyframe_grid: imagen-4.0-fast-generate-001
# Define custom quality profiles
profiles:
  my-custom:
    tts: fal-ai/kokoro/american-english
    broll: wan:1.3b
    keyframe_grid: sdxl-turbo

Project-level config overrides global config. Top-level keys override profile keys.
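The two precedence rules can be illustrated with a small merge over already-parsed config dictionaries. This is only a sketch: `effective_config` is a hypothetical helper, and it assumes each file parses to a mapping whose only nested key is `profiles`.

```python
from typing import Any, Dict, Mapping, Optional


def effective_config(
    global_cfg: Mapping[str, Any],
    project_cfg: Mapping[str, Any],
    profile_name: Optional[str] = None,
) -> Dict[str, str]:
    """Merge global and project config, then layer top-level keys over a profile."""
    # Project-level values replace global ones, for top-level keys and profiles alike.
    top_level = {
        key: value
        for cfg in (global_cfg, project_cfg)
        for key, value in cfg.items()
        if key != "profiles"
    }
    profiles = {**global_cfg.get("profiles", {}), **project_cfg.get("profiles", {})}
    # Start from the selected profile (if any), then let top-level keys win.
    result: Dict[str, str] = dict(profiles.get(profile_name, {})) if profile_name else {}
    result.update(top_level)
    return result
```

With a global file that sets `tts` at top level and a `my-custom` profile that also sets `tts`, selecting `my-custom` still yields the top-level `tts` value, while profile-only keys such as `broll` come through from the profile.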

Profiles bundle model selections for common use cases:

Budget-friendly with acceptable quality. Uses faster/cheaper model variants.

| Operation | Model |
| --- | --- |
| tts | fal-ai/kokoro/american-english |
| lipsync | fal-ai/lipsync |
| keyframe_grid | skip |

All other operations use defaults.

Full local pipeline with good quality. Requires 8+ GB VRAM.

| Operation | Model |
| --- | --- |
| text | qwen3:8b (Ollama) |
| image | sdxl-turbo |
| video | wan:1.3b |
| broll | wan:1.3b |
| tts | kokoro |
| avatar | wav2lip |
| lipsync | wav2lip |
| music | musicgen-small |
| keyframe_grid | sdxl-turbo |
| broll_i2v | skip |

Setting any operation to "skip" disables it entirely. The pipeline gracefully handles skipped operations — for example, skipping broll means no b-roll videos are generated, and the final composition uses only talking-head segments.
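In workflow code, honouring "skip" usually comes down to a guard before each stage. The following sketch shows one way to filter a resolved plan; `active_operations` is a hypothetical helper and the plan dictionary is illustrative.

```python
from typing import Dict, List

SKIP = "skip"  # sentinel value used in profiles and config


def active_operations(resolved: Dict[str, str]) -> List[str]:
    """Return the operations that should run, dropping any resolved to 'skip'."""
    return [op for op, model in resolved.items() if model != SKIP]
```

For a plan such as `{"tts": "kokoro", "broll": "skip"}`, only `tts` remains, which mirrors the talking-head-only composition described above.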

Every operation maps to a FABRIC_<OPERATION>_MODEL environment variable:

export FABRIC_TEXT_MODEL=gemini-2.5-flash
export FABRIC_IMAGE_MODEL=sdxl-turbo
export FABRIC_TTS_MODEL=kokoro
export FABRIC_BROLL_MODEL=wan:1.3b
export FABRIC_KEYFRAME_GRID_MODEL=imagen-4.0-fast-generate-001
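The variable name follows mechanically from the operation name, as the exports above suggest. A minimal sketch of that mapping, with hypothetical helper names:

```python
import os
from typing import Mapping, Optional


def env_var_for(operation: str) -> str:
    """Build the FABRIC_<OPERATION>_MODEL variable name for an operation."""
    return f"FABRIC_{operation.upper()}_MODEL"


def env_pin(operation: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the pinned model for an operation, or None if unset or empty."""
    environ = os.environ if environ is None else environ
    return environ.get(env_var_for(operation)) or None
```

Treating an empty value as unset is a choice made for this sketch, not documented behaviour.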

The implicit prefer-local probe is the right default for most operations — it keeps spend close to zero whenever a usable local backend is installed. But when one specific operation has a much faster remote alternative (image generation on MPS being the canonical example: ~7 s/thumbnail with local SDXL vs ~1–2 s with Gemini), pinning just that operation gets you remote speed without giving up local defaults for everything else.

# Pin image-heavy ops to Gemini; text / tts / vision still follow local-first.
export FABRIC_THUMBNAIL_MODEL=gemini-3.1-flash-image-preview
export FABRIC_IMAGE_MODEL=gemini-3.1-flash-image-preview

The FABRIC_<OP>_MODEL pin sits at priority 2 in the resolution chain, so it beats both the prefer-local probe and any quality profile. Per-run --input image_model=... still wins over the env pin, so callers can override on a single run if needed.

If you’re staring at a slow run and want to know which model would actually be picked for which workflow, run:

just model-status

That generates a self-contained HTML page (target/model-status.html) and opens it in your browser. The page statically scans every workflow + SDK stage for get_model(input, "X") calls and renders three tables:

  • Operation resolution — built-in default per op, the effective model after running the SDK’s full resolution chain (FABRIC_<OP>_MODEL pin → local-availability probe → built-in default), the local candidate the probe sees, and a local/remote/skip badge.
  • Workflows → operations used — which ops each workflow transitively reaches and the resolved model per op. Filterable by name or operation, so you can quickly answer “is image/generate going to hit Gemini or local SDXL right now?”
  • Active env pins — every FABRIC_*_MODEL and the resolved FABRIC_PREFER_LOCAL state, with a count of probe-reachable local backends.
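A static scan for get_model calls can be approximated with a regular expression over source text. This is a rough sketch of the idea, not the scanner the recipe uses: `operations_in_source` is a hypothetical helper, and it only recognises calls whose operation is a string literal.

```python
import re
from typing import List

# Matches get_model(<name>, "op") or get_model(<name>, 'op') and captures the op.
GET_MODEL_CALL = re.compile(r"""get_model\(\s*\w+\s*,\s*["']([A-Za-z0-9_]+)["']""")


def operations_in_source(source: str) -> List[str]:
    """Return the sorted, de-duplicated operations referenced in a source string."""
    return sorted(set(GET_MODEL_CALL.findall(source)))
```

Calls that pass the operation through a variable are invisible to this kind of scan, which is one reason a real scanner might prefer parsing the syntax tree.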

The page reflects the environment of whoever ran just model-status, so source the same .env your server uses (the recipe does this for you when one exists at the repo root).

To change models rather than just inspect them:

just model-status-tui

A Textual app that shows the same operations table inline and lets you edit each row in place. Press e (or Enter) on the highlighted op to open an edit modal — a quick-pick UI with autocomplete:

  • Curated suggestions per op — populated from the op’s built-in default, the local-probe candidate, every named profile that defines this op (cheap, local, premium, …), the SDK’s fallback chain, and sibling-modality defaults (e.g. image_fast is suggested for thumbnail). Each row shows the model id, its local/remote/skip class, and the source label so you can see why a given model is being suggested.

  • Type-to-filter — the search box at the top of the modal does substring matching across the model id, source label, and class. Typing local shows only the local-class suggestions; typing gemini narrows to Gemini variants. Enter on the search field commits the top match; arrows move into the list for keyboard browsing.

  • Custom… fallback — when the right model isn’t in the list, pick “Custom…” to drop into a free-form input.

  • Two save buttons, no radio — the write target is an explicit choice between two clearly-labeled buttons:

    • Save → .env (pin) — writes FABRIC_<OP>_MODEL=<value> to .env. Server-local, reversible, and updated in place if the line already exists. Picked up by the next server restart.
    • Save → models.json (default) — mutates sdks/fabric-workflow-sdk/fabric_workflow_sdk/models.json so the change ships to every SDK consumer after rebuild. Treat with more care.

    Pressing Enter on a suggestion (or in the search field) accepts the model choice and moves focus to the .env button — the safer / reversible target. Press Enter again to commit, or Shift+Tab once + Enter to commit to models.json instead. There’s no way to write to a target you didn’t explicitly pick.

c clears an active pin without opening the modal; r reloads from disk if you’ve edited .env or models.json outside the TUI; q quits. Both write paths are atomic (temp-file + rename for JSON, in-place rewrite for .env).
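The two write paths can be sketched with the standard library. Both helpers are hypothetical and only illustrate the techniques named above: a temp-file plus rename for JSON, and an update-or-append of a single line for .env content (shown here as a pure function over the file's text).

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Union


def write_json_atomic(path: Union[str, Path], data: Any) -> None:
    """Write JSON to a temp file in the same directory, then rename over the target."""
    path = Path(path)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, indent=2)
            fh.write("\n")
        os.replace(tmp, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def upsert_env_pin(env_text: str, operation: str, model: str) -> str:
    """Return .env text with FABRIC_<OP>_MODEL updated in place or appended."""
    key = f"FABRIC_{operation.upper()}_MODEL"
    new_line = f"{key}={model}"
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):
            lines[i] = new_line  # replace the existing pin, keeping its position
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Creating the temp file next to the target matters: a rename across filesystems is not atomic.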

Press p to open the presets screen — a profile picker with a live diff preview. Highlighting a profile shows you exactly which operations would change and which would already match, so you can preview the impact before committing. Each profile entry also displays a one-line local / remote / skip count so the differences between, say, local and fast-local are visible at a glance.

─ apply 'local' ─ 10/17 would change ─
avatar fal-ai/kling-video/ai-avatar/v2/standard → hallo2
broll fal-ai/veo3.1/fast → wan:1.3b
image gemini-3.1-flash-image-preview → sdxl-turbo
thumbnail gemini-3.1-flash-image-preview → sdxl-turbo

The first entry on the list is Clear all FABRIC_*_MODEL pins — useful when you’ve experimented your way into a thicket of overrides and want to fall back to defaults + the local probe. The preview shows you what each currently-pinned op would revert to.

The same explicit choice between the .env pin and the models.json default applies. Apply writes everything in one atomic batch (one append per pin for .env, one JSON rewrite for defaults) and the main table reloads to reflect the new state. Profiles are additive: they only touch the ops they explicitly define, so applying cheap doesn't reset, e.g., your text pin if cheap doesn't mention text.
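The diff preview and the additive apply can be expressed in a few lines. This is a sketch with hypothetical helper names, operating on plain dictionaries that map operation to model.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[Optional[str], str]  # (current model or None, new model)


def preview_profile(
    current: Dict[str, str],
    profile: Dict[str, str],
) -> Tuple[Dict[str, Change], List[str]]:
    """Split a profile's ops into those that would change and those that already match."""
    changes: Dict[str, Change] = {}
    unchanged: List[str] = []
    for op, new_model in profile.items():
        old_model = current.get(op)
        if old_model == new_model:
            unchanged.append(op)
        else:
            changes[op] = (old_model, new_model)
    return changes, unchanged


def apply_profile(current: Dict[str, str], profile: Dict[str, str]) -> Dict[str, str]:
    """Apply a profile additively: ops the profile does not define are left alone."""
    return {**current, **profile}
```

Because `apply_profile` only overlays the profile's own keys, a `text` pin survives applying a profile that never mentions `text`.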

When you only want this for one workflow run rather than server-wide, prefer the per-run override:

fabric run image/generate \
  --input topic="..." \
  --input thumbnail_model=gemini-3.1-flash-image-preview

Override any model for a single run:

fabric run video/ai-shorts \
  --input topic="AI productivity" \
  --input quality=local \
  --input tts_model=elevenlabs/eleven_turbo_v2_5 \
  --input broll_model="fal-ai/veo3.1/fast"

Per-run overrides have the highest priority: they beat environment variables, config files, and quality profiles.

from fabric_workflow_sdk import get_model

# Resolves through the full priority chain
model = get_model(input, "text")           # "gemini-2.5-flash"
model = get_model(input, "broll")          # depends on quality profile
model = get_model(input, "keyframe_grid")  # "skip" or an image model

# With fallback for unknown operations
model = get_model(input, "my_custom_op", fallback="gemini-2.5-flash")