Model Configuration
Fabric uses a centralized model configuration system that determines which AI model handles each operation. This applies across all workflows, not just video generation.
Resolution Order
Models are resolved through a 7-level priority chain. The first match wins:

1. Per-run input override: `--input image_model="gemini-2.5-flash-image"`
2. Environment variable: `FABRIC_IMAGE_MODEL=gemini-2.5-flash-image`
3. Project config: `./models.yaml` (found by walking up from the current working directory)
4. Global config: `~/.fabric/models.yaml`
5. Explicit `quality=` profile: `--input quality=cheap` (set by the caller)
6. Implicit `prefer-local` probe: when no quality is set, Fabric probes for installed local backends and returns a local model if one exists for the operation. Disable it with `FABRIC_PREFER_LOCAL=0`. See Local-first by default.
7. Built-in defaults
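The chain above can be sketched as a first-match lookup. This is a simplified illustration under stated assumptions, not the SDK's `get_model`: `resolve_model`, the dict-shaped inputs, the `probe_local` callable, and the three-entry `BUILTIN_DEFAULTS` are hypothetical stand-ins. It also assumes that an explicit `quality` whose profile does not define the operation falls straight through to the built-in default without probing.

```python
import os
from typing import Callable, Mapping, Optional

# Hypothetical sketch; subset of the built-in defaults from the Operations table.
BUILTIN_DEFAULTS = {
    "text": "gemini-2.5-flash",
    "image": "gemini-3.1-flash-image-preview",
    "tts": "elevenlabs/eleven_multilingual_v2",
}


def resolve_model(
    run_input: Mapping[str, str],
    operation: str,
    project_cfg: Mapping[str, str],
    global_cfg: Mapping[str, str],
    profiles: Mapping[str, Mapping[str, str]],
    probe_local: Callable[[str], Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical resolver: walk the 7-level chain; the first level with a model wins."""
    # 1. Per-run input override, e.g. --input image_model=...
    per_run = run_input.get(f"{operation}_model")
    if per_run is not None:
        return per_run
    # 2. Environment pin, e.g. FABRIC_IMAGE_MODEL
    pinned = env.get(f"FABRIC_{operation.upper()}_MODEL")
    if pinned is not None:
        return pinned
    # 3-4. Project config, then global config (top-level operation keys only)
    for cfg in (project_cfg, global_cfg):
        if operation in cfg:
            return cfg[operation]
    quality = run_input.get("quality")
    if quality is not None:
        # 5. Explicit quality profile, e.g. --input quality=cheap
        model = profiles.get(quality, {}).get(operation)
        if model is not None:
            return model
    elif env.get("FABRIC_PREFER_LOCAL", "1") != "0":
        # 6. Implicit prefer-local probe, only when no quality was requested
        local = probe_local(operation)
        if local is not None:
            return local
    # 7. Built-in default
    return BUILTIN_DEFAULTS[operation]


if __name__ == "__main__":
    cheap = {"cheap": {"tts": "fal-ai/kokoro/american-english"}}
    # The env pin (level 2) beats the explicit profile (level 5): prints "kokoro".
    print(resolve_model({"quality": "cheap"}, "tts", {}, {}, cheap,
                        lambda op: None, env={"FABRIC_TTS_MODEL": "kokoro"}))
```

Passing `env` explicitly keeps the function easy to test; by default it reads `os.environ`.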
Local-first by default
Fabric’s product policy is “always prefer local when possible”: the fewer remote AI calls per run, the lower the spend. Three layers cooperate to make this work without callers having to opt in:
| Layer | What | Where to override |
|---|---|---|
| SDK resolver | When no quality is set, `get_model` consults `_local_availability.probe_local_backends()` and returns a local model when one exists for the operation. | `FABRIC_PREFER_LOCAL=0` to opt out (defaults to on) |
| Router strategy | The `fabric serve` provider router defaults its routing strategy to `LocalFirst`, which gives `Tier::Basic` (local) providers a flat scoring bonus over remote ones. | `FABRIC_ROUTING_STRATEGY=cheapest_qualified` (or `fastest`, `best_quality`, `balanced`) |
| Diffusers provider | A router-visible provider proxies image and video generation to the in-tree Python model server (`fabric_workflow_sdk.model_server`, default port 8199) so `LocalFirst` actually has something to favour for image and video modalities. On by default. | `--diffusers-disabled` or `FABRIC_DIFFUSERS_DISABLED=1` |
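The router's `LocalFirst` behaviour can be pictured as an additive bonus on top of whatever score a provider would otherwise get. The sketch below is a Python toy model, not the Rust implementation: `Provider`, `local_first_pick`, the base scores, and the `LOCAL_BONUS` value are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical flat bonus; the real router's weighting is not documented here.
LOCAL_BONUS = 10.0


@dataclass(frozen=True)
class Provider:
    name: str
    is_local: bool  # stands in for Tier::Basic
    base_score: float  # e.g. a blend of quality, latency, and cost


def local_first_pick(candidates: list[Provider]) -> Provider:
    """Hypothetical: return the highest-scoring provider after adding the flat local bonus."""

    def score(p: Provider) -> float:
        return p.base_score + (LOCAL_BONUS if p.is_local else 0.0)

    return max(candidates, key=score)


if __name__ == "__main__":
    local = Provider("diffusers", is_local=True, base_score=5.0)
    remote = Provider("gemini", is_local=False, base_score=12.0)
    print(local_first_pick([local, remote]).name)  # diffusers (5 + 10 > 12)
```

Because the bonus is flat rather than a hard preference, a remote provider whose base score beats the best local one by more than `LOCAL_BONUS` still wins in this model.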
Runtime fallback
Resolution-time fallback only covers one case: a backend that was not installed before the run started. Real production failures look fine at boot and only break under load: Diffusers runs out of memory on a particular prompt, the model server crashes mid-request, or a provider returns a 5xx.
`generate_image` and `generate_video` are wrapped with a runtime walker. On any exception from the primary attempt, it walks `_FALLBACK_CHAINS[operation]` (ordered local → remote) and retries with each subsequent reachable model until one succeeds. Pinned-model callers (`generate_image(input, prompt, model="X")`) get a single attempt, because production benchmarks that compare specific models depend on there being no silent retry.
When fallback engages, you’ll see lines like these in the logs:

```
fabric.runtime_fallback: image: model sdxl-turbo failed (model server returned 500), trying next in chain
fabric.runtime_fallback: image: fallback engaged sdxl-turbo → flux-schnell after error: model server returned 500
```

If every model in a chain fails, an `AllAttemptsFailed` error surfaces with the full attempt history.
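A minimal sketch of the runtime walker described above. `with_runtime_fallback`, this `AllAttemptsFailed` class, and the chain contents are hypothetical stand-ins for the SDK's internals; the logger name and message wording mirror the log lines shown, and the real walker's reachability check before each retry is omitted.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("fabric.runtime_fallback")

# Hypothetical chain for illustration, ordered local-first then remote.
FALLBACK_CHAINS = {
    "image": ["sdxl-turbo", "flux-schnell", "gemini-3.1-flash-image-preview"],
}


class AllAttemptsFailed(Exception):
    """Hypothetical stand-in: raised when every model fails; keeps the attempt history."""

    def __init__(self, attempts: list[tuple[str, Exception]]):
        self.attempts = attempts
        summary = "; ".join(f"{model}: {err}" for model, err in attempts)
        super().__init__(f"all models failed: {summary}")


def with_runtime_fallback(
    operation: str,
    primary: str,
    generate: Callable[[str], T],
    pinned: bool = False,
) -> T:
    """Hypothetical walker: try `primary`, then each later model in the chain."""
    if pinned:
        # Pinned callers (e.g. benchmarks) get exactly one attempt, no silent retry.
        return generate(primary)
    chain = FALLBACK_CHAINS.get(operation, [])
    # Only models after the primary are tried, preserving local -> remote order.
    later = chain[chain.index(primary) + 1:] if primary in chain else chain
    attempts: list[tuple[str, Exception]] = []
    for model in [primary, *later]:
        try:
            result = generate(model)
        except Exception as err:
            attempts.append((model, err))
            log.warning("%s: model %s failed (%s), trying next in chain",
                        operation, model, err)
            continue
        if attempts:
            log.warning("%s: fallback engaged %s → %s after error: %s",
                        operation, attempts[0][0], model, attempts[-1][1])
        return result
    raise AllAttemptsFailed(attempts)
```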
Where the model lists live
Two embedded JSON files drive the local-first behaviour. Changing either one is a JSON edit plus a rebuild; no code changes are needed.
| File | What | Loaded by |
|---|---|---|
| `sdks/fabric-workflow-sdk/fabric_workflow_sdk/models.json` → `fallback_chains` section | Per-operation chains (`broll`, `video`, `avatar`, `tts`, `lipsync`, `image`, `image_fast`). Each list is ordered local-first, then remote fallbacks. | `_models._FALLBACK_CHAINS` (Python), bundled with the SDK package |
| `crates/fabric-providers/src/diffusers_models.json` | Image and video models the Diffusers router provider advertises. Mirrors the SDK’s `_LOCAL_IMAGE_MAP` / `_LOCAL_VIDEO_MAP` keys. | `default_image_models()` / `default_video_models()` in `fabric_providers::diffusers` (Rust), embedded via `include_str!` |
When you add a new local model that needs both router-side capability advertising and SDK-side fallback, mirror the entry in both files.
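Since the two files are kept in step by hand, a small drift check can help. This sketch assumes that `fallback_chains` maps each operation to a list of model ids (as the table describes), that the Diffusers file has already been reduced to a set of advertised ids, and that you supply the set of local ids (for example, the `_LOCAL_IMAGE_MAP` / `_LOCAL_VIDEO_MAP` keys). `missing_from_diffusers` and `DIFFUSERS_OPS` are hypothetical names.

```python
import json

# Hypothetical: operations assumed to be served by the Diffusers provider (image/video).
DIFFUSERS_OPS = ("image", "image_fast", "video", "broll")


def missing_from_diffusers(models_json: str, advertised: set[str], local_ids: set[str]) -> set[str]:
    """Hypothetical check: local models in SDK image/video chains the router does not advertise."""
    chains = json.loads(models_json)["fallback_chains"]
    in_chains = {model for op in DIFFUSERS_OPS for model in chains.get(op, [])}
    return (in_chains & local_ids) - advertised


if __name__ == "__main__":
    sample = json.dumps({"fallback_chains": {
        "image": ["sdxl-turbo", "flux-schnell", "gemini-3.1-flash-image-preview"],
        "broll": ["wan:1.3b", "fal-ai/veo3.1/fast"],
    }})
    print(missing_from_diffusers(
        sample,
        advertised={"sdxl-turbo", "wan:1.3b"},
        local_ids={"sdxl-turbo", "flux-schnell", "wan:1.3b"},
    ))  # {'flux-schnell'}
```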
Operations
Each pipeline stage maps to a named operation:
| Operation | Default Model | Description |
|---|---|---|
| `text` | `gemini-2.5-flash` | Script generation, LLM calls |
| `image` | `gemini-3.1-flash-image-preview` | Image generation |
| `image_fast` | `imagen-4.0-fast-generate-001` | Fast image generation (thumbnails, stills) |
| `video` | `veo-2.0-generate-001` | Video generation |
| `broll` | `fal-ai/veo3.1/fast` | B-roll video generation |
| `broll_i2v` | `fal-ai/kling-video/v3/pro/image-to-video` | B-roll image-to-video |
| `keyframe_grid` | `imagen-4.0-fast-generate-001` | Keyframe grid generation |
| `tts` | `elevenlabs/eleven_multilingual_v2` | Text-to-speech |
| `avatar` | `fal-ai/kling-video/ai-avatar/v2/standard` | AI avatar / talking head |
| `lipsync` | `veed/lipsync` | Lip synchronization |
| `music` | `fal-ai/stable-audio` | Background music |
| `transcription` | `faster-whisper/large-v3` | Audio transcription |
| `thumbnail` | `gemini-3.1-flash-image-preview` | Thumbnail generation |
Config File
Create `models.yaml` in your project root or at `~/.fabric/models.yaml`:
```yaml
# Override individual operations
text: gemini-2.5-flash
image: gemini-3.1-flash-image-preview
tts: elevenlabs/eleven_multilingual_v2
broll: fal-ai/veo3.1/fast
keyframe_grid: imagen-4.0-fast-generate-001

# Define custom quality profiles
profiles:
  my-custom:
    tts: fal-ai/kokoro/american-english
    broll: wan:1.3b
    keyframe_grid: sdxl-turbo
```

Project-level config overrides global config, and top-level keys override profile keys.
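The two precedence rules can be sketched with plain dicts standing in for parsed YAML (parsing is left out so the example needs no third-party packages). `effective_config` is a hypothetical helper; how the SDK merges profiles defined in both files is not documented here, so the per-name merge below is an assumption.

```python
from typing import Optional


def effective_config(global_cfg: dict, project_cfg: dict, profile: Optional[str] = None) -> dict:
    """Hypothetical: flatten global + project models.yaml data into {operation: model}."""
    # Profiles merge per name, with project entries winning (an assumption).
    profiles: dict = {}
    for cfg in (global_cfg, project_cfg):
        for name, ops in cfg.get("profiles", {}).items():
            profiles.setdefault(name, {}).update(ops)
    # Top-level operation keys: project values overwrite global ones.
    top_level = {
        op: model
        for cfg in (global_cfg, project_cfg)
        for op, model in cfg.items()
        if op != "profiles"
    }
    # Start from the selected profile, then let top-level keys override it.
    merged = dict(profiles.get(profile, {})) if profile else {}
    merged.update(top_level)
    return merged
```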
Quality Profiles
Profiles bundle model selections for common use cases:
Remote Profiles
Budget-friendly, with acceptable quality. Uses faster, cheaper model variants.
| Operation | Model |
|---|---|
| `tts` | `fal-ai/kokoro/american-english` |
| `lipsync` | `fal-ai/lipsync` |
| `keyframe_grid` | `skip` |
All other operations use defaults.
Default models across the board: no overrides, just the built-in defaults.
Local Profiles
Full local pipeline with good quality. Requires 8+ GB of VRAM.
| Operation | Model |
|---|---|
| `text` | `qwen3:8b` (Ollama) |
| `image` | `sdxl-turbo` |
| `video` | `wan:1.3b` |
| `broll` | `wan:1.3b` |
| `tts` | `kokoro` |
| `avatar` | `wav2lip` |
| `lipsync` | `wav2lip` |
| `music` | `musicgen-small` |
| `keyframe_grid` | `sdxl-turbo` |
| `broll_i2v` | `skip` |
Best local quality. Requires more compute.
| Operation | Model |
|---|---|
| `text` | `qwen3:latest` (Ollama) |
| `image` | `flux-schnell` |
| `video` | `wan:1.3b` |
| `broll` | `wan:1.3b` |
| `tts` | `voxtral` |
| `avatar` | `wav2lip` |
| `lipsync` | `wav2lip` |
| `music` | `musicgen-small` |
| `keyframe_grid` | `flux-schnell` |
Minimal resource usage. Skips heavy operations.
| Operation | Model |
|---|---|
| `text` | `gemma3:4b` (Ollama) |
| `image` | `sdxl-turbo` |
| `video` | `skip` |
| `broll` | `skip` |
| `tts` | `piper` |
| `avatar` | `skip` |
| `lipsync` | `skip` |
| `music` | `skip` |
| `keyframe_grid` | `skip` |
Skip Behavior
Setting any operation to `"skip"` disables it entirely. The pipeline handles skipped operations gracefully; for example, skipping `broll` means no b-roll videos are generated, and the final composition uses only talking-head segments.
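In a custom workflow, honouring `skip` means checking the resolved model before doing any work. A hypothetical sketch: `Segment`, `compose_segments`, and the `render_broll` callable are illustrative names; only the `"skip"` sentinel comes from this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    kind: str  # hypothetical: "talking_head" or "broll"
    asset: str


def compose_segments(
    script_beats: list[str],
    talking_head: str,
    broll_model: str,
    render_broll: Callable[[str, str], str],
) -> list[Segment]:
    """Hypothetical: build the segment list, honouring a skipped broll operation."""
    segments = [Segment("talking_head", talking_head)]
    if broll_model == "skip":
        # Skipped operation: no b-roll is generated and no model is called.
        return segments
    for beat in script_beats:
        segments.append(Segment("broll", render_broll(broll_model, beat)))
    return segments
```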
Environment Variables
Every operation maps to a `FABRIC_<OPERATION>_MODEL` environment variable:
```bash
export FABRIC_TEXT_MODEL=gemini-2.5-flash
export FABRIC_IMAGE_MODEL=sdxl-turbo
export FABRIC_TTS_MODEL=kokoro
export FABRIC_BROLL_MODEL=wan:1.3b
export FABRIC_KEYFRAME_GRID_MODEL=imagen-4.0-fast-generate-001
```

Pinning hot paths to remote
The implicit prefer-local probe is the right default for most operations: it keeps spend close to zero whenever a usable local backend is installed. But when one specific operation has a much faster remote alternative (image generation on MPS is the canonical example, at about 7 s per thumbnail with local SDXL versus 1–2 s with Gemini), pinning just that operation gets you remote speed without giving up local defaults for everything else.
```bash
# Pin image-heavy ops to Gemini; text / tts / vision still follow local-first.
export FABRIC_THUMBNAIL_MODEL=gemini-3.1-flash-image-preview
export FABRIC_IMAGE_MODEL=gemini-3.1-flash-image-preview
```

The `FABRIC_<OP>_MODEL` pin sits at priority 2 in the resolution chain, so it beats `models.yaml` entries, any quality profile, and the prefer-local probe. A per-run `--input image_model=...` still wins over the env pin, so callers can override it on a single run if needed.
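The variable name is derived mechanically from the operation name. A sketch for listing which pins are active in a given environment; `env_var_for` and `active_pins` are hypothetical helpers, and `OPERATIONS` is copied from the Operations table, so it may be incomplete relative to what the SDK knows about.

```python
import os
from typing import Mapping

# Hypothetical list, copied from the Operations table on this page.
OPERATIONS = [
    "text", "image", "image_fast", "video", "broll", "broll_i2v", "keyframe_grid",
    "tts", "avatar", "lipsync", "music", "transcription", "thumbnail",
]


def env_var_for(operation: str) -> str:
    """Hypothetical: map an operation name to its FABRIC_<OPERATION>_MODEL variable."""
    return f"FABRIC_{operation.upper()}_MODEL"


def active_pins(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical: return {operation: pinned model} for every op with an env pin."""
    return {op: env[env_var_for(op)] for op in OPERATIONS if env_var_for(op) in env}
```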
Inspecting what’s active
If you’re staring at a slow run and want to know which model would actually be picked for which workflow, run:
```bash
just model-status
```

That generates a self-contained HTML page (`target/model-status.html`) and opens it in your browser. The page statically scans every workflow and SDK stage for `get_model(input, "X")` calls and renders three tables:

- Operation resolution: the built-in default per op, the effective model after running the SDK’s full resolution chain (`FABRIC_<OP>_MODEL` pin → local-availability probe → built-in default), the local candidate the probe sees, and a `local` / `remote` / `skip` badge.
- Workflows → operations used: which ops each workflow transitively reaches, and the resolved model per op. Filterable by name or operation, so you can quickly answer “is `image/generate` going to hit Gemini or local SDXL right now?”
- Active env pins: every `FABRIC_*_MODEL` and the resolved `FABRIC_PREFER_LOCAL` state, with a count of probe-reachable local backends.
The page reflects the environment of whoever ran `just model-status`, so source the same `.env` your server uses (the recipe does this for you when one exists at the repo root).
Editing assignments interactively
To change assignments rather than just inspect them, run:

```bash
just model-status-tui
```

This launches a Textual app that shows the same operations table inline and lets you edit each row in place. Press `e` (or Enter) on the highlighted op to open an edit modal, a quick-pick UI with autocomplete:
- Curated suggestions per op: populated from the op’s built-in default, the local-probe candidate, every named profile that defines this op (`cheap`, `local`, `premium`, …), the SDK’s fallback chain, and sibling-modality defaults (e.g. `image_fast` is suggested for `thumbnail`). Each row shows the model id, its `local` / `remote` / `skip` class, and the source label, so you can see why a given model is being suggested.
- Type-to-filter: the search box at the top of the modal does substring matching across the model id, source label, and class. Typing `local` shows only the local-class suggestions; typing `gemini` narrows to Gemini variants. Enter on the search field commits the top match; the arrow keys move into the list for keyboard browsing.
- Custom… fallback: when the right model isn’t in the list, pick “Custom…” to drop into a free-form input.
- Two save buttons, no radio: the write target is an explicit choice between two clearly labeled buttons:
  - Save → .env (pin): writes `FABRIC_<OP>_MODEL=<value>` to `.env`. Server-local, reversible, and updated in place if the line already exists. Picked up on the next server restart.
  - Save → models.json (default): mutates `sdks/fabric-workflow-sdk/fabric_workflow_sdk/models.json` so the change ships to every SDK consumer after a rebuild. Treat with more care.

  Pressing Enter on a suggestion (or in the search field) accepts the model choice and moves focus to the `.env` button, the safer, reversible target. Press Enter again to commit, or Shift+Tab once and then Enter to commit to `models.json` instead. There’s no way to write to a target you didn’t explicitly pick.
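Type-to-filter boils down to a substring match over three fields, and the top match is simply the first surviving entry. A hypothetical sketch: `Suggestion` and `filter_suggestions` are illustrative names, and the case-insensitive comparison is an assumption about the TUI's behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    model_id: str
    model_class: str  # hypothetical field: "local", "remote", or "skip"
    source: str  # hypothetical label, e.g. "built-in default"


def filter_suggestions(query: str, suggestions: list[Suggestion]) -> list[Suggestion]:
    """Hypothetical: keep suggestions whose id, source label, or class contains the query."""
    q = query.lower()
    return [
        s for s in suggestions
        if q in s.model_id.lower() or q in s.source.lower() or q in s.model_class.lower()
    ]
```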
Press `c` to clear an active pin without opening the modal, `r` to reload from disk if you’ve edited `.env` or `models.json` outside the TUI, and `q` to quit. Both write paths are atomic (temp file + rename for JSON, in-place rewrite for `.env`).
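Both write strategies can be sketched with the standard library. For JSON, write to a temp file in the same directory, then `os.replace` it over the target; on POSIX that rename is atomic when both paths share a filesystem. For `.env`, replace the matching `FABRIC_<OP>_MODEL=` line or append one. The helper names are hypothetical, and this `.env` version also goes through the temp-file path, which may differ from how the TUI rewrites the file.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Hypothetical: write to a temp file next to `path`, then rename it over the target."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem
    except BaseException:
        os.unlink(tmp)
        raise


def upsert_env_pin(env_path: Path, operation: str, model: str) -> None:
    """Hypothetical: set FABRIC_<OP>_MODEL=<model>, replacing an existing line or appending."""
    key = f"FABRIC_{operation.upper()}_MODEL"
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={model}"  # update the existing pin in place
            break
    else:
        lines.append(f"{key}={model}")
    atomic_write_text(env_path, "\n".join(lines) + "\n")


def save_json(path: Path, data: dict) -> None:
    """Hypothetical: rewrite a JSON file atomically (temp file + rename)."""
    atomic_write_text(path, json.dumps(data, indent=2) + "\n")
```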
Applying a profile in one shot
Press `p` to open the presets screen: a profile picker with a live diff preview. Highlighting a profile shows exactly which operations would change and which already match, so you can preview the impact before committing. Each profile entry also displays a one-line `local` / `remote` / `skip` count, so the differences between, say, `local` and `fast-local` are visible at a glance.
```
─ apply 'local' ─ 10/17 would change ─
avatar fal-ai/kling-video/ai-avatar/v2/standard → hallo2
broll fal-ai/veo3.1/fast → wan:1.3b
image gemini-3.1-flash-image-preview → sdxl-turbo
thumbnail gemini-3.1-flash-image-preview → sdxl-turbo
…
```

The first entry on the list is “Clear all FABRIC_*_MODEL pins”, which is useful when you’ve experimented your way into a thicket of overrides and want to fall back to defaults plus the local probe. Its preview shows what each currently pinned op would revert to.
The same choice between an env pin (`.env`) and a `models.json` default applies here. Apply writes everything in one atomic batch (one append per pin for `.env`, one JSON rewrite for defaults), and the main table reloads to reflect the new state. Profiles are additive: they only touch the ops they explicitly define, so applying `cheap` doesn’t reset your `text` pin if `cheap` doesn’t mention `text`.
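Additive application and the “N/M would change” preview are plain dictionary operations. A sketch assuming both the current assignments and a profile are flat `{operation: model}` dicts; `preview_profile` and `apply_profile` are hypothetical names, not the TUI's API.

```python
def preview_profile(current: dict[str, str], profile: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Hypothetical: return {op: (old, new)} for ops the profile would actually change."""
    return {
        op: (current.get(op, "<unset>"), new)
        for op, new in profile.items()
        if current.get(op) != new
    }


def apply_profile(current: dict[str, str], profile: dict[str, str]) -> dict[str, str]:
    """Hypothetical additive apply: ops the profile does not define keep their value."""
    return {**current, **profile}
```

Usage: `f"{len(diff)}/{len(current)} would change"` reproduces the header line of the preview under these assumptions.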
When you only want this for one workflow run rather than server-wide, prefer the per-run override:
```bash
fabric run image/generate \
  --input topic="..." \
  --input thumbnail_model=gemini-3.1-flash-image-preview
```

Per-Run Overrides
Override any model for a single run:
```bash
fabric run video/ai-shorts \
  --input topic="AI productivity" \
  --input quality=local \
  --input tts_model=elevenlabs/eleven_turbo_v2_5 \
  --input broll_model="fal-ai/veo3.1/fast"
```

Per-run overrides take the highest priority, overriding environment pins, config files, and quality profiles alike.
Using in Custom Workflows
```python
from fabric_workflow_sdk import get_model

# `input` is the workflow's run input; each call resolves through the full priority chain.
model = get_model(input, "text")  # built-in default: "gemini-2.5-flash"
model = get_model(input, "broll")  # depends on the quality profile
model = get_model(input, "keyframe_grid")  # "skip" or an image model

# With a fallback for unknown operations
model = get_model(input, "my_custom_op", fallback="gemini-2.5-flash")
```