
Local Video & Image Models

Fabric supports fully local video and image generation using open-source models. No API keys, no cloud costs, no data leaving your machine.

When generating video or images locally, Fabric tries backends in order until one succeeds:

  1. mlx-video — Apple Silicon native via MLX framework (Mac only). Fastest on M-series chips.
  2. diffusers — HuggingFace pipelines with CUDA or MPS acceleration (cross-platform).
  3. ComfyUI — If FABRIC_COMFYUI_URL is set, delegates to a ComfyUI server.
  4. Ken Burns fallback — Generates a still image and applies a zoom animation via FFmpeg.
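The selection logic above can be sketched as a simple fallback loop. The `order_backends` and `generate_with_fallback` helpers below are illustrative stand-ins, not Fabric's actual internals:

```python
# Hypothetical sketch of the backend fallback chain described above.
# Helper names are illustrative; Fabric's real internals may differ.
import os

def generate_with_fallback(prompt, backends):
    """Try each backend in order; return the first successful result."""
    errors = {}
    for name, fn in backends:
        try:
            return fn(prompt)
        except Exception as exc:  # a real implementation would narrow this
            errors[name] = exc
    raise RuntimeError(f"All backends failed: {errors}")

def order_backends(mlx, diffusers, comfyui, ken_burns):
    """Build the priority list; ComfyUI joins only if its URL is set."""
    chain = [("mlx-video", mlx), ("diffusers", diffusers)]
    if os.environ.get("FABRIC_COMFYUI_URL"):
        chain.append(("comfyui", comfyui))
    chain.append(("ken-burns", ken_burns))
    return chain
```

Because the Ken Burns fallback only needs FFmpeg and a still image, the chain always terminates with a usable result even on machines with no GPU acceleration.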
Video models

| Model ID | VRAM | FPS | Default Frames | Resolution | Backend |
| --- | --- | --- | --- | --- | --- |
| wan:1.3b | 8 GB | 16 | 33 (~2s) | 480x832 | mlx-video, diffusers |
| wan:14b | 24 GB | 16 | 81 (~5s) | 480x832 | mlx-video, diffusers |
| ltx-video | 8 GB | 24 | 97 (~4s) | 768x512 | mlx-video, diffusers |
| cogvideox:2b | 6 GB | 8 | 49 (~6s) | 480x720 | diffusers only |
| cogvideox:5b | 12 GB | 8 | 49 (~6s) | 480x720 | diffusers only |
Image models

| Model ID | VRAM | Steps | Backend |
| --- | --- | --- | --- |
| sdxl-turbo | 6 GB | 4 | diffusers |
| flux-schnell | 8 GB | 4 | diffusers |
| sd3.5-medium | 8 GB | 28 | diffusers |
Avatar models

| Model ID | VRAM | Type | Built-in Lip-sync |
| --- | --- | --- | --- |
| sadtalker | 8 GB | Avatar | Yes |
| echomimic | 16 GB | Avatar | Yes |
| hallo | 24 GB | Avatar | Yes |
Lip-sync models

| Model ID | VRAM | Type |
| --- | --- | --- |
| wav2lip | 4 GB | Lip-sync |
| latentsync | 8 GB | Lip-sync |
| musetalk | 16 GB | Lip-sync |
```sh
# MLX-video — native Apple Silicon, recommended
pip install "mlx-video @ git+https://github.com/Blaizzy/mlx-video.git"

# Models are downloaded automatically on first use
# Cached at: ~/.cache/mlx-models/
```
```sh
# Core dependencies
pip install diffusers torch transformers accelerate sentencepiece

# Models are downloaded from HuggingFace on first use
```
```sh
# Point to a running ComfyUI server
export FABRIC_COMFYUI_URL=http://localhost:8188
```
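Before relying on the ComfyUI backend, it can help to verify the server is actually reachable. This is a standalone sketch using only the standard library; `comfyui_reachable` is a hypothetical helper (not part of Fabric), and the `/system_stats` endpoint is assumed from ComfyUI's HTTP API:

```python
# Illustrative check that the server named in FABRIC_COMFYUI_URL responds.
# Assumes ComfyUI's GET /system_stats endpoint; not part of Fabric's API.
import os
import urllib.request

def comfyui_reachable(timeout=2.0):
    """Return True if FABRIC_COMFYUI_URL is set and the server answers."""
    url = os.environ.get("FABRIC_COMFYUI_URL")
    if not url:
        return False
    try:
        with urllib.request.urlopen(url.rstrip("/") + "/system_stats",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```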
```python
from fabric_workflow_sdk._local_video import (
    generate_video,
    generate_image,
    generate_talking_head,
    lipsync_video,
    is_available,
)

# Check if any local backend is available
if is_available():
    # Generate video
    video_path = await generate_video(
        input_dict,
        "A cinematic ocean wave crashing on rocks",
        model="wan:1.3b",
        duration=5,
    )

    # Generate image
    image_path = await generate_image(
        input_dict,
        "A sunset over mountains",
        model="sdxl-turbo",
        aspect_ratio="9:16",
    )

    # Generate talking head from portrait + audio
    video_path = await generate_talking_head(
        image_path="portrait.png",
        audio_path="voiceover.mp3",
        model="sadtalker",
    )

    # Lip-sync existing video to new audio
    synced_path = await lipsync_video(
        video_path="talking.mp4",
        audio_path="new_audio.mp3",
        model="wav2lip",
    )
```

Set `quality=local` to use local models for the entire AI Shorts pipeline:

```sh
fabric run global/ai-shorts \
  --input topic="The future of AI" \
  --input quality=local
```
| Profile | Video | Image | TTS | Avatar |
| --- | --- | --- | --- | --- |
| local | wan:1.3b | sdxl-turbo | Kokoro | Wav2Lip |
| local-power | wan:1.3b | flux-schnell | Kokoro | Wav2Lip |
| local-light | skip | sdxl-turbo | Piper | skip |
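As a rough mental model, each quality profile amounts to a lookup from profile name to model choices. The dict below restates the table for illustration; it is not Fabric's actual configuration format:

```python
# Illustrative mapping of quality profiles to model choices.
# None marks a pipeline step the profile skips entirely.
LOCAL_PROFILES = {
    "local":       {"video": "wan:1.3b", "image": "sdxl-turbo",
                    "tts": "kokoro", "avatar": "wav2lip"},
    "local-power": {"video": "wan:1.3b", "image": "flux-schnell",
                    "tts": "kokoro", "avatar": "wav2lip"},
    "local-light": {"video": None, "image": "sdxl-turbo",
                    "tts": "piper", "avatar": None},
}

def models_for(profile):
    """Look up the model set for a quality profile."""
    return LOCAL_PROFILES[profile]
```

`local-light` trades video generation for speed: it renders stills with sdxl-turbo and skips both the video and avatar steps.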

On first use, MLX models are downloaded from HuggingFace and converted to MLX format. This is a one-time operation:

```
Downloading and converting Wan-AI/Wan2.1-T2V-1.3B to MLX format (first time only)...
Converted T5 encoder: ~/.cache/mlx-models/Wan-AI--Wan2.1-T2V-1.3B/t5_encoder.safetensors
Converted VAE: ~/.cache/mlx-models/Wan-AI--Wan2.1-T2V-1.3B/vae.safetensors
Model ready at: ~/.cache/mlx-models/Wan-AI--Wan2.1-T2V-1.3B
```

Converted weights are cached at ~/.cache/mlx-models/ and reused across sessions.

Loaded diffusers pipelines are cached in memory to avoid reloading weights between generations. The cache is automatically cleaned on process exit.
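A minimal sketch of that caching pattern, with `atexit` handling the cleanup on process exit. The `loader` callback is a hypothetical stand-in for the actual pipeline loading; Fabric's real cache is internal:

```python
# Minimal sketch of an in-memory pipeline cache cleared at process exit.
# The loader callback stands in for the real (expensive) pipeline load.
import atexit

_PIPELINE_CACHE = {}

def get_pipeline(model_id, loader):
    """Return a cached pipeline, loading weights only on first request."""
    if model_id not in _PIPELINE_CACHE:
        _PIPELINE_CACHE[model_id] = loader(model_id)
    return _PIPELINE_CACHE[model_id]

@atexit.register
def _clear_cache():
    # Dropping the references lets GPU memory be reclaimed at shutdown.
    _PIPELINE_CACHE.clear()
```

Keying the cache by model ID means switching models within one session pays the load cost once per model, while repeated generations with the same model reuse the resident weights.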

Image generation supports these aspect ratios:

| Aspect Ratio | Resolution | Use Case |
| --- | --- | --- |
| 9:16 | 1080x1920 | Vertical social (TikTok, Reels) |
| 16:9 | 1920x1080 | Horizontal (YouTube) |
| 1:1 / square | 1024x1024 | Square (Instagram) |
| 3:4 | 768x1024 | Portrait |
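A sketch of how an `aspect_ratio` string could map to these resolutions; the lookup below is illustrative, not Fabric's actual implementation:

```python
# Illustrative aspect-ratio lookup matching the table above.
RESOLUTIONS = {
    "9:16": (1080, 1920),    # vertical social (TikTok, Reels)
    "16:9": (1920, 1080),    # horizontal (YouTube)
    "1:1": (1024, 1024),     # square (Instagram)
    "square": (1024, 1024),  # alias for 1:1
    "3:4": (768, 1024),      # portrait
}

def resolution_for(aspect_ratio):
    """Return (width, height) for a supported aspect ratio string."""
    try:
        return RESOLUTIONS[aspect_ratio]
    except KeyError:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio!r}")
```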