# Local Video & Image Models
Fabric supports fully local video and image generation using open-source models. No API keys, no cloud costs, no data leaving your machine.
## Resolution Chain

When generating video or images locally, Fabric tries backends in order until one succeeds:
- mlx-video — Apple Silicon native via MLX framework (Mac only). Fastest on M-series chips.
- diffusers — HuggingFace pipelines with CUDA or MPS acceleration (cross-platform).
- ComfyUI — If `FABRIC_COMFYUI_URL` is set, delegates to a ComfyUI server.
- Ken Burns fallback — Generates a still image and applies a zoom animation via FFmpeg.
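The fallback order above can be sketched as a simple priority check. This is an illustrative sketch, not Fabric's internal API: `pick_backend` and `_has_module` are hypothetical names, and real availability checks would also verify hardware (e.g. Apple Silicon for mlx-video).

```python
import os
import importlib.util

def _has_module(name: str) -> bool:
    """Cheap availability probe: is the package importable?"""
    return importlib.util.find_spec(name) is not None

def pick_backend() -> str:
    """Return the first usable backend, in the priority order above."""
    if _has_module("mlx_video"):                # Apple Silicon native
        return "mlx-video"
    if _has_module("diffusers"):                # CUDA or MPS, cross-platform
        return "diffusers"
    if os.environ.get("FABRIC_COMFYUI_URL"):    # delegate to a ComfyUI server
        return "comfyui"
    return "ken-burns"                          # still image + FFmpeg zoom, always available
```

The Ken Burns branch never fails, which is what makes the chain total: some backend always succeeds.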
## Supported Models

### Video Models

| Model ID | VRAM | FPS | Default Frames | Resolution | Backend |
|---|---|---|---|---|---|
| wan:1.3b | 8 GB | 16 | 33 (~2s) | 480x832 | mlx-video, diffusers |
| wan:14b | 24 GB | 16 | 81 (~5s) | 480x832 | mlx-video, diffusers |
| ltx-video | 8 GB | 24 | 97 (~4s) | 768x512 | mlx-video, diffusers |
| cogvideox:2b | 6 GB | 8 | 49 (~6s) | 480x720 | diffusers only |
| cogvideox:5b | 12 GB | 8 | 49 (~6s) | 480x720 | diffusers only |
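One practical use of the VRAM column is picking the largest video model that fits the available GPU memory. The helper below is a hypothetical illustration (not part of the Fabric SDK) using the requirements from the table above:

```python
# VRAM requirements from the video model table (GB).
VIDEO_MODEL_VRAM_GB = {
    "wan:1.3b": 8,
    "wan:14b": 24,
    "ltx-video": 8,
    "cogvideox:2b": 6,
    "cogvideox:5b": 12,
}

def select_video_model(vram_gb):
    """Return the most demanding model that fits in vram_gb, or None."""
    fits = [(need, model_id)
            for model_id, need in VIDEO_MODEL_VRAM_GB.items()
            if need <= vram_gb]
    return max(fits)[1] if fits else None
```

On a 6 GB card this resolves to cogvideox:2b; below 6 GB no local video model fits and the Ken Burns fallback is the only option.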
### Image Models

| Model ID | VRAM | Steps | Backend |
|---|---|---|---|
| sdxl-turbo | 6 GB | 4 | diffusers |
| flux-schnell | 8 GB | 4 | diffusers |
| sd3.5-medium | 8 GB | 28 | diffusers |
### Avatar Models (Talking Head)

| Model ID | VRAM | Type | Built-in Lip-sync |
|---|---|---|---|
| sadtalker | 8 GB | Avatar | Yes |
| echomimic | 16 GB | Avatar | Yes |
| hallo | 24 GB | Avatar | Yes |
### Lip-sync Models

| Model ID | VRAM | Type |
|---|---|---|
| wav2lip | 4 GB | Lip-sync |
| latentsync | 8 GB | Lip-sync |
| musetalk | 16 GB | Lip-sync |
## Installation

### Mac (Apple Silicon)

```bash
# MLX-video — native Apple Silicon, recommended
pip install "mlx-video @ git+https://github.com/Blaizzy/mlx-video.git"

# Models are downloaded automatically on first use
# Cached at: ~/.cache/mlx-models/
```

### Any Platform (CUDA or MPS)

```bash
# Core dependencies
pip install diffusers torch transformers accelerate sentencepiece

# Models are downloaded from HuggingFace on first use
```

### ComfyUI (Alternative)

```bash
# Point to a running ComfyUI server
export FABRIC_COMFYUI_URL=http://localhost:8188
```

## In Workflows
```python
from fabric_workflow_sdk._local_video import (
    generate_video,
    generate_image,
    generate_talking_head,
    lipsync_video,
    is_available,
)

# Check if any local backend is available
if is_available():
    # Generate video
    video_path = await generate_video(
        input_dict,
        "A cinematic ocean wave crashing on rocks",
        model="wan:1.3b",
        duration=5,
    )

    # Generate image
    image_path = await generate_image(
        input_dict,
        "A sunset over mountains",
        model="sdxl-turbo",
        aspect_ratio="9:16",
    )

    # Generate talking head from portrait + audio
    video_path = await generate_talking_head(
        image_path="portrait.png",
        audio_path="voiceover.mp3",
        model="sadtalker",
    )

    # Lip-sync existing video to new audio
    synced_path = await lipsync_video(
        video_path="talking.mp4",
        audio_path="new_audio.mp3",
        model="wav2lip",
    )
```

## With Quality Profiles
Set `quality=local` to use local models for the entire AI Shorts pipeline:

```bash
fabric run global/ai-shorts \
  --input topic="The future of AI" \
  --input quality=local
```

| Profile | Video | Image | TTS | Avatar |
|---|---|---|---|---|
| local | wan:1.3b | sdxl-turbo | Kokoro | Wav2Lip |
| local-power | wan:1.3b | flux-schnell | Kokoro | Wav2Lip |
| local-light | skip | sdxl-turbo | Piper | skip |
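Conceptually, a quality profile is just a lookup from profile name to concrete model choices. The dict below mirrors the table above; the structure and name `QUALITY_PROFILES` are illustrative, not Fabric's real configuration format (`None` stands in for "skip"):

```python
# Profile table as a lookup; None marks a skipped stage.
QUALITY_PROFILES = {
    "local":       {"video": "wan:1.3b", "image": "sdxl-turbo",   "tts": "Kokoro", "avatar": "Wav2Lip"},
    "local-power": {"video": "wan:1.3b", "image": "flux-schnell", "tts": "Kokoro", "avatar": "Wav2Lip"},
    "local-light": {"video": None,       "image": "sdxl-turbo",   "tts": "Piper",  "avatar": None},
}

def models_for(profile):
    """Resolve a quality profile name to its per-stage model choices."""
    return QUALITY_PROFILES[profile]
```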
## MLX Model Conversion

On first use, MLX models are downloaded from HuggingFace and converted to MLX format. This is a one-time operation:

```
Downloading and converting Wan-AI/Wan2.1-T2V-1.3B to MLX format (first time only)...
Converted T5 encoder: ~/.cache/mlx-models/Wan-AI--Wan2.1-T2V-1.3B/t5_encoder.safetensors
Converted VAE: ~/.cache/mlx-models/Wan-AI--Wan2.1-T2V-1.3B/vae.safetensors
Model ready at: ~/.cache/mlx-models/Wan-AI--Wan2.1-T2V-1.3B
```

Converted weights are cached at `~/.cache/mlx-models/` and reused across sessions.
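The cache layout visible in the log suggests a simple skip-if-present check before converting. This sketch assumes the path convention shown above (repo slashes become `--`); the function names are hypothetical:

```python
from pathlib import Path

def mlx_cache_dir(repo_id: str) -> Path:
    """Cache path convention from the log: 'Wan-AI/X' -> 'Wan-AI--X'."""
    return Path.home() / ".cache" / "mlx-models" / repo_id.replace("/", "--")

def needs_conversion(repo_id: str) -> bool:
    """Conversion can be skipped when converted weights already exist."""
    cached = mlx_cache_dir(repo_id)
    return not ((cached / "t5_encoder.safetensors").exists()
                and (cached / "vae.safetensors").exists())
```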
## Pipeline Caching

Loaded diffusers pipelines are cached in memory to avoid reloading weights between generations. The cache is automatically cleared on process exit.
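The caching behavior described above amounts to a keyed in-memory cache with exit-time cleanup. A minimal sketch, with illustrative names (`get_pipeline` and `_PIPELINE_CACHE` are not Fabric's actual internals):

```python
import atexit

_PIPELINE_CACHE = {}

def get_pipeline(model_id, loader):
    """Load a pipeline once per model ID and reuse it afterwards."""
    if model_id not in _PIPELINE_CACHE:
        _PIPELINE_CACHE[model_id] = loader(model_id)  # expensive weight load
    return _PIPELINE_CACHE[model_id]

@atexit.register
def _clear_cache():
    """Drop cached pipelines (and their memory) when the process exits."""
    _PIPELINE_CACHE.clear()
```

The payoff is that generating several clips with the same model pays the weight-loading cost only once.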
## Aspect Ratio Support

Image generation supports these aspect ratios:
| Aspect Ratio | Resolution | Use Case |
|---|---|---|
| 9:16 | 1080x1920 | Vertical social (TikTok, Reels) |
| 16:9 | 1920x1080 | Horizontal (YouTube) |
| 1:1 / square | 1024x1024 | Square (Instagram) |
| 3:4 | 768x1024 | Portrait |
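The table above is effectively a mapping from aspect-ratio string to output resolution, with `square` as an alias for `1:1`. A sketch of such a resolver (illustrative name, not the SDK's internal function):

```python
# Aspect-ratio table as a lookup; "square" aliases "1:1".
ASPECT_RESOLUTIONS = {
    "9:16": (1080, 1920),
    "16:9": (1920, 1080),
    "1:1": (1024, 1024),
    "square": (1024, 1024),
    "3:4": (768, 1024),
}

def resolve_aspect(aspect_ratio):
    """Return (width, height) for a supported aspect ratio string."""
    try:
        return ASPECT_RESOLUTIONS[aspect_ratio]
    except KeyError:
        raise ValueError(f"Unsupported aspect ratio: {aspect_ratio}")
```

This is the shape of lookup that backs the `aspect_ratio="9:16"` parameter shown in the `generate_image` example earlier.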