# Local Models

Fabric supports local AI model inference alongside remote providers (OpenAI, Anthropic). No API keys are required for local models.
## Ollama (Recommended for LLMs)

Ollama runs models locally. It supports Qwen3, Llama, Mistral, DeepSeek, Gemma, Phi, and many more.
### Install

```shell
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh
```
### Pull Models

```shell
ollama pull qwen3:latest        # Qwen3 (default)
ollama pull llama3.2:latest     # Llama 3.2
ollama pull deepseek-r1:latest  # DeepSeek R1
ollama pull mistral:latest      # Mistral
ollama pull nomic-embed-text    # Embeddings

ollama serve  # Starts on http://localhost:11434
```
### Configure Fabric

```shell
# In .env
OLLAMA_ENABLED=true

# Or point to a remote Ollama instance:
OLLAMA_URL=http://gpu-server:11434
```
### Usage via API

```shell
# List available providers (should show "ollama")
curl http://localhost:3001/v1/providers
```
```shell
# Execute with Ollama
curl -X POST http://localhost:3001/v1/providers/execute \
  -H 'content-type: application/json' \
  -d '{
    "modality": "text",
    "model": "qwen3:latest",
    "input": {"prompt": "Explain quantum computing in one sentence"},
    "params": {"temperature": 0.7}
  }'
```
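The same request can be made from a small Python client. A minimal sketch, assuming the endpoint and payload shape shown in the curl examples; the helper names (`build_payload`, `execute`) are illustrative, not part of Fabric:

```python
import json
import urllib.request

FABRIC_URL = "http://localhost:3001"  # assumed default from this guide


def build_payload(modality, model, input_data, params=None):
    """Build the JSON body expected by /v1/providers/execute."""
    payload = {"modality": modality, "model": model, "input": input_data}
    if params:
        payload["params"] = params
    return payload


def execute(modality, model, input_data, params=None):
    """POST a request to the execute endpoint and return the parsed response."""
    req = urllib.request.Request(
        f"{FABRIC_URL}/v1/providers/execute",
        data=json.dumps(build_payload(modality, model, input_data, params)).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `execute("text", "qwen3:latest", {"prompt": "hi"}, {"temperature": 0.7})` mirrors the curl call above.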
```shell
# Embeddings
curl -X POST http://localhost:3001/v1/providers/execute \
  -H 'content-type: application/json' \
  -d '{
    "modality": "embedding",
    "model": "nomic-embed-text",
    "input": {"text": "Hello world"}
  }'
```
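Embedding vectors returned by the endpoint are typically compared with cosine similarity, e.g. for semantic search over documents. A dependency-free sketch (the exact field name holding the vector in Fabric's response is not specified here, so check your deployment's response shape):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Returns 1.0 for identical directions, 0.0 for orthogonal vectors,
    and -1.0 for opposite directions.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```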
## Whisper (Audio Transcription)

For local audio transcription using whisper.cpp or faster-whisper.
### whisper.cpp Server

```shell
# Build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make

# Download model
bash ./models/download-ggml-model.sh large-v3

# Start server
./server -m models/ggml-large-v3.bin --port 8080
```
### Configure Fabric

```shell
# In .env
WHISPER_URL=http://localhost:8080
```

```shell
curl -X POST http://localhost:3001/v1/providers/execute \
  -H 'content-type: application/json' \
  -d '{
    "modality": "audio",
    "input": {"audio_url": "https://example.com/speech.wav"},
    "params": {"language": "en"}
  }'
```
## OpenAI-Compatible Servers

Any server that implements the OpenAI chat completions API works with the OpenAI provider:
- vLLM: `OPENAI_API_KEY=dummy OPENAI_BASE_URL=http://localhost:8000/v1`
- llama.cpp server: `OPENAI_API_KEY=dummy OPENAI_BASE_URL=http://localhost:8080/v1`
- LocalAI: `OPENAI_API_KEY=dummy OPENAI_BASE_URL=http://localhost:8080/v1`
Set `OPENAI_API_KEY` to any non-empty string (the local server ignores it).
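Because these servers speak the standard chat completions protocol, any HTTP client works against them directly. A minimal sketch posting to a local vLLM-style endpoint; the base URL is one of the examples above, and the model name is a placeholder for whatever you serve:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # e.g. vLLM from the list above


def build_chat_body(model, prompt):
    """Build a standard OpenAI-style chat completions request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(model, prompt):
    """POST the request and return the first choice's message content."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_body(model, prompt)).encode(),
        headers={
            "content-type": "application/json",
            # Local servers ignore the key, but clients that require the
            # header can send any non-empty string.
            "authorization": "Bearer dummy",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```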
## Provider Priority

When multiple providers support the same modality, Fabric routes to the first match. Registration order:
1. OpenAI (if `OPENAI_API_KEY` is set)
2. Anthropic (if `ANTHROPIC_API_KEY` is set)
3. Ollama (if `OLLAMA_ENABLED` or `OLLAMA_URL` is set)
4. Whisper (if `WHISPER_URL` is set)
5. Echo providers (always registered; fallback for testing)
To force a specific provider, include `"model": "qwen3:latest"` in the request; the router will match the provider that advertises that model.
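The first-match rule above can be illustrated with a toy router. This is a hypothetical simplification, not Fabric's actual routing code; the registry entries mirror the registration order, and the model sets are invented for the example:

```python
def route(providers, modality, model=None):
    """Return the name of the first provider that supports the modality
    (and, if a model is given, advertises that model)."""
    for p in providers:
        if modality not in p["modalities"]:
            continue
        if model is not None and model not in p["models"]:
            continue
        return p["name"]
    return None


# Registration order mirrors the list above (hypothetical model sets).
registry = [
    {"name": "openai", "modalities": {"text", "embedding"},
     "models": {"gpt-4o"}},
    {"name": "ollama", "modalities": {"text", "embedding"},
     "models": {"qwen3:latest", "nomic-embed-text"}},
    {"name": "echo", "modalities": {"text"}, "models": set()},
]
```

With no model specified, `route(registry, "text")` picks OpenAI (first match); requesting `"qwen3:latest"` forces Ollama because only it advertises that model.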
Local models have zero API cost. The `/v1/providers/estimate` endpoint returns `$0.00` for Ollama and Whisper.