Local Models

Fabric supports local AI model inference alongside remote providers (OpenAI, Anthropic). No API keys required for local models.

Ollama runs models locally and supports Qwen3, Llama, Mistral, DeepSeek, Gemma, Phi, and many more.

Install Ollama:
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Pull models:
ollama pull qwen3:latest # Qwen3 (default)
ollama pull llama3.2:latest # Llama 3.2
ollama pull deepseek-r1:latest # DeepSeek R1
ollama pull mistral:latest # Mistral
ollama pull nomic-embed-text # Embeddings
Start the server:
ollama serve # Starts on http://localhost:11434
Configure Fabric:
# In .env
OLLAMA_ENABLED=true
# Or point to a remote Ollama instance:
OLLAMA_URL=http://gpu-server:11434
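As a minimal sketch of how these two variables might combine (an assumption about Fabric's internals, consistent with the registration rule below: Ollama is active when either `OLLAMA_ENABLED` or `OLLAMA_URL` is set, with the local default URL as fallback):

```python
DEFAULT_OLLAMA_URL = "http://localhost:11434"

def ollama_settings(env: dict[str, str]) -> tuple[bool, str]:
    """Illustrative, not Fabric's exact logic: derive (enabled, base_url)
    from the OLLAMA_* environment variables."""
    url = env.get("OLLAMA_URL", DEFAULT_OLLAMA_URL)
    enabled = env.get("OLLAMA_ENABLED", "").lower() == "true" or "OLLAMA_URL" in env
    return enabled, url

print(ollama_settings({"OLLAMA_ENABLED": "true"}))                 # (True, 'http://localhost:11434')
print(ollama_settings({"OLLAMA_URL": "http://gpu-server:11434"}))  # (True, 'http://gpu-server:11434')
```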
Test it:
# List available providers (should show "ollama")
curl http://localhost:3001/v1/providers
# Execute with Ollama
curl -X POST http://localhost:3001/v1/providers/execute \
-H 'content-type: application/json' \
-d '{
"modality": "text",
"model": "qwen3:latest",
"input": {"prompt": "Explain quantum computing in one sentence"},
"params": {"temperature": 0.7}
}'
# Embeddings
curl -X POST http://localhost:3001/v1/providers/execute \
-H 'content-type: application/json' \
-d '{
"modality": "embedding",
"model": "nomic-embed-text",
"input": {"text": "Hello world"}
}'

Fabric supports local audio transcription via whisper.cpp or faster-whisper.

Set up whisper.cpp:
# Build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
# Download model
bash ./models/download-ggml-model.sh large-v3
# Start server
./server -m models/ggml-large-v3.bin --port 8080
Configure Fabric:
# In .env
WHISPER_URL=http://localhost:8080
Transcribe audio:
curl -X POST http://localhost:3001/v1/providers/execute \
-H 'content-type: application/json' \
-d '{
"modality": "audio",
"input": {"audio_url": "https://example.com/speech.wav"},
"params": {"language": "en"}
}'

Any server that implements the OpenAI chat completions API works with the OpenAI provider:

  • vLLM: OPENAI_API_KEY=dummy OPENAI_BASE_URL=http://localhost:8000/v1
  • llama.cpp server: OPENAI_API_KEY=dummy OPENAI_BASE_URL=http://localhost:8080/v1
  • LocalAI: OPENAI_API_KEY=dummy OPENAI_BASE_URL=http://localhost:8080/v1

Set OPENAI_API_KEY to any non-empty string (the local server ignores it).
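The shared wire format is the OpenAI chat completions endpoint: `POST {base_url}/chat/completions` with a bearer token and a `messages` array. A stdlib-only Python sketch that builds such a request (the model name and port are placeholders for whatever your local server loads):

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completions request for a local server."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            # Local servers ignore the key, but the header must be present.
            "Authorization": "Bearer dummy",
        },
        method="POST",
    )

# Example: a llama.cpp server on port 8080; send with urllib.request.urlopen(req)
req = chat_request("http://localhost:8080/v1", "qwen3", "Say hello")
```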

When multiple providers support the same modality, Fabric routes to the first match. Registration order:

  1. OpenAI (if OPENAI_API_KEY set)
  2. Anthropic (if ANTHROPIC_API_KEY set)
  3. Ollama (if OLLAMA_ENABLED or OLLAMA_URL set)
  4. Whisper (if WHISPER_URL set)
  5. Echo providers (always, fallback for testing)

To force a specific provider, include "model": "qwen3:latest" in the request — the router will match the provider that advertises that model.

Local models have zero API cost. The /v1/providers/estimate endpoint returns $0.00 for Ollama and Whisper.