
Music Workflows

Music workflows generate AI instrumentals and package them into release bundles ready for manual upload to DistroKid (and onward to Spotify, Apple Music, etc.). Phase 1 covers passive instrumental niches — sleep, study, lo-fi, ambient, white noise, focus, meditation — using fabric’s existing stages/music.py (Stable Audio Open via FAL, or local fallback). Vocal song generation (ACE-Step 1.5) is planned for Phase 3 and not yet wired.

The release pipeline only accepts music models whose licenses permit monetized commercial use:

| Model | License | Status |
| --- | --- | --- |
| fal-ai/stable-audio | Stable Audio Open via FAL | ✅ Default |
| stable-audio-open | Stability Community License (free for entities under $1M ARR) | ✅ Local fallback |
| musicgen-small/medium/large | CC-BY-NC 4.0 | ❌ Blocked when commercial_only=True |
| audioldm2, mustango, jasco | CC-BY-NC / research-only | ❌ Blocked |

The music/release workflow forces commercial_only=True on the underlying music stage, so non-commercial models cannot be selected even if requested. If you need a non-commercial model for non-monetized work, use stages/music.py directly.
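The gate described above can be pictured as a small allowlist check. This is an illustrative sketch, not fabric's actual API: the function name, set contents, and error message are assumptions based on the table.

```python
# Hypothetical sketch of the commercial-clearance gate. Model names come from
# the table above; the function and structure are illustrative, not fabric's API.
COMMERCIAL_CLEARED = {"fal-ai/stable-audio", "stable-audio-open"}
NON_COMMERCIAL = {
    "musicgen-small", "musicgen-medium", "musicgen-large",  # CC-BY-NC 4.0
    "audioldm2", "mustango", "jasco",                       # CC-BY-NC / research-only
}

def resolve_audio_model(requested: str, commercial_only: bool = True) -> str:
    """Reject models whose licenses forbid monetized commercial release."""
    if commercial_only and requested in NON_COMMERCIAL:
        raise ValueError(
            f"{requested} is non-commercial; allowed: {sorted(COMMERCIAL_CLEARED)}"
        )
    return requested
```

Because music/release pins commercial_only=True, the non-commercial branch is unreachable from the release workflow regardless of the requested model.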

| Path | Hardware | Reliability | Cost | Recommended for |
| --- | --- | --- | --- | --- |
| fal-ai/stable-audio (FAL hosted) | Any | ✅ Production | ~$0.04/30s clip | All Mac users; production runs on any host |
| stable-audio-open (local) | CUDA GPU | ✅ Reliable | GPU electricity | Linux/Windows with NVIDIA GPU |
| stable-audio-open (local) | Mac Apple-Silicon | ⚠️ Unstable | Free | Not recommended; see below |
| ACE-Step 1.5 (Phase 3, deferred) | 6–24GB VRAM tiered | TBD | GPU electricity | Will replace stable-audio-open as the local path once Phase 3 ships |

On Mac Apple-Silicon specifically, stable-audio-open’s CosineDPMSolverMultistepScheduler hits a numerical RecursionError inside torchsde’s BrownianInterval near the end of the diffusion loop. The 100-step pipeline runs through ~99 steps and then crashes — verified locally on M-series hardware. Upstream torchsde + diffusers compatibility issue in fp32/CPU mode; CUDA fp16 works fine.

The realistic path on a Mac dev machine in 2026:

  • Music gen → FAL (production-quality, ~$0.04/30s, no setup hurdles)
  • Cover art → local (SDXL/Flux via fabric’s prefer-local default — works fine on Apple-Silicon)
  • Wait for Phase 3 (ACE-Step 1.5) for a fully-local Mac path. ACE-Step uses a different scheduler that’s verified on Apple-Silicon CPU and runs the full 4-min track in one call (no need for the multi-segment crossfade-concat scaffold).
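The multi-segment crossfade-concat scaffold mentioned above can be sketched as an equal-power crossfade in NumPy. This is illustrative only: the function name, sample rate, and overlap length are assumptions, not fabric's actual parameters.

```python
import numpy as np

def crossfade_concat(
    segments: list[np.ndarray], sr: int = 44100, overlap_s: float = 2.0
) -> np.ndarray:
    """Concatenate mono segments with an equal-power crossfade of overlap_s seconds.

    Equal-power means the fade curves satisfy cos^2 + sin^2 = 1, so perceived
    loudness stays roughly constant through each seam.
    """
    n = int(sr * overlap_s)
    out = segments[0].astype(np.float64)
    for seg in segments[1:]:
        seg = seg.astype(np.float64)
        t = np.linspace(0.0, np.pi / 2, n)
        fade_out = np.cos(t)  # tail of the running mix ramps down
        fade_in = np.sin(t)   # head of the next segment ramps up
        mixed = out[-n:] * fade_out + seg[:n] * fade_in
        out = np.concatenate([out[:-n], mixed, seg[n:]])
    return out
```

Each seam consumes one overlap's worth of samples, so the result is shorter than the naive concatenation by overlap_s seconds per joint; a real scaffold would also have to keep prompt continuity across segments, which is why a single-call model is simpler.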

Running fully local (CUDA only — Mac users see table above)


Override audio_model to stable-audio-open — the only local model on the commercial-clearance allowlist. Cover art is already local under fabric’s prefer-local default.

```shell
# One-off
fab-workflow --from-file examples/run-inputs/music-release-ambient.yaml \
  --input audio_model=stable-audio-open

# Or use the committed local example
fab-workflow --from-file examples/run-inputs/music-release-ambient-local.yaml
```

One-time prerequisites for the local music backend


Two upfront steps are required before Stable Audio Open will run locally. Without both, the music stage silently falls back to fal-ai/stable-audio, burning FAL credits and defeating the point of running local:

  1. Install torchsde. Stable Audio Open’s CosineDPMSolverMultistepScheduler depends on it. Without it, you’ll see a misleading cannot import name 'StableAudioPipeline' error — that’s the upstream import cascade hiding the real missing dep. Install with pip install torchsde, or preferably reinstall the SDK with the local-music extra: pip install -e 'sdks/fabric-workflow-sdk[local-music]'.

  2. Accept the HF gate. Stable Audio Open is a gated model — Stability requires you to accept their Community License before download:

    • Visit stabilityai/stable-audio-open-1.0 and click “Agree and access repository”. Free; usually instant for individuals.
    • Authenticate your local HF cache: huggingface-cli login and paste a read-token from hf.co/settings/tokens. Or set HF_TOKEN=hf_xxx in your shell.
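Because the fallback is silent, it can help to verify both prerequisites before launching a run. The sketch below is an assumption-laden helper, not part of fabric: the function name is invented, and the token-file path is the default huggingface_hub cache location, which may differ on customized setups.

```python
import importlib.util
import os
from pathlib import Path

def local_music_preflight() -> list[str]:
    """Return a list of problems that would trigger the silent FAL fallback."""
    problems = []
    # Prerequisite 1: torchsde must be importable for the scheduler.
    if importlib.util.find_spec("torchsde") is None:
        problems.append(
            "torchsde not installed; run "
            "pip install -e 'sdks/fabric-workflow-sdk[local-music]'"
        )
    # Prerequisite 2: HF credentials for the gated Stable Audio Open repo.
    # huggingface-cli login writes a token here by default (assumption: default cache).
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    if not os.environ.get("HF_TOKEN") and not token_file.exists():
        problems.append(
            "no HF credentials; run huggingface-cli login or set HF_TOKEN"
        )
    return problems
```

An empty return list means both prerequisites look satisfied; note this does not verify that the license gate itself has been accepted on the Hub, only that credentials are present.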

First run downloads ~5GB of Stable Audio Open weights into ~/.cache/huggingface/. Subsequent runs are fast. Cover-art weights (~7GB SDXL or similar) are downloaded on first cover-gen run; SDXL is not gated.

Known limitation on Mac CPU: stable-audio-open’s torchsde-based scheduler hits a numerical RecursionError in fp32/CPU mode at the end of the diffusion loop. CUDA hardware works reliably; on Mac Apple-Silicon, FAL is the more reliable path until ACE-Step (Phase 3) replaces this backend.

Why no truly-ungated commercial-cleared local model exists


The commercial-clearance gate blocks every popular non-Stable-Audio open music model (MusicGen, AudioLDM2, Mustango, JASCO — all CC-BY-NC or research-only). Stable Audio Open is the only OSS music model in 2026 with both a commercial-friendly license and working production weights. The HF gate is the cost of admission. If a fully license-clean and ungated alternative emerges, fabric will add it to the allowlist.

Workflow: music/release

Generates one instrumental track, masters it to -14 LUFS (Spotify normalization target), generates square cover art, and writes a DistroKid-ready bundle to output_dir/{artist}_{title}/.

```shell
fab-workflow music/release \
  --input prompt="deep sleep ambient pad, slow rain underneath, 432Hz, no drums, no vocals" \
  --input title="Drift Past Midnight" \
  --input artist_name="Hollow Field" \
  --input genre="Ambient" \
  --input release_date="2026-05-21" \
  --input 'songwriter_splits=[{"name":"Aneyzberg, Ari","share":100,"role":"composer","pro":"ASCAP"}]'
```

Stage order: prepare_release → generate_track → generate_cover → assemble_bundle → collect_release_output

prepare_release enforces Phase-1 invariants (no vocals, AI disclosure, songwriter splits sum to 100, auto_submit=false). generate_track calls the existing music stage with commercial_only=True. generate_cover produces square album art. assemble_bundle masters to -14 LUFS, re-encodes the cover to a 3000×3000 sRGB JPEG ≤10MB, and writes the metadata CSV + JSON sidecar.

  • vocals must be false in Phase 1 — vocal song generation via ACE-Step 1.5 is Phase 3 (deferred).
  • ai_disclosure must be true — workflow-level invariant, set as a per-run default.
  • distribution.auto_submit must be false — Phase 1 produces bundles for manual upload only.
  • songwriter_splits must include at least one entry, and shares must sum to 100 — required for publishing royalties via your PRO (ASCAP/BMI/SESAC) and a publishing admin (Songtrust, Sentric).
  • Mastering target defaults to -14 LUFS integrated; runs that exceed it fail validation rather than silently get attenuated by Spotify.
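The songwriter-splits invariant above can be expressed as a small validator. This is a sketch of the check prepare_release is described as performing; the function name and error messages are assumptions, not fabric's code.

```python
def validate_songwriter_splits(splits: list[dict]) -> None:
    """Enforce the Phase-1 invariant: at least one entry, shares summing to 100."""
    if not splits:
        raise ValueError("songwriter_splits must include at least one entry")
    total = sum(s.get("share", 0) for s in splits)
    if total != 100:
        raise ValueError(f"songwriter splits must sum to 100, got {total}")
```

Running this before submission catches the most common bundle-rejection cause (splits that sum to 99 or 101 after rounding) while the metadata is still editable.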
```
{output_dir}/{artist-slug}_{title-slug}/
├── audio.wav      # 16-bit / 44.1kHz stereo, mastered to -14 LUFS
├── cover.jpg      # 3000×3000 sRGB JPEG, ≤10MB
├── metadata.csv   # DistroKid bulk-upload row
└── release.json   # Machine-readable sidecar (full metadata + songwriter splits)
```
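The metadata pair in the bundle above can be sketched with the standard library. Everything here is illustrative: the slug rule, column names, and helper names are assumptions, since the actual DistroKid CSV schema is not shown in this document.

```python
import csv
import json
import re
from pathlib import Path

def slugify(s: str) -> str:
    """Lowercase, replace non-alphanumeric runs with '-', trim edges (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")

def write_bundle_metadata(output_dir: str, meta: dict) -> Path:
    """Write metadata.csv and release.json into {artist-slug}_{title-slug}/."""
    bundle = Path(output_dir) / f"{slugify(meta['artist_name'])}_{slugify(meta['title'])}"
    bundle.mkdir(parents=True, exist_ok=True)
    # CSV row for manual bulk upload; nested values are JSON-encoded into one cell.
    with open(bundle / "metadata.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(meta.keys()))
        writer.writeheader()
        writer.writerow(
            {k: json.dumps(v) if isinstance(v, (list, dict)) else v
             for k, v in meta.items()}
        )
    # Machine-readable sidecar with the full, untruncated metadata.
    (bundle / "release.json").write_text(json.dumps(meta, indent=2))
    return bundle
```

Keeping the JSON sidecar authoritative and treating the CSV as a human-readable projection avoids lossy round-trips through spreadsheet cells.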

The human uploader logs into DistroKid, creates a new release, fills the form by reading the CSV, and uploads audio.wav + cover.jpg. Once Spotify ingests (24–48h), claim the artist via Spotify for Artists and pitch the track to editorial playlists at least 7 days before the release date.

  • Spotify counts a play after 30s — tracks shorter than 30s earn $0 regardless of plays.
  • Sub-1,000-plays-per-year tracks earn $0 (2024 monetization threshold). Long-tail catalog of duds is worthless; quality over quantity.
  • Artificial-streaming penalties — distributors fine $10/track and Spotify can claw back payouts. Bot-padded plays trigger this even when the artist is innocent.
  • Metadata is locked at release — changing title/artist/genre after publication breaks Spotify’s algorithmic placement. Validate before submitting.
  • Without playlist placement, even a perfect track earns near-zero. Pitch via Spotify-for-Artists ≥7 days pre-release.

See plan 082 for the full set of legal, anti-fraud, operational, and discoverability considerations.