Media Workflows

Media workflows handle the core video and audio processing primitives — transcription, reframing, subtitles, and format conversion. They are used as building blocks by higher-level pipelines (AI Shorts, Clip Generator, YouTube Studio).

Transcription

Workflow: media/transcription

Downloads video/audio from a URL or local path, then produces word-level transcripts via Faster Whisper.

# From YouTube
fabric run media/transcription --input url="https://youtube.com/watch?v=..."

# From local file
fabric run media/transcription --input media_path="/path/to/video.mp4"

Pipeline

download_media → transcribe_audio

Input

Parameter	Type	Default	Description
`url` / `video_url` / `media_url`	`string`	—	URL to download (HTTP or YouTube)
`media_path` / `video_path`	`string`	—	Local file path
`whisper_model`	`string`	`"large-v3"`	Faster Whisper model size
`language`	`string`	auto-detect	Language code (e.g., `"en"`)

Output

{
  "media_path": "/tmp/downloaded.mp4",
  "full_text": "The complete transcribed text...",
  "transcript": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello and welcome",
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4, "probability": 0.98},
        {"word": "and", "start": 0.5, "end": 0.6, "probability": 0.95},
        {"word": "welcome", "start": 0.7, "end": 1.2, "probability": 0.97}
      ]
    }
  ]
}

Vertical Reframe

Workflow: media/reframe

Converts landscape (16:9) video to vertical (9:16) with intelligent subject tracking using MediaPipe face detection and YOLOv8 person detection.

fabric run media/reframe --input media_path="/path/to/landscape.mp4"

Pipeline

detect_and_track_subjects → crop_to_vertical

Input

Parameter	Type	Default	Description
`media_path`	`string`	required	Path to landscape video

Output

{
  "vertical_path": "/tmp/vertical_9x16.mp4",
  "video_width": 1080,
  "video_height": 1920,
  "fps": 30,
  "total_frames": 900,
  "tracking_data": { "faces": [...], "persons": [...] }
}

Subtitles

Workflow: media/subtitles

Aligns word-level timestamps into subtitle chunks and burns them into the video using FFmpeg ASS rendering.

fabric run media/subtitles \
  --input media_path="/path/to/video.mp4" \
  --input subtitle_max_chars=40

Pipeline

align_words → render_subtitles

Input

Parameter	Type	Default	Description
`transcript`	`list[dict]`	required	Word-level transcript from transcription
`media_path`	`string`	required	Video to burn subtitles into
`subtitle_max_chars`	`int`	`40`	Max characters per subtitle line

Output

{
  "subtitles": [
    {"start": 0.0, "end": 2.5, "text": "Hello and welcome"}
  ],
  "subtitled_path": "/tmp/subtitled.mp4"
}

Transcode

Workflow: media/transcode

FFmpeg-based format and codec conversion with configurable quality.

fabric run media/transcode \
  --input media_path="/path/to/input.mov" \
  --input codec="libx265" \
  --input crf=20

Input

Parameter	Type	Default	Description
`media_path`	`string`	required	Input video path
`codec`	`string`	`"libx264"`	Video codec (`libx264`, `libx265`)
`format`	`string`	`"mp4"`	Output format
`crf`	`int`	`23`	Quality (lower = better, 0-51)
`preset`	`string`	`"medium"`	Speed/quality tradeoff (`ultrafast` to `veryslow`)
`resolution`	`string`	original	Target resolution (e.g., `"1920x1080"`)
`audio_codec`	`string`	`"aac"`	Audio codec
`audio_bitrate`	`string`	`"128k"`	Audio bitrate

Multi-Format Resize

Workflow: media/resize

Produces multiple aspect ratio variants for cross-platform distribution.

fabric run media/resize \
  --input media_path="/path/to/video.mp4" \
  --input formats='["1:1", "16:9", "9:16", "4:5"]' \
  --input strategy="pad"

Input

Parameter	Type	Default	Description
`media_path`	`string`	required	Input video
`formats`	`list[str]`	`["9:16"]`	Target aspect ratios
`strategy`	`string`	`"pad"`	Resize strategy: `pad` (blurred fill), `crop`, or `fit`

Output

{
  "resized_variants": [
    {"format": "1:1", "path": "/tmp/square.mp4", "dimensions": [1080, 1080]},
    {"format": "9:16", "path": "/tmp/vertical.mp4", "dimensions": [1080, 1920]}
  ]
}