Media Workflows

Media workflows provide the core video and audio processing primitives: transcription, reframing, subtitles, format conversion, and resizing. Higher-level pipelines (AI Shorts, Clip Generator, YouTube Studio) use them as building blocks.

Workflow: media/transcription

Downloads video or audio from a URL, or reads it from a local path, then produces a word-level transcript with Faster Whisper.

    # From YouTube
    fabric run media/transcription --input url="https://youtube.com/watch?v=..."
    # From local file
    fabric run media/transcription --input media_path="/path/to/video.mp4"

Steps: download_media → transcribe_audio

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url / video_url / media_url | string | | URL to download (HTTP or YouTube) |
| media_path / video_path | string | | Local file path |
| whisper_model | string | "large-v3" | Faster Whisper model size |
| language | string | auto-detect | Language code (e.g., "en") |

Output:

    {
      "media_path": "/tmp/downloaded.mp4",
      "full_text": "The complete transcribed text...",
      "transcript": [
        {
          "start": 0.0,
          "end": 2.5,
          "text": "Hello and welcome",
          "words": [
            {"word": "Hello", "start": 0.0, "end": 0.4, "probability": 0.98},
            {"word": "and", "start": 0.5, "end": 0.6, "probability": 0.95},
            {"word": "welcome", "start": 0.7, "end": 1.2, "probability": 0.97}
          ]
        }
      ]
    }
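
Downstream code usually wants the words as one flat list rather than nested inside segments. The following is a minimal sketch of how the output above could be consumed, assuming the result has already been loaded into a Python dict with the shape shown. The helper name `flatten_words` and the `min_probability` threshold are illustrative and not part of the workflow.

```python
from typing import Any


def flatten_words(result: dict[str, Any], min_probability: float = 0.0) -> list[dict[str, Any]]:
    """Collect word entries from every segment, in order.

    Hypothetical helper: words whose probability falls below
    min_probability are skipped.
    """
    words = []
    for segment in result.get("transcript", []):
        for word in segment.get("words", []):
            if word.get("probability", 1.0) >= min_probability:
                words.append(word)
    return words


# Sample result copied from the output shape documented above.
sample = {
    "media_path": "/tmp/downloaded.mp4",
    "full_text": "The complete transcribed text...",
    "transcript": [
        {
            "start": 0.0,
            "end": 2.5,
            "text": "Hello and welcome",
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.4, "probability": 0.98},
                {"word": "and", "start": 0.5, "end": 0.6, "probability": 0.95},
                {"word": "welcome", "start": 0.7, "end": 1.2, "probability": 0.97},
            ],
        }
    ],
}

all_words = flatten_words(sample)
confident = flatten_words(sample, min_probability=0.96)
print([w["word"] for w in confident])  # ['Hello', 'welcome']
```

Filtering on `probability` is one way to flag words that may need manual review before they are used for subtitles.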

Workflow: media/reframe

Converts landscape (16:9) video to vertical (9:16), keeping the subject in frame by tracking it with MediaPipe face detection and YOLOv8 person detection.

    fabric run media/reframe --input media_path="/path/to/landscape.mp4"

Steps: detect_and_track_subjects → crop_to_vertical

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| media_path | string | required | Path to landscape video |

Output:

    {
      "vertical_path": "/tmp/vertical_9x16.mp4",
      "video_width": 1080,
      "video_height": 1920,
      "fps": 30,
      "total_frames": 900,
      "tracking_data": { "faces": [...], "persons": [...] }
    }
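
The geometry behind a subject-centered vertical crop can be sketched as follows. This is an illustration only, not the workflow's actual implementation: the function name `vertical_crop_window` and the idea of a single horizontal subject center (`center_x`, in pixels) are assumptions, and the real `crop_to_vertical` step may smooth the position over time or combine several detections.

```python
def vertical_crop_window(
    src_width: int,
    src_height: int,
    center_x: float,
    aspect_w: int = 9,
    aspect_h: int = 16,
) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of a full-height crop centered on center_x."""
    # Widest window with the target aspect ratio that fits the frame height,
    # rounded down to an even number because many encoders require even sizes.
    crop_w = (src_height * aspect_w // aspect_h) // 2 * 2
    crop_w = min(crop_w, src_width)
    # Center the window on the subject, then clamp it inside the frame.
    x = round(center_x - crop_w / 2)
    x = max(0, min(x, src_width - crop_w))
    return x, 0, crop_w, src_height


print(vertical_crop_window(1920, 1080, center_x=960))   # (657, 0, 606, 1080)
print(vertical_crop_window(1920, 1080, center_x=100))   # (0, 0, 606, 1080)
print(vertical_crop_window(1920, 1080, center_x=1900))  # (1314, 0, 606, 1080)
```

For a 1920x1080 source the window is 606x1080, so reaching the 1080x1920 output shown above would presumably require scaling the crop afterwards.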

Workflow: media/subtitles

Groups word-level timestamps into subtitle chunks and burns them into the video using FFmpeg's ASS subtitle rendering.

    fabric run media/subtitles \
      --input media_path="/path/to/video.mp4" \
      --input subtitle_max_chars=40

Steps: align_words → render_subtitles

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| transcript | list[dict] | required | Word-level transcript from transcription |
| media_path | string | required | Video to burn subtitles into |
| subtitle_max_chars | int | 40 | Max characters per subtitle line |

Output:

    {
      "subtitles": [
        {"start": 0.0, "end": 2.5, "text": "Hello and welcome"}
      ],
      "subtitled_path": "/tmp/subtitled.mp4"
    }
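
One plausible way to group words under a character limit is a greedy pass: keep appending words to the current line until the next word would exceed `subtitle_max_chars`, then start a new line. The sketch below illustrates that idea under stated assumptions. The function name `chunk_words` is hypothetical, and the actual `align_words` step may use different rules (for example, the sample output above ends the chunk at the segment end, 2.5, while this sketch uses the last word's end time).

```python
from typing import Any


def chunk_words(words: list[dict[str, Any]], max_chars: int = 40) -> list[dict[str, Any]]:
    """Greedily group word entries into subtitle chunks of at most max_chars."""

    def to_chunk(group: list[dict[str, Any]]) -> dict[str, Any]:
        # A chunk spans from its first word's start to its last word's end.
        return {
            "start": group[0]["start"],
            "end": group[-1]["end"],
            "text": " ".join(w["word"].strip() for w in group),
        }

    chunks: list[dict[str, Any]] = []
    current: list[dict[str, Any]] = []
    for word in words:
        candidate = " ".join(w["word"].strip() for w in current + [word])
        if current and len(candidate) > max_chars:
            # Adding this word would overflow the line: flush and start over.
            chunks.append(to_chunk(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(to_chunk(current))
    return chunks


sample_words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "and", "start": 0.5, "end": 0.6},
    {"word": "welcome", "start": 0.7, "end": 1.2},
]

print(chunk_words(sample_words, max_chars=40))
# [{'start': 0.0, 'end': 1.2, 'text': 'Hello and welcome'}]
print(chunk_words(sample_words, max_chars=9))
# [{'start': 0.0, 'end': 0.6, 'text': 'Hello and'}, {'start': 0.7, 'end': 1.2, 'text': 'welcome'}]
```

A single word longer than the limit still gets its own chunk here rather than being split mid-word.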

Workflow: media/transcode

FFmpeg-based format and codec conversion with configurable quality.

    fabric run media/transcode \
      --input media_path="/path/to/input.mov" \
      --input codec="libx265" \
      --input crf=20

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| media_path | string | required | Input video path |
| codec | string | "libx264" | Video codec (libx264, libx265) |
| format | string | "mp4" | Output format |
| crf | int | 23 | Quality (lower = better, 0-51) |
| preset | string | "medium" | Speed/quality tradeoff (ultrafast to veryslow) |
| resolution | string | original | Target resolution (e.g., "1920x1080") |
| audio_codec | string | "aac" | Audio codec |
| audio_bitrate | string | "128k" | Audio bitrate |
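
To make the parameters concrete, here is a sketch of how they might map onto an FFmpeg command line. The mapping is an assumption, not the workflow's documented behavior: the function name `build_ffmpeg_args` and the `_transcoded` output naming are hypothetical, while the flags themselves (`-c:v`, `-crf`, `-preset`, `-vf scale=`, `-c:a`, `-b:a`) are standard FFmpeg options.

```python
from pathlib import PurePosixPath
from typing import Optional


def build_ffmpeg_args(
    media_path: str,
    codec: str = "libx264",
    fmt: str = "mp4",
    crf: int = 23,
    preset: str = "medium",
    resolution: Optional[str] = None,
    audio_codec: str = "aac",
    audio_bitrate: str = "128k",
) -> list[str]:
    """Build an ffmpeg argument list from the transcode parameters."""
    if not 0 <= crf <= 51:
        raise ValueError(f"crf must be between 0 and 51, got {crf}")
    src = PurePosixPath(media_path)
    # Hypothetical naming scheme: write next to the input with a suffix,
    # so the output never overwrites the source file.
    output_path = str(src.with_name(f"{src.stem}_transcoded.{fmt}"))
    args = ["ffmpeg", "-i", media_path, "-c:v", codec, "-crf", str(crf), "-preset", preset]
    if resolution is not None:
        # "1920x1080" -> scale=1920:1080; omitted to keep the original size.
        width, height = (int(part) for part in resolution.lower().split("x"))
        args += ["-vf", f"scale={width}:{height}"]
    args += ["-c:a", audio_codec, "-b:a", audio_bitrate, output_path]
    return args


args = build_ffmpeg_args("/path/to/input.mov", codec="libx265", crf=20)
print(" ".join(args))
# ffmpeg -i /path/to/input.mov -c:v libx265 -crf 20 -preset medium -c:a aac -b:a 128k /path/to/input_transcoded.mp4
```

The list can be passed to `subprocess.run(args, check=True)` on a machine where `ffmpeg` is installed.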

Workflow: media/resize

Produces multiple aspect ratio variants for cross-platform distribution.

    fabric run media/resize \
      --input media_path="/path/to/video.mp4" \
      --input formats='["1:1", "16:9", "9:16", "4:5"]' \
      --input strategy="pad"

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| media_path | string | required | Input video |
| formats | list[str] | ["9:16"] | Target aspect ratios |
| strategy | string | "pad" | Resize strategy: pad (blurred fill), crop, or fit |

Output:

    {
      "resized_variants": [
        {"format": "1:1", "path": "/tmp/square.mp4", "dimensions": [1080, 1080]},
        {"format": "9:16", "path": "/tmp/vertical.mp4", "dimensions": [1080, 1920]}
      ]
    }
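
The dimensions in the sample output are consistent with fixing the shorter side at 1080 pixels and deriving the longer side from the aspect ratio. The sketch below works through that arithmetic. Both the rule and the function name `target_dimensions` are assumptions inferred from the two sample variants, not documented behavior.

```python
def target_dimensions(aspect: str, short_side: int = 1080) -> tuple[int, int]:
    """Return (width, height) for an aspect ratio string such as "9:16".

    Assumes the shorter side is fixed at short_side pixels.
    """
    ratio_w, ratio_h = (int(part) for part in aspect.split(":"))
    if ratio_w <= ratio_h:
        # Square or portrait: the width is the short side.
        width = short_side
        height = short_side * ratio_h // ratio_w
    else:
        # Landscape: the height is the short side.
        height = short_side
        width = short_side * ratio_w // ratio_h
    return width, height


for fmt in ["1:1", "16:9", "9:16", "4:5"]:
    print(fmt, target_dimensions(fmt))
# 1:1 (1080, 1080)
# 16:9 (1920, 1080)
# 9:16 (1080, 1920)
# 4:5 (1080, 1350)
```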