global/vision-analyze

Vision analysis workflow — image understanding via Moondream (local) or Gemini (cloud).

Category: global
Source: workflows/ai/vision.py

Input fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `image_path` | string | required | Path to the image file to analyze. |
| `regenerate` | object | | When set, this run is a regeneration. Workflows may read `direction` / `keep` / `extra_instructions` to modulate prompts; the engine persists `parent_run_id` and `parent_variant_index` as run-lineage columns. |
| `variants` | integer | `1` | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
| `vision_operations` | string[] | `["caption", "tags"]` | Operations to perform. Options: `caption`, `tags`, `detect:<object>`, `query:<question>`. Example: `["caption", "tags", "detect:person"]` |
| `vision_question` | object | | Shorthand for a single visual QA question. Equivalent to adding `query:<question>` to operations. |
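For example, a filled-in `input:` block exercising the operation kinds listed above might look like this (the image path and the `detect:` / `query:` targets are illustrative, not defaults):

```yaml
input:
  image_path: "photos/scene.jpg"            # illustrative path
  vision_operations:
    - caption                               # scene description
    - tags                                  # searchable tags
    - detect:person                         # bounding boxes (local Moondream only)
    - "query:How many people are visible?"  # visual QA
```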
Output fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `answers` | object[] | | Visual QA answers: `[{question, answer}]`. |
| `caption` | object | | Scene description / caption of the image. |
| `detections` | object[] | | Object detections with bounding boxes (local Moondream only). |
| `kind` | object | | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| `model_used` | string | `""` | Vision model that was used for analysis. |
| `tags` | string[] | | Searchable tags extracted from the image. |
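As an illustrative sketch only, a run with `["caption", "tags", "detect:person"]` plus a visual QA question could produce output shaped like the following. The field names come from the table above; the inner structure of `caption` and `detections` entries, and all values, are assumptions:

```yaml
caption:
  text: "Two people at a whiteboard in an office."
tags: ["office", "whiteboard", "people"]
detections:
  - label: person
    box: [0.12, 0.08, 0.45, 0.97]   # assumed bounding-box encoding
answers:
  - question: "How many people are visible?"
    answer: "Two."
model_used: "moondream"
kind: image
```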
Pipeline: `analyze_image` → `format_vision_output`

| Task | Description |
| --- | --- |
| `analyze_image` | Analyze an image using the configured vision model. |
| `format_vision_output` | Extract vision results into top-level output fields. |

Save the YAML below as `my-run.yaml`, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented as comments above the `input:` block — copy any of those lines under `input:` and uncomment it to set a value.

```yaml
workflow: global/vision-analyze
# Optional fields — copy any line(s) under `input:` and uncomment to set:
# Operations to perform. Options: 'caption', 'tags', 'detect:<object>', 'query:<question>'. Example: ['caption', 'tags', 'detect:person']
# vision_operations: ["caption", "tags"]
#
# Shorthand for a single visual QA question. Equivalent to adding 'query:<question>' to operations.
# vision_question: null
#
input:
  # Path to the image file to analyze.
  image_path: ""
```

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=global/vision-analyze' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal `WorkflowInput` fields — `variants` (1–10 fan-out) and `regenerate` (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (`metadata`, `priority`, `bundle`, `parent`, etc.).
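As a sketch of those universal fields in a run-spec: the `variants` and `regenerate` names come from the input table above, but the value shapes of `direction`, `keep`, and `extra_instructions` are assumptions for illustration:

```yaml
workflow: global/vision-analyze
variants: 3                      # fan out into 3 independent variant runs
regenerate:
  direction: "warmer, more descriptive caption"   # assumed free-text hint
  keep: ["tags"]                                  # assumed list shape
  extra_instructions: "Mention lighting."         # assumed free-text hint
input:
  image_path: "photos/scene.jpg"
```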