research/problem-intelligence

Pipeline — DAG definition for the problem intelligence workflow.

Category: research
Source: workflows/research/problem_intelligence/pipeline.py

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `depth` | object | `10` | Ingestion depth (int or `'quick'`/`'standard'`/`'deep'`/`'exhaustive'`) |
| `forum_urls` | string[] | `[]` | Forum URLs to ingest |
| `niche` | object | `null` | Content niche to scope the search |
| `platforms` | string[] | `[]` | Platforms to query |
| `query` | string | `""` | Search query for problem discovery |
| `reddit_discovery` | boolean | `false` | Auto-discover relevant subreddits |
| `regenerate` | object | (none) | When set, this run is a regeneration. Workflows may read `direction` / `keep` / `extra_instructions` to modulate prompts; the engine persists `parent_run_id` and `parent_variant_index` as run lineage columns. |
| `subreddits` | string[] | `[]` | Specific subreddits to crawl |
| `variants` | integer | `1` | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
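As an illustration, a filled-in run spec combining several of the input fields above might look like this (all values here are hypothetical placeholders, not recommendations):

```yaml
# Hypothetical example values — adjust to your own niche and platforms.
workflow: research/problem-intelligence
input:
  query: "video editing pain points"
  niche: "short-form video creators"
  platforms: [reddit, hackernews, youtube]
  subreddits: [VideoEditing, NewTubers]
  reddit_discovery: true
  depth: deep        # or an integer, e.g. 10
```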
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `ideas` | object[] | `[]` | Generated product/content ideas |
| `kind` | object | (none) | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| `platforms_queried` | string[] | `[]` | Platforms that were queried |
| `query` | string | `""` | Query used for ingestion |
| `ranked_clusters` | object[] | `[]` | Problem clusters ranked by opportunity score |
| `scope` | string | `"global"` | Analysis scope |
| `specs` | object[] | `[]` | Generated specifications |
| `total_sources_collected` | integer | `0` | Total source documents collected |
| `workflow` | string | `"research/problem-intelligence"` | Workflow identifier |
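Putting the output fields above together, a run output is shaped roughly like the following (values are hypothetical and abbreviated; the element shapes of the `object[]` fields are not documented on this page):

```yaml
# Hypothetical, abbreviated output shape.
workflow: research/problem-intelligence
scope: "global"
query: "video editing pain points"
platforms_queried: [reddit, hackernews]
total_sources_collected: 342
ranked_clusters: []   # object[] — problem clusters ranked by opportunity score
ideas: []             # object[] — generated product/content ideas
specs: []             # object[] — generated specifications
kind: text            # variant card shape: video / carousel / image / text
```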
```
plan_ingestion → ingest_reddit → ingest_hackernews → ingest_youtube_comments → ingest_youtube_shorts → ingest_tiktok → ingest_instagram → ingest_twitter → ingest_forums → merge_sources → extract_problems → normalize_problems → cluster_problems → aggregate_solutions → compute_scores → generate_ideas → score_ideas → generate_specs → generate_workflow_defs → format_output
```
| Task | Description |
| --- | --- |
| `plan_ingestion` | Validate input and determine scope + active platforms. |
| `ingest_reddit` | Ingest posts + comments from Reddit (supports three modes). |
| `ingest_hackernews` | Ingest stories + comments from Hacker News via the Algolia API. |
| `ingest_youtube_comments` | Ingest YouTube video comments via yt-dlp or the YouTube Data API. |
| `ingest_youtube_shorts` | Ingest YouTube Shorts + comments via yt-dlp (Shorts-specific search). |
| `ingest_tiktok` | Ingest TikTok videos + comments via yt-dlp, with web search as a fallback. |
| `ingest_instagram` | Ingest Instagram posts via web search + Jina Reader, or the Graph API if available. |
| `ingest_twitter` | Ingest tweets via the Twitter API v2 (if available), with web search as a fallback. |
| `ingest_forums` | Ingest forum content via Jina Reader (clean markdown extraction). |
| `merge_sources` | Join all platform branches into a unified `source_documents` list. |
| `extract_problems` | Extract problems/pain points from source documents using an LLM. |
| `normalize_problems` | Normalize problem mentions and merge near-duplicates. |
| `cluster_problems` | Cluster extracted problem mentions by semantic similarity, then refine with an LLM. |
| `aggregate_solutions` | Aggregate `existing_solutions` from problem mentions into `SolutionMention` entities. |
| `compute_scores` | Score problem clusters by opportunity potential. |
| `generate_ideas` | Generate 1–3 startup/product ideas per top problem cluster. |
| `score_ideas` | Score generated ideas by viability. |
| `generate_specs` | Generate implementation-ready SaaS specifications for top ideas. |
| `generate_workflow_defs` | Optionally convert SaaS specs into Fabric-compatible workflow DAG definitions. |
| `format_output` | Structure the final output, stripping internal keys. |
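The fan-in at `merge_sources` can be sketched in plain Python. This is an illustrative sketch, not the actual implementation (the real task's contract is untyped, as the note at the bottom of this page points out); the document shape is hypothetical:

```python
def merge_sources(*branches):
    """Illustrative join: concatenate every platform branch into one
    source_documents list, skipping branches that produced nothing."""
    docs = []
    for branch in branches:
        docs.extend(branch or [])  # `None` / empty branches contribute nothing
    return docs

# Hypothetical per-platform branch outputs:
reddit = [{"platform": "reddit", "text": "exporting 4K takes forever"}]
hackernews = [{"platform": "hackernews", "text": "no good CLI for this"}]
forums = None  # a platform that was skipped or returned nothing

source_documents = merge_sources(reddit, hackernews, forums)
print(len(source_documents))  # 2
```

The downstream `extract_problems` task then sees one flat list regardless of which platforms were active.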

Save the YAML below as `my-run.yaml`, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented as comments above the `input:` block. To set one, copy its line under `input:` and uncomment it.

```yaml
workflow: research/problem-intelligence
# Optional fields — copy any line(s) under `input:` and uncomment to set:
# Ingestion depth (int or 'quick'/'standard'/'deep'/'exhaustive')
# depth: 10
#
# Forum URLs to ingest
# forum_urls: []
#
# Content niche to scope the search
# niche: null
#
# Platforms to query
# platforms: []
#
# Search query for problem discovery
# query: ""
#
# Auto-discover relevant subreddits
# reddit_discovery: false
#
# Specific subreddits to crawl
# subreddits: []
#
input: {}
```

Run it locally:

```shell
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```shell
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=research/problem-intelligence' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal WorkflowInput fields — variants (1–10 fan-out) and regenerate (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
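For example, a regeneration run with fan-out might look like the following. The `regenerate` sub-field names (`direction`, `keep`, `extra_instructions`) come from the input-field description above; the values are hypothetical:

```yaml
workflow: research/problem-intelligence
variants: 3                             # run the workflow 3 times with different sampling
regenerate:
  direction: "lean toward B2B ideas"    # creative-direction hint (hypothetical value)
  keep: []
  extra_instructions: ""
input:
  query: "video editing pain points"
```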

  • Task merge_sources has no Pydantic types — contract is opaque to consumers.