research/problem-intelligence

Pipeline — DAG definition for the problem intelligence workflow.

Category: research
Source: workflows/research/problem_intelligence/pipeline.py

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `depth` | object | `10` | Ingestion depth (int or `'quick'`/`'standard'`/`'deep'`/`'exhaustive'`) |
| `forum_urls` | string[] | `[]` | Forum URLs to ingest |
| `niche` | object | `null` | Content niche to scope the search |
| `platforms` | string[] | `[]` | Platforms to query |
| `query` | string | `""` | Search query for problem discovery |
| `reddit_discovery` | boolean | `false` | Auto-discover relevant subreddits |
| `regenerate` | object | (none) | When set, this run is a regeneration. Workflows may read `direction` / `keep` / `extra_instructions` to modulate prompts; the engine persists `parent_run_id` and `parent_variant_index` as run lineage columns. |
| `subreddits` | string[] | `[]` | Specific subreddits to crawl |
| `variants` | integer | `1` | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
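As an illustration, a filled-in run spec combining several of the input fields above might look like this (all values here are hypothetical placeholders, not recommendations):

```yaml
# Hypothetical example values — adjust to your own niche and platforms.
workflow: research/problem-intelligence
input:
  query: "video editing pain points"
  niche: "short-form video creators"
  platforms: [reddit, hackernews, youtube]
  subreddits: [VideoEditing, NewTubers]
  reddit_discovery: true
  depth: deep        # or an integer, e.g. 10
```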
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `ideas` | object[] | `[]` | Generated product/content ideas |
| `kind` | object | (none) | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| `platforms_queried` | string[] | `[]` | Platforms that were queried |
| `query` | string | `""` | Query used for ingestion |
| `ranked_clusters` | object[] | `[]` | Problem clusters ranked by opportunity score |
| `scope` | string | `"global"` | Analysis scope |
| `specs` | object[] | `[]` | Generated specifications |
| `total_sources_collected` | integer | `0` | Total source documents collected |
| `workflow` | string | `"research/problem-intelligence"` | Workflow identifier |
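Putting the output fields above together, a run output is shaped roughly like the following (values are hypothetical and abbreviated; the element shapes of the `object[]` fields are not documented on this page):

```yaml
# Hypothetical, abbreviated output shape.
workflow: research/problem-intelligence
scope: "global"
query: "video editing pain points"
platforms_queried: [reddit, hackernews]
total_sources_collected: 342
ranked_clusters: []   # object[] — problem clusters ranked by opportunity score
ideas: []             # object[] — generated product/content ideas
specs: []             # object[] — generated specifications
kind: text            # variant card shape: video / carousel / image / text
```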
```
plan_ingestion → ingest_reddit → ingest_hackernews → ingest_youtube_comments → ingest_youtube_shorts → ingest_tiktok → ingest_instagram → ingest_twitter → ingest_forums → merge_sources → extract_problems → normalize_problems → cluster_problems → aggregate_solutions → compute_scores → generate_ideas → score_ideas → generate_specs → generate_workflow_defs → format_output
```
| Task | Description |
| --- | --- |
| `plan_ingestion` | Validate input and determine scope + active platforms. |
| `ingest_reddit` | Ingest posts + comments from Reddit (supports three modes). |
| `ingest_hackernews` | Ingest stories + comments from Hacker News via the Algolia API. |
| `ingest_youtube_comments` | Ingest YouTube video comments via yt-dlp or the YouTube Data API. |
| `ingest_youtube_shorts` | Ingest YouTube Shorts + comments via yt-dlp (Shorts-specific search). |
| `ingest_tiktok` | Ingest TikTok videos + comments via yt-dlp, with web search as a fallback. |
| `ingest_instagram` | Ingest Instagram posts via web search + Jina Reader, or the Graph API if available. |
| `ingest_twitter` | Ingest tweets via the Twitter API v2 (if available), with web search as a fallback. |
| `ingest_forums` | Ingest forum content via Jina Reader (clean markdown extraction). |
| `merge_sources` | Join all platform branches into a unified `source_documents` list. |
| `extract_problems` | Extract problems/pain points from source documents using an LLM. |
| `normalize_problems` | Normalize problem mentions and merge near-duplicates. |
| `cluster_problems` | Cluster extracted problem mentions by semantic similarity, then refine with an LLM. |
| `aggregate_solutions` | Aggregate `existing_solutions` from problem mentions into `SolutionMention` entities. |
| `compute_scores` | Score problem clusters by opportunity potential. |
| `generate_ideas` | Generate 1–3 startup/product ideas per top problem cluster. |
| `score_ideas` | Score generated ideas by viability. |
| `generate_specs` | Generate implementation-ready SaaS specifications for top ideas. |
| `generate_workflow_defs` | Optionally convert SaaS specs into Fabric-compatible workflow DAG definitions. |
| `format_output` | Structure the final output, stripping internal keys. |
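The fan-in at `merge_sources` can be sketched in plain Python. This is an illustrative sketch, not the actual implementation (the real task's contract is untyped, as the note at the bottom of this page points out); the document shape is hypothetical:

```python
def merge_sources(*branches):
    """Illustrative join: concatenate every platform branch into one
    source_documents list, skipping branches that produced nothing."""
    docs = []
    for branch in branches:
        docs.extend(branch or [])  # `None` / empty branches contribute nothing
    return docs

# Hypothetical per-platform branch outputs:
reddit = [{"platform": "reddit", "text": "exporting 4K takes forever"}]
hackernews = [{"platform": "hackernews", "text": "no good CLI for this"}]
forums = None  # a platform that was skipped or returned nothing

source_documents = merge_sources(reddit, hackernews, forums)
print(len(source_documents))  # 2
```

The downstream `extract_problems` task then sees one flat list regardless of which platforms were active.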

Save the YAML below as `my-run.yaml`, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented as comments above the `input:` block. To set one, copy its line under `input:` and uncomment it.

```yaml
workflow: research/problem-intelligence
# Optional fields — copy any line(s) under `input:` and uncomment to set:
# Ingestion depth (int or 'quick'/'standard'/'deep'/'exhaustive')
# depth: 10
#
# Forum URLs to ingest
# forum_urls: []
#
# Content niche to scope the search
# niche: null
#
# Platforms to query
# platforms: []
#
# Search query for problem discovery
# query: ""
#
# Auto-discover relevant subreddits
# reddit_discovery: false
#
# Specific subreddits to crawl
# subreddits: []
#
input: {}
```

Run it locally:

```shell
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```shell
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=research/problem-intelligence' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal WorkflowInput fields — variants (1–10 fan-out) and regenerate (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
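For example, a regeneration run with fan-out might look like the following. The `regenerate` sub-field names (`direction`, `keep`, `extra_instructions`) come from the input-field description above; the values are hypothetical:

```yaml
workflow: research/problem-intelligence
variants: 3                             # run the workflow 3 times with different sampling
regenerate:
  direction: "lean toward B2B ideas"    # creative-direction hint (hypothetical value)
  keep: []
  extra_instructions: ""
input:
  query: "video editing pain points"
```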

  • Task merge_sources has no Pydantic types — contract is opaque to consumers.