research/problem-intelligence
Pipeline — DAG definition for the problem intelligence workflow.
Category: research
Source: workflows/research/problem_intelligence/pipeline.py
Input Schema
| Field | Type | Default | Description |
|---|---|---|---|
| depth | object | 10 | Ingestion depth (int or ‘quick’/‘standard’/‘deep’/‘exhaustive’) |
| forum_urls | string[] | — | Forum URLs to ingest |
| niche | object | — | Content niche to scope the search |
| platforms | string[] | — | Platforms to query |
| query | string | "" | Search query for problem discovery |
| reddit_discovery | boolean | false | Auto-discover relevant subreddits |
| regenerate | object | — | When set, this run is a regeneration. Workflows may read direction / keep / extra_instructions to modulate prompts; the engine persists parent_run_id and parent_variant_index as run lineage columns. |
| subreddits | string[] | — | Specific subreddits to crawl |
| variants | integer | 1 | Number of independent variant executions (1–10). When > 1, the engine runs the workflow N times with different sampling, producing N outputs. |
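The `depth` field accepts either an integer or a named preset. A minimal sketch of how a client might normalize it before submission; the preset-to-integer mapping below is an assumption for illustration, not the engine's actual table:

```python
# Hypothetical preset mapping; the real values live in pipeline.py and may differ.
DEPTH_PRESETS = {"quick": 5, "standard": 10, "deep": 25, "exhaustive": 50}

def resolve_depth(depth):
    """Normalize the `depth` input to a positive integer ingestion depth."""
    if isinstance(depth, bool):                # bool is an int subclass; reject it
        raise ValueError("depth must be an int or a preset name")
    if isinstance(depth, int):
        if depth <= 0:
            raise ValueError("depth must be positive")
        return depth
    if isinstance(depth, str) and depth in DEPTH_PRESETS:
        return DEPTH_PRESETS[depth]
    raise ValueError(f"invalid depth: {depth!r}")
```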
Output Schema
| Field | Type | Default | Description |
|---|---|---|---|
| ideas | object[] | — | Generated product/content ideas |
| kind | object | — | Variant card shape: video / carousel / image / text. Surfaced on the per-variant entry of the run-output API and used by gallery UIs to pick the right layout. |
| platforms_queried | string[] | — | Platforms that were queried |
| query | string | "" | Query used for ingestion |
| ranked_clusters | object[] | — | Problem clusters ranked by opportunity score |
| scope | string | "global" | Analysis scope |
| specs | object[] | — | Generated specifications |
| total_sources_collected | integer | 0 | Total source documents collected |
| workflow | string | "research/problem-intelligence" | Workflow identifier |
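The `kind` field is what gallery UIs dispatch on when rendering a variant. A sketch of that dispatch; the layout names are hypothetical, only the four `kind` values come from the schema:

```python
# Hypothetical layout names; only the four `kind` values come from the output schema.
LAYOUTS = {"video": "player", "carousel": "slider", "image": "figure", "text": "article"}

def pick_layout(variant_entry: dict) -> str:
    """Choose a gallery layout from a per-variant run-output entry."""
    kind = variant_entry.get("kind") or "text"   # missing/None kind falls back to text
    return LAYOUTS.get(kind, LAYOUTS["text"])
```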
Task Pipeline
plan_ingestion → ingest_reddit → ingest_hackernews → ingest_youtube_comments → ingest_youtube_shorts → ingest_tiktok → ingest_instagram → ingest_twitter → ingest_forums → merge_sources → extract_problems → normalize_problems → cluster_problems → aggregate_solutions → compute_scores → generate_ideas → score_ideas → generate_specs → generate_workflow_defs → format_output

| Task | Description |
|---|---|
| plan_ingestion | Validate input and determine scope + active platforms. |
| ingest_reddit | Ingest posts + comments from Reddit; supports three ingestion modes. |
| ingest_hackernews | Ingest stories + comments from Hacker News via Algolia API. |
| ingest_youtube_comments | Ingest YouTube video comments via yt-dlp or YouTube Data API. |
| ingest_youtube_shorts | Ingest YouTube Shorts + comments via yt-dlp (Shorts-specific search). |
| ingest_tiktok | Ingest TikTok videos + comments via yt-dlp or web search fallback. |
| ingest_instagram | Ingest Instagram posts via web search + Jina Reader, or Graph API if available. |
| ingest_twitter | Ingest tweets via Twitter API v2 (if available) or web search fallback. |
| ingest_forums | Ingest forum content via Jina Reader (clean markdown extraction). |
| merge_sources | Join all platform branches into a unified source_documents list. |
| extract_problems | Extract problems/pain points from source documents using LLM. |
| normalize_problems | Normalize problem mentions and merge near-duplicates. |
| cluster_problems | Cluster extracted problem mentions by semantic similarity, then refine with LLM. |
| aggregate_solutions | Aggregate existing_solutions from problem mentions into SolutionMention entities. |
| compute_scores | Score problem clusters by opportunity potential. |
| generate_ideas | Generate 1–3 startup/product ideas per top problem cluster. |
| score_ideas | Score generated ideas by viability. |
| generate_specs | Generate implementation-ready SaaS specifications for top ideas. |
| generate_workflow_defs | Optionally convert SaaS specs into Fabric-compatible workflow DAG definitions. |
| format_output | Structure the final output, stripping internal keys. |
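merge_sources is the join point where the eight platform branches fan back in. A minimal sketch of that join, assuming documents are dicts and deduplicating on a hypothetical `url` key (the task's real contract is untyped; see Warnings below):

```python
def merge_sources(*branches):
    """Concatenate per-platform document lists into one source_documents list.

    The dedup key (`url`) is an assumption for illustration, not the task's contract.
    """
    seen, merged = set(), []
    for docs in branches:
        for doc in docs:
            key = doc.get("url")
            if key is not None and key in seen:
                continue                 # same document already seen on another platform
            if key is not None:
                seen.add(key)
            merged.append(doc)
    return merged
```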
Run-spec example
Save the YAML below as `my-run.yaml`, edit the values, and run it with the CLI or POST it to the API. Required fields are uncommented; optional knobs are documented above the `input:` block — copy any line under `input:` and uncomment it to set a value.
```yaml
workflow: research/problem-intelligence

# Optional fields — copy any line(s) under `input:` and uncomment to set:
# Ingestion depth (int or 'quick'/'standard'/'deep'/'exhaustive')
# depth: 10
#
# Forum URLs to ingest
# forum_urls: []
#
# Content niche to scope the search
# niche: null
#
# Platforms to query
# platforms: []
#
# Search query for problem discovery
# query: ""
#
# Auto-discover relevant subreddits
# reddit_discovery: false
#
# Specific subreddits to crawl
# subreddits: []
#
input: {}
```

Run it locally:

```sh
fab-workflow --from-file my-run.yaml
```

Or submit over the wire — the same file is the request body:

```sh
curl -X POST 'https://gofabric.dev/v1/workflows/runs?name=research/problem-intelligence' \
  -H 'Authorization: Bearer fab_xxx' \
  -H 'content-type: application/yaml' \
  --data-binary @my-run.yaml
```

Every workflow also accepts the universal WorkflowInput fields — `variants` (1–10 fan-out) and `regenerate` (creative-direction hints with run lineage). See Run-specs (YAML / TOML / JSON) for the full top-level shape (metadata, priority, bundle, parent, etc.).
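The same submission can be sketched from Python, with the universal `variants` and `regenerate` fields added to the run-spec. The endpoint and headers mirror the curl example above; the request is left unsent since it needs a real API key:

```python
import urllib.request

# Run-spec body: universal fields (`variants`, `regenerate`) ride alongside `input`.
# The `direction` hint below is an example value, not a required setting.
spec = """\
workflow: research/problem-intelligence
input: {}
variants: 3
regenerate:
  direction: "lean harder into B2B pain points"
"""

req = urllib.request.Request(
    "https://gofabric.dev/v1/workflows/runs?name=research/problem-intelligence",
    data=spec.encode(),
    headers={"Authorization": "Bearer fab_xxx",
             "content-type": "application/yaml"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment with a real fab_xxx token
```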
Warnings
- Task `merge_sources` has no Pydantic types — contract is opaque to consumers.