Research Pipeline — Build Plan (Mar 14, 2026)

For the Coding Agent

You are building a multi-AI research pipeline inside EdgeClaw (TypeScript, Hono server, tsx runtime). The pipeline collects data, sends it to AI researchers, routes the results through analysts and judges, and produces trading predictions scored by Brier score.

Read these docs before writing any line of code:

Existing codebase to reuse:

Tech stack: TypeScript, Node.js, better-sqlite3, Hono (HTTP server), tsx runtime.
Server: Oracle ARM/aarch64, 4 OCPUs, 23GB RAM, 194GB disk (152GB free). Vultr is closing and Phoenix has been merged; EdgeClaw is now the single server.
LLM routing: direct xAI API for Grok; OpenRouter for DeepSeek, Qwen, Flash, Flash Lite, Gemini Pro, and Sonar Pro; Cloudflare proxy for Gemini (geo-block bypass); Anthropic key for Claude Sonnet/Opus.


PHASE 0: Shared Infrastructure (Build First — Everything Depends on This)

These are shared modules that multiple desks need. Build once, use everywhere.

0.1 — Central Database Schema

Create a single SQLite database for the research pipeline: data/db/research-pipeline.db

Tables needed:

Each data collection desk will have its own tables (defined in their spec docs). But the above tables are universal.

0.2 — Prediction Market Adapter

One module for Kalshi + Polymarket API access. All desks share this.

Kalshi:

Polymarket:

Output: Unified market snapshot format regardless of source platform.

Collect: prices, volume, order book depth, open interest. Filtering is per desk; see each desk's spec.
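
The unified snapshot could look like the sketch below. Field names and the raw Kalshi payload shape are illustrative assumptions, not the adapter spec:

```typescript
// Sketch of a unified snapshot type; field names are illustrative,
// not taken from the adapter spec.
export interface MarketSnapshot {
  platform: "kalshi" | "polymarket";
  marketId: string;
  question: string;
  yesPrice: number;            // implied probability, 0..1
  volume24h: number;
  openInterest: number | null; // not all platforms expose this
  bookDepth: { bidSize: number; askSize: number } | null;
  capturedAt: string;          // ISO 8601
}

// Hypothetical normalizer for a Kalshi-style raw payload (cents-priced).
export function fromKalshi(raw: {
  ticker: string;
  title: string;
  yes_bid: number; // cents
  volume_24h: number;
  open_interest: number;
}): MarketSnapshot {
  return {
    platform: "kalshi",
    marketId: raw.ticker,
    question: raw.title,
    yesPrice: raw.yes_bid / 100, // Kalshi quotes prices in cents
    volume24h: raw.volume_24h,
    openInterest: raw.open_interest,
    bookDepth: null,
    capturedAt: new Date().toISOString(),
  };
}
```

A matching `fromPolymarket` would map that platform's fields onto the same type, so every desk downstream consumes one shape.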

0.3 — FRED Data Connector

One module that pulls ALL FRED series used by any desk (Options, Stocks, Futures, Forex, Crypto).

Key series: SOFR, Treasury yields (DGS2, DGS10, DGS30), WALCL (Fed balance sheet), RRPONTSYD (Reverse Repo), BAMLH0A0HYM2 (HY spread), BAMLC0A0CM (IG spread), MOVE index, DXY. Note: MOVE and DXY are not distributed by FRED; plan a proxy (e.g. FRED's DTWEXBGS broad dollar index) or a separate source for those two.

Frequency: Daily pull at 6:30 AM ET. Store in shared table fred_data(series_id, date, value).

API key: Free registration at https://fred.stlouisfed.org/docs/api/api_key.html
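
A minimal pull sketch, using the public FRED `series/observations` endpoint; the `store` callback stands in for a prepared better-sqlite3 insert into `fred_data(series_id, date, value)`:

```typescript
// Public FRED observations endpoint.
const FRED_BASE = "https://api.stlouisfed.org/fred/series/observations";

export function fredUrl(seriesId: string, apiKey: string): string {
  const params = new URLSearchParams({
    series_id: seriesId,
    api_key: apiKey,
    file_type: "json",
  });
  return `${FRED_BASE}?${params}`;
}

// One series per call; run over the full series list in the 6:30 AM ET cron.
export async function pullSeries(
  seriesId: string,
  apiKey: string,
  store: (seriesId: string, date: string, value: number) => void,
): Promise<void> {
  const res = await fetch(fredUrl(seriesId, apiKey));
  const body = (await res.json()) as {
    observations: { date: string; value: string }[];
  };
  for (const obs of body.observations) {
    if (obs.value === ".") continue; // FRED encodes missing values as "."
    store(seriesId, obs.date, Number(obs.value));
  }
}
```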

0.4 — VIX/Volatility Suite

One module: VIX, VVIX, SKEW, VIX9D, VIX3M, VIX6M, VIX futures term structure.

Source: CBOE (free delayed data).
Frequency: Every 15 minutes for VIX/futures during market hours; daily for the rest.
Store in: volatility_data(metric, timestamp, value)

0.5 — SEC EDGAR Scraper

One module for all SEC filing types used by Options + Stocks desks.

Filing types: Form 4, Form 144, 13F, 13D/13G, N-PORT, 10-K/10-Q (XBRL), CORRESP, 424B2.
Method: EDGAR full-text search + RSS feeds. Free, no auth.
Frequency: Daily scan at 6:30 PM ET (filings drop after market close).
Store in: Per-filing-type tables.
Note: 424B2 parsing requires NLP; defer to Phase 3.
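
A query-builder sketch for the full-text search scan. The efts.sec.gov endpoint and the `forms` filter reflect EDGAR's public full-text search API, but treat the exact parameter names as an assumption to verify; the descriptive User-Agent is required by SEC on all requests:

```typescript
// EDGAR full-text search endpoint (assumed parameter names; verify
// against the live API before relying on them).
const EDGAR_FTS = "https://efts.sec.gov/LATEST/search-index";

export function edgarSearchUrl(query: string, formType: string): string {
  const params = new URLSearchParams({ q: query, forms: formType });
  return `${EDGAR_FTS}?${params}`;
}

export async function searchFilings(query: string, formType: string) {
  const res = await fetch(edgarSearchUrl(query, formType), {
    // SEC requires a descriptive User-Agent identifying the requester.
    headers: { "User-Agent": "EdgeClaw research-pipeline admin@example.com" },
  });
  return res.json();
}
```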

0.6 — Odds/Sportsbook Scraper

One module for Pinnacle + FanDuel + DraftKings odds.

Primary: Firecrawl (self-hosted) or direct scraping.
Fallback: The Odds API (key: f63a46439d104a3a78dee17580c96279). Rate limit: 500 calls/month.
Quota manager: Track usage across all sports desks. Priority order: closing odds > early odds > mid-day refresh.
FanDuel prop lines: Separate collection for player props anchor (see sports-desk-data-inventory.md Player Props section).
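
The quota manager could enforce the priority order by reserving headroom for higher-priority tiers. Priority names mirror the plan's ordering; the reserve thresholds are illustrative assumptions:

```typescript
// Monthly quota shared across all sports desks, with priority-aware
// admission: lower-priority pulls must leave calls for closing odds.
type OddsPriority = "closing" | "early" | "midday";

const RESERVED: Record<OddsPriority, number> = {
  closing: 0,   // closing odds may spend down to zero remaining
  early: 100,   // early odds must leave 100 calls in reserve (assumed)
  midday: 250,  // mid-day refreshes must leave 250 calls (assumed)
};

export class OddsQuota {
  private used = 0;
  constructor(private readonly monthlyLimit = 500) {}

  /** Records the call and returns true if the priority's reserve allows it. */
  tryConsume(priority: OddsPriority, calls = 1): boolean {
    const remainingAfter = this.monthlyLimit - this.used - calls;
    if (remainingAfter < RESERVED[priority]) return false;
    this.used += calls;
    return true;
  }

  get remaining(): number {
    return this.monthlyLimit - this.used;
  }
}
```

In production the `used` counter would persist in SQLite so restarts don't reset the month's spend.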

0.7 — LLM Router for Pipeline

Extend src/llm/client.ts to support all pipeline models via OpenRouter:

Add JSON schema enforcement (Zod validation) for all analyst outputs. Add retry logic for Sonnet (known JSON reliability issues — see auditions doc). Add cost tracking per call (already exists in audit logger, wire it up).
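
A sketch of the schema-enforced call wrapper. `parse` is any validator that throws on bad input; Zod's `schema.parse` has exactly this shape, so desks can pass `AnalystOutputSchema.parse`. The retry count and the model-call signature are assumptions, not the existing client.ts API:

```typescript
// Call a model, JSON-parse and validate its output, and retry on failure.
export async function callWithSchema<T>(
  callModel: (prompt: string) => Promise<string>,
  prompt: string,
  parse: (raw: unknown) => T, // e.g. a Zod schema's .parse
  maxRetries = 2, // Sonnet's known JSON reliability issues motivate retries
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(prompt);
    try {
      return parse(JSON.parse(raw));
    } catch (err) {
      lastError = err; // malformed JSON or schema violation: retry
    }
  }
  throw new Error(`model output failed validation after retries: ${lastError}`);
}
```

Cost tracking would hook into `callModel` itself, since the audit logger already records per-call cost.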

0.8 — Pipeline File System

Create the folder structure defined in memory/research-pipeline-filesystem.md. Per-desk folders, researcher outputs separated (grok/sonar), analyst outputs separated (analyst-1/2/3).


PHASE 1: Data Collection — Sports (Prove the Architecture)

Build ONE desk end-to-end first. Sports (NHL/NBA/NCAAB) is the best candidate — it has existing data on Vultr, the most mature spec, and the most frequent runs (daily + live).

1.1 — Sports Data Collectors

Implement collectors from sports-desk-data-inventory.md:

1.2 — Research Module

The core pipeline runner. For a given desk:

  1. Call Grok (Researcher #1) with desk search strategy queries
  2. Call Sonar Pro (Researcher #2) with desk search strategy queries
  3. Save raw research to filesystem per research-pipeline-filesystem.md
  4. Call Data Validator (Flash Lite) to pre-filter
  5. Save validated research
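
The five steps above can be sketched as one runner. The researcher/validator call signatures and file paths are placeholders standing in for the real client.ts and research-pipeline-filesystem.md conventions:

```typescript
export interface ResearchRun {
  desk: string;
  grokRaw: string;
  sonarRaw: string;
  validated: string;
}

export async function runResearch(
  desk: string,
  queries: string[],
  callGrok: (q: string[]) => Promise<string>,   // Researcher #1
  callSonar: (q: string[]) => Promise<string>,  // Researcher #2
  validate: (raw: string) => Promise<string>,   // Flash Lite pre-filter
  save: (path: string, body: string) => Promise<void>,
): Promise<ResearchRun> {
  // The two researchers are independent searches, so run them in parallel.
  const [grokRaw, sonarRaw] = await Promise.all([
    callGrok(queries),
    callSonar(queries),
  ]);
  await save(`${desk}/grok/raw.md`, grokRaw);   // placeholder paths
  await save(`${desk}/sonar/raw.md`, sonarRaw);

  const validated = await validate(grokRaw + "\n\n" + sonarRaw);
  await save(`${desk}/validated.md`, validated);
  return { desk, grokRaw, sonarRaw, validated };
}
```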

1.3 — Analyst Module

Route validated research to 3 desk analysts:

  1. Look up desk's analyst assignments from config (from pipeline-jobs.md roster)
  2. Send research + desk prompt to each analyst model
  3. Enforce JSON schema output (Zod)
  4. Collect structured predictions with conviction scores
  5. Save analyst outputs
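
An illustrative shape for a structured prediction with a conviction score. Field names are assumptions; the real schema lives with the desk prompts. In the pipeline this would be a Zod schema (step 3); a hand-rolled guard is shown here to keep the sketch dependency-free:

```typescript
export interface AnalystPrediction {
  marketId: string;
  direction: "yes" | "no";
  probability: number; // analyst's probability estimate, 0..1
  conviction: number;  // conviction score, assumed 1..10 scale
  rationale: string;
}

export function isAnalystPrediction(v: unknown): v is AnalystPrediction {
  const o = v as Partial<AnalystPrediction>;
  return (
    typeof o?.marketId === "string" &&
    (o.direction === "yes" || o.direction === "no") &&
    typeof o.probability === "number" &&
    o.probability >= 0 && o.probability <= 1 &&
    typeof o.conviction === "number" &&
    o.conviction >= 1 && o.conviction <= 10 &&
    typeof o.rationale === "string"
  );
}
```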

1.4 — Judge Module (Gemini)

Send analyst outputs to both Gemini judges:

  1. Format Opus briefing spec (from research-pipeline.md)
  2. Send to Memory Gemini (@GGAnalystBot) — via Telegram relay OR direct API
  3. Send to Wiped Gemini (@GGWipedBot) — via Telegram relay OR direct API
  4. Collect grades, evidence folders, long shot folders
  5. Track Gemini disagreements

Decision needed: Are Gemini judges called via Telegram relay (Boss forwards prompts to bots in EDGE TEAM group) or direct Gemini API? The pipeline design says Boss relays. If direct API, use Cloudflare proxy.

1.5 — Judge Module (Opus)

Same pattern as Gemini judges:

  1. Send evidence folders to Wiped Opus (@CCWipedBot) and Memory Opus (@ClaudeAnalystBot)
  2. If they agree: done. Save verdict.
  3. If they disagree: send each other's reasoning for deliberation round
  4. If still disagree: alert Boss via Telegram
  5. Track Opus disagreements
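
The agree/deliberate/escalate loop in steps 2-4 can be sketched as follows; the judge call signatures are placeholders for the relay (or direct API), and the verdict vocabulary is an assumption:

```typescript
export type Verdict = "approve" | "reject";

export async function resolveOpusVerdict(
  askWiped: (extraContext?: string) => Promise<{ verdict: Verdict; reasoning: string }>,
  askMemory: (extraContext?: string) => Promise<{ verdict: Verdict; reasoning: string }>,
  alertBoss: (msg: string) => Promise<void>,
): Promise<Verdict | "escalated"> {
  // Round 1: independent verdicts.
  const [wiped, memory] = await Promise.all([askWiped(), askMemory()]);
  if (wiped.verdict === memory.verdict) return wiped.verdict;

  // Deliberation round: each judge sees the other's reasoning.
  const [wiped2, memory2] = await Promise.all([
    askWiped(memory.reasoning),
    askMemory(wiped.reasoning),
  ]);
  if (wiped2.verdict === memory2.verdict) return wiped2.verdict;

  // Still split: hand it to the Boss.
  await alertBoss("Opus judges still disagree after deliberation");
  return "escalated";
}
```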

Same relay question as Gemini judges.

1.6 — Settlement Module

  1. Cron: hourly for intraday, daily for swing, weekly for long-term
  2. Check outcomes via Kalshi API, Polymarket API, sports score APIs
  3. Calculate Brier scores
  4. Update analyst/Gemini/Opus leaderboards
  5. Archive settled predictions with full chain
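
Step 3 above is the standard Brier score for binary outcomes: the mean squared error between the forecast probability and the realized outcome (0 or 1). Lower is better; an always-0.5 forecaster scores 0.25:

```typescript
// Brier score over a batch of settled predictions.
export function brierScore(
  predictions: { forecast: number; outcome: 0 | 1 }[],
): number {
  if (predictions.length === 0) throw new Error("no settled predictions");
  const sum = predictions.reduce(
    (acc, p) => acc + (p.forecast - p.outcome) ** 2,
    0,
  );
  return sum / predictions.length;
}
```

Each leaderboard (analyst, Gemini, Opus) can be ranked by its rolling batch score.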

1.7 — Boss Notifications

Send morning briefing to EDGE TEAM Telegram group:


PHASE 2: Data Collection — Finance Desks

Once Sports is working end-to-end, clone the architecture to finance desks. These share a lot of infrastructure from Phase 0.

2.1 — Options Desk Collectors

From options-desk-data-inventory.md:

2.2 — Stocks Desk Collectors

From stocks-desk-data-inventory.md:

2.3 — Futures Desk Collectors

From futures-desk-data-inventory.md:

2.4 — Crypto Desk Collectors

From crypto-desk-data-inventory.md:

2.5 — Forex Desk Collectors

From forex-data-collection.md:

2.6 — Connect All Finance Desks to Pipeline

Wire each desk into the Research Module → Analyst → Judge → Settlement flow built in Phase 1.
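
One way to keep the Phase 1 flow generic is a small desk registry: each desk supplies its own collectors and prompts, and the shared pipeline looks desks up by name. The interface names are illustrative, not the real module API:

```typescript
export interface DeskConfig {
  name: string;                            // e.g. "options", "crypto"
  searchQueries: string[];                 // from the desk's search-strategies doc
  analystModels: [string, string, string]; // 3 analysts per desk
  collect: () => Promise<void>;            // desk-specific collectors
}

const desks = new Map<string, DeskConfig>();

export function registerDesk(cfg: DeskConfig): void {
  if (desks.has(cfg.name)) throw new Error(`desk already registered: ${cfg.name}`);
  desks.set(cfg.name, cfg);
}

export function getDesk(name: string): DeskConfig {
  const cfg = desks.get(name);
  if (!cfg) throw new Error(`unknown desk: ${name}`);
  return cfg;
}
```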


PHASE 3: Remaining Desks + Advanced Features

3.1 — Weather Desk

From weather-lock-in-analysis.md:

3.2 — Sports Sub-Desks

3.3 — Research-Only Desks

These don't need data collectors — they use research pipeline searches + other desks' data:

3.4 — Advanced Data Sources

3.5 — Code Pipeline Update

Update src/core/code-pipeline.ts to the new audition-winner flow:

3.6 — Self-Improving Prompts

Per pipeline design:


PHASE 4: Dashboard, Monitoring, Optimization

4.1 — Performance Dashboard

4.2 — Automated Alerts

4.3 — Weekly OpusGodBot Review

Wire up @OpusGodBot for weekly Sunday review:


DEPENDENCIES MAP

Phase 0 (Infrastructure)
  ├── 0.1 Database ──────────────────┐
  ├── 0.2 Prediction Market Adapter ─┤
  ├── 0.3 FRED Connector ────────────┤
  ├── 0.4 VIX Suite ─────────────────┤
  ├── 0.5 SEC EDGAR ─────────────────┤── All needed before any desk
  ├── 0.6 Odds Scraper ──────────────┤
  ├── 0.7 LLM Router ────────────────┤
  └── 0.8 File System ───────────────┘
                │
Phase 1 (Sports — prove it works)
  ├── 1.1 Sports Collectors ────────┐
  ├── 1.2 Research Module ──────────┤
  ├── 1.3 Analyst Module ───────────┤── Full end-to-end for 1 desk
  ├── 1.4 Gemini Judge Module ──────┤
  ├── 1.5 Opus Judge Module ────────┤
  ├── 1.6 Settlement Module ────────┤
  └── 1.7 Boss Notifications ───────┘
                │
Phase 2 (Finance desks — clone + customize)
  ├── 2.1-2.5 Desk-specific collectors
  └── 2.6 Wire into Phase 1 pipeline
                │
Phase 3 (Everything else)
  ├── 3.1-3.2 Weather + Sports sub-desks
  ├── 3.3 Research-only desks
  ├── 3.4 Advanced data sources
  ├── 3.5 Code pipeline update
  └── 3.6 Self-improving prompts
                │
Phase 4 (Polish)
  ├── 4.1 Dashboard
  ├── 4.2 Alerts
  └── 4.3 Weekly review

DEFERRED ITEMS (Come Back Later)

REFERENCE DOCS INDEX

| Doc | Location | What It Covers |
| --- | --- | --- |
| Pipeline design | memory/research-pipeline.md | Master blueprint |
| Team roster | memory/research-pipeline-jobs.md | Model assignments + budget |
| Auditions | memory/pipeline-auditions.md | Why each model was chosen |
| File system | memory/research-pipeline-filesystem.md | Folder structure |
| Research briefs | memory/research-briefs.md | Opus briefing format |
| Desk categories | memory/desk-categories.md | Evidence categories |
| Search strategies (main) | memory/research-search-strategies.md | Sports + 7 other desks |
| Search strategies (forex) | memory/research-search-strategies-forex.md | Forex queries |
| Search strategies (options) | memory/research-search-strategies-options.md | Options queries |
| Search strategies (stocks) | memory/research-search-strategies-stocks.md | Stocks queries |
| Search strategies (futures) | memory/research-search-strategies-futures.md | Futures queries |
| Search strategies (weather) | memory/research-search-strategies-weather.md | Weather queries |
| Search strategies (arbitrage) | memory/research-search-strategies-arbitrage.md | Arbitrage queries |
| Search strategies (AI/tools) | memory/research-search-strategies-ai-tools.md | AI tools queries |
| Search strategies (player props) | memory/research-search-strategies-player-props.md | Player props |
| Search strategies (DFS) | memory/research-search-strategies-dfs.md | DFS queries |
| Sports data spec | memory/sports-desk-data-inventory.md | NHL/NBA/NCAAB collection |
| Soccer data spec | memory/soccer-desk-data-inventory.md | Soccer collection |
| MLB data spec | memory/mlb-desk-data-inventory.md | Baseball collection |
| UFC data spec | memory/ufc-desk-data-inventory.md | MMA collection |
| Options data spec | memory/options-desk-data-inventory.md | Options collection |
| Stocks data spec | memory/stocks-desk-data-inventory.md | Stocks collection |
| Futures data spec | memory/futures-desk-data-inventory.md | Futures collection |
| Crypto data spec | memory/crypto-desk-data-inventory.md | Crypto collection |
| Forex data spec | memory/forex-data-collection.md | Forex monitoring |
| Weather data spec | memory/weather-lock-in-analysis.md | Weather strategy |
| Code pipeline | memory/pipeline-auditions.md (Code Pipeline Flow) | Flash→Flash→Opus |
| Chat brain restructure | memory/edgeclaw-chat-brain.md | Chat cost reduction |
| System upgrades | memory/research-upgrades.md | Post-build improvements |
Source: ~/.claude/projects/-home-ubuntu-edgeclaw/memory/build-plan.md