Futures Desk -- Data Collection Spec (MERGED FINAL)

Base: Claude Opus 4.6 (complete, unmodified) Panel additions cherry-picked from: Gemini 3.1 Pro, Grok 4.1, Grok 4 Fast, DeepSeek V3.1, GPT-4.1, Sonar Reasoning Pro Merged: Mar 15, 2026


What This Document Is

This is the complete data collection specification for the Futures desk in the research pipeline. It covers commodity, energy, metals, agriculture, and financial futures traded on direct futures exchanges (CME/NYMEX/ICE) as the primary venue and prediction markets (Kalshi/Polymarket) as a secondary venue for commodity/economic binary contracts. An AI builder should be able to read this and know exactly what data to collect, from where, how often, and why.

The Business Model (Two Venues)

Primary venue: Direct futures trading via a broker (CME/NYMEX/ICE/CBOE). Full liquidity, tight spreads, ladder order books with real depth. This is where size gets deployed.

Secondary venue: Kalshi and Polymarket offer binary contracts on commodity prices, economic data releases (CPI, NFP, GDP), Fed rate decisions, and Treasury yields. These platforms have their own ladder/CLOB structures but with thinner liquidity. The same data infrastructure powers both -- the difference is execution venue and position sizing.

The edge: Government agencies publish enormous amounts of free data that moves futures markets. Most retail traders watch headline numbers. We parse the sub-components, cross-reference positioning data (CFTC COT), track term structure shifts, and monitor alternative data (satellite, ship tracking) to identify mispricing hours to weeks before the market corrects.

IMPORTANT -- Prediction Market Arbitrage Caveat [PANEL: Sonar Reasoning Pro]: Cross-venue arb between Kalshi and Polymarket for the same underlying (e.g., BLS CPI) is NOT reliably tradeable. Spreads of 5-8 cents look attractive but after Kalshi trading fees (~0.5% one-way), Polymarket AMM slippage (1-2% in mid-cap markets), and funding costs, you are often underwater. Use Kalshi/Polymarket prices as market sentiment proxies and fundamental basis trade signals (our model says 60% probability, market prices 45% = edge to trade the underlying futures, not necessarily the binary). Cross-venue arb only when spreads exceed 10 cents on high-volume markets with demonstrable execution cost below the spread.

How Futures Are Different From Stocks/Options/Sports

An AI builder must understand these structural differences:

  1. Physical Delivery Matters -- Unlike stocks, most commodity futures can be settled by actual physical delivery of barrels of oil, bushels of corn, or bars of gold. This creates real-world supply bottlenecks (warehouse stocks, pipeline capacity, port congestion) that directly affect prices. Inventory data is the single most important input.

  2. Term Structure Is the Edge -- Futures trade in monthly contracts stretching years into the future. The shape of the price curve (contango = future months more expensive; backwardation = future months cheaper) reveals the market's supply/demand expectations. Changes in curve shape often signal price moves before the front-month price reacts.

  3. Government Data Drives Everything -- Unlike stocks where earnings matter, commodity futures are driven by government reports: EIA petroleum inventories (Wednesdays), EIA natural gas storage (Thursdays), USDA WASDE crop reports (monthly), CFTC positioning data (Fridays), BLS inflation data (monthly). These are all free. The release schedule is the trading calendar.

  4. Seasonality Is Structural -- Natural gas demand peaks in winter (heating) and summer (cooling). Crop prices follow planting and harvest cycles. Gasoline demand peaks in summer driving season. These patterns repeat every year with measurable hit rates and are baked into the curve shape.

  5. Positioning Data Is Public -- The CFTC publishes weekly data showing exactly how commercial hedgers, swap dealers, and speculative funds are positioned. When speculators hit extreme positions (>2 standard deviations), reversals happen. This data is free and has no equivalent in stock markets.

  6. Cross-Commodity Relationships Are Tradeable -- Commodities exist in supply chains: crude oil becomes gasoline (crack spread), soybeans become meal and oil (crush spread), natural gas generates electricity (spark spread). These processing margins mean-revert. Ratio trades between related commodities (gold/silver, copper/gold, corn/wheat) have decades of statistical backing.


SECTION A: PREDICTION MARKET DATA (Secondary Venue)

1. Kalshi Commodity/Macro Markets

What: Every active Kalshi market related to commodities, economic data, and monetary policy.

Markets to track:

What to pull per market:

Source: Kalshi API (https://trading-api.kalshi.com/trade-api/v2) Auth: RSA-PSS signed requests (existing Kalshi Edge Bot code handles this). Use Kalshi testnet first to validate auth before connecting to production. Consider websocket connections for real-time data to reduce REST API call volume. [PANEL: Grok 4 Fast] Rate limits: Not publicly documented -- implement exponential backoff, alert if >1 silent failure per hour. [PANEL: DeepSeek V3.1] Collection frequency: Every 15 minutes during US market hours (9:30 AM - 4:00 PM ET) for high-conviction markets only. Hourly for all others. Snapshot at 8:00 AM ET (pre-market). Scale back to hourly across the board if rate limiting is detected. Fallback: Parse Kalshi web UI (slower) if API unavailable; skip Kalshi and use Polymarket only as last resort. [PANEL: Sonar Reasoning Pro]

2. Polymarket Commodity/Macro Markets

What: Same categories as Kalshi. Different liquidity pool (crypto-native traders), so prices often diverge.

What to pull per market:

Source: Polymarket CLOB API (https://clob.polymarket.com/) -- free, no auth needed for read. Stability note: Crypto platforms change APIs frequently. Implement with exponential backoff; fallback to web scraping if API changes. [PANEL: Grok 4 Fast] Collection frequency: Same as Kalshi -- every 15 min during market hours for active markets, hourly otherwise. Fallback: Polymarket web UI or manual order book read every hour if CLOB API unavailable. [PANEL: Sonar Reasoning Pro]

Execution feasibility layer [PANEL: Sonar Reasoning Pro]: Before sizing any prediction market trade, calculate true edge after execution costs:

interface LiquidityAssessment {
  contract: string;
  venue: 'kalshi' | 'polymarket' | 'futures';
  desiredSize: number;
  estimatedSlippage: number;   // % (Kalshi ~0.5-1%, Polymarket ~1-3%)
  minConvictionThreshold: number;  // only trade if conviction score > this
  recommendedSize: number;    // max position after slippage drag
}
// Examples:
// CL (WTI crude): $500K, 0.08% slippage, conviction threshold 50
// Kalshi CPI binary: $50K desired, 2.5% slippage, conviction threshold 80, recommended $15K

SECTION B: ENERGY (EIA Is Primary -- All Free)

3. EIA Weekly Petroleum Status Report

What: The single most important weekly energy data release. Crude oil inventories, production, imports, and refinery utilization at the PADD (regional) level. Moves crude and product prices instantly on publication.

Source: EIA API (https://api.eia.gov/v2/) API Key: Free -- register at https://www.eia.gov/opendata/register.php Rate limit: 1000 requests/hour.

What to pull:

Key API endpoints:

Collection frequency: Weekly. Report released Wednesdays 10:30 AM ET (delayed 1 day if Monday holiday). Pull immediately on release. Store historical for 5-year seasonal comparison. Fallback: EIA CSV download (weekly) if API unavailable; parse EIA website directly as last resort. [PANEL: Sonar Reasoning Pro]

Data quality rules [PANEL: Sonar Reasoning Pro]:

4. EIA Natural Gas Storage Report

What: Weekly injection/withdrawal data for underground natural gas storage. The second most market-moving weekly report.

Source: EIA API -- GET /v2/natural-gas/stor/wkly/data/ What to pull:

Collection frequency: Weekly. Released Thursdays 10:30 AM ET. Fallback: EIA website HTML table scraper.

5. EIA Short-Term Energy Outlook (STEO)

What: Monthly 2-year forecast of US and global energy supply, demand, prices, and production.

Source: EIA website (https://www.eia.gov/outlooks/steo/) -- downloadable tables in Excel/CSV. What to pull:

Collection frequency: Monthly, released ~10th of each month.

6. EIA Drilling Productivity Report (DPR)

What: Monthly report on drilling efficiency and DUC (drilled but uncompleted) well count across 7 major US shale basins.

Source: EIA (https://www.eia.gov/petroleum/drilling/) -- downloadable spreadsheets. What to pull:

Collection frequency: Monthly.

7. Baker Hughes Rig Count

What: Weekly count of active drilling rigs in the US and Canada by basin and target (oil vs gas).

Source: Baker Hughes (https://rigcount.bakerhughes.com/) -- free weekly PDF/Excel. Use EIA's incorporated copy of Baker Hughes data for API access (EIA incorporates it with short lag, cleaner than scraping BH directly). [PANEL: Opus review] What to pull:

Collection frequency: Weekly, released Fridays 1:00 PM ET.

8. OPEC Monthly Oil Market Report (MOMR)

What: OPEC's own assessment of global supply, demand, and production compliance.

Source: OPEC website (https://www.opec.org/opec_web/en/publications/338.htm) -- free PDF/tables. What to pull:

Collection frequency: Monthly, typically released mid-month.

9. Crack Spreads (Calculated -- $0)

What: Processing margins that measure refinery profitability. Calculated from freely available price data.

Spreads to calculate:

Source: Calculate from EIA spot prices or Polygon.io futures (preferred over Yahoo Finance). What to store: Absolute value, Z-score vs 1/3/5-year history, percentile rank, seasonal adjustment. Collection frequency: Daily at market close. Formula validation test [PANEL: Sonar Reasoning Pro]:

// Unit test to validate crack spread formula
test('crack_spread_calculation', () => {
  const wti = 70;    // $/barrel
  const rbob = 2.10; // $/gallon
  const ho = 2.00;   // $/gallon
  // 3-2-1 crack spread -- historically $10-40/bbl
  const crackSpread = (2 * rbob * 42 + 1 * ho * 42) / 3 - wti;
  // Must be positive (refineries profitable) and in $10-40 range
  expect(crackSpread).toBeGreaterThan(0);
  expect(crackSpread).toBeLessThan(60);
});

10. Henry Hub, WTI, Brent Daily Settlement Prices

What: Daily settlement prices for the three benchmark energy contracts.

Source: Polygon.io (preferred -- $29/month, shared subscription, reliable). FRED for official daily spot fallback (DCOILWTICO, DHHNGSP, DCOILBRENTEU). Yahoo Finance (CL=F, NG=F, BZ=F) as last resort only -- ticker formats change monthly and data has 15-min delay. Collection frequency: Daily at market close. Store full history for seasonal analysis.


SECTION C: METALS

11. COMEX/LME/SHFE Warehouse Stocks

What: Exchange-monitored warehouse inventory levels for deliverable metals. The physical supply barometer.

What to pull per exchange:

Collection frequency: Daily for COMEX/LME. Weekly for SHFE.

12. Central Bank Gold Purchases

What: How much gold central banks (especially China, India, Turkey, Poland) are buying or selling.

Source: World Gold Council (https://www.gold.org/goldhub/data/gold-reserves-by-country) -- quarterly report, free summary data. IMF IFS database (https://data.imf.org/) for monthly reserve changes. Collection frequency: Monthly (IMF data) and quarterly (WGC comprehensive).

13. Metal Ratios (Calculated -- $0)

What: Key ratios between related metals that reveal macro regime and mean-revert.

Ratios to calculate daily:

Source: Polygon.io preferred; FRED (GOLDAMGBD228NLBM, SLVPRUSD) as fallback. What to store: Ratio value, 20/60/200-day moving averages, Z-score vs 1/3/5-year history, percentile rank. Collection frequency: Daily at market close.

14. SHFE-LME Copper Arbitrage Window

What: The price spread between Shanghai (SHFE) and London (LME) copper, adjusted for shipping, tariffs, and VAT.

Source: Calculate from SHFE and LME daily prices (use Investing.com/TradingView for SHFE data rather than scraping SHFE directly). What to track:

Collection frequency: Daily.


SECTION D: AGRICULTURE (USDA Is Primary -- All Free)

15. USDA WASDE Report

What: World Agricultural Supply and Demand Estimates. The single most important monthly report for grain, oilseed, cotton, and livestock markets. Updated the ~10th of each month.

Source: USDA PSD Online API (https://apps.fas.usda.gov/PSDOnlineV2/api/) -- free, no key needed, returns JSON. Rate limit: ~100 requests/hour; build retry logic with exponential backoff. Cache aggressively -- WASDE data only changes monthly, so once pulled, do not re-pull until the next release date. [PANEL: Opus review] Fallback: USDA PDF table parser if API has multi-day outages.

What to pull per commodity (corn, soybeans, wheat, cotton, rice, sugar):

Collection frequency: Monthly, released ~10th of each month at 12:00 PM ET. Pull immediately on release.

Revision handling [PANEL: Sonar Reasoning Pro]: When USDA revises WASDE backwards, update the stored data but do NOT fire a new signal. Flag as revision_type = 'backward_revision' in the database. Treat as noise unless it confirms a multi-month trend.

16. USDA Crop Progress Report

What: Weekly update on planting progress, crop condition ratings, and harvest progress. Published during the growing season (April-November).

Source: USDA NASS QuickStats API (https://quickstats.nass.usda.gov/api/api_GET/) -- free, requires API key. What to pull:

Collection frequency: Weekly, released Mondays 4:00 PM ET during growing season.

17. USDA Export Sales and Export Inspections

What: Weekly data on US agricultural export commitments (sales) and actual shipments (inspections).

Source:

What to pull:

Collection frequency: Weekly (Thursday for sales, Monday for inspections). FGIS daily during peak export season.

18. USDA Prospective Plantings

What: Survey of farmer planting intentions for the upcoming growing season. Released once a year at the end of March.

Source: USDA NASS and QuickStats API. What to pull:

Collection frequency: Annual (late March), with June Acreage update.

19. USDA Cattle on Feed / Hogs and Pigs

What: Monthly inventory reports for livestock. Published ~20th of each month.

Source: USDA NASS via QuickStats API. What to pull:

Collection frequency: Monthly.

20. USDA Cold Storage

What: Monthly frozen meat and poultry inventory in US cold storage warehouses.

Source: USDA NASS and QuickStats API. What to pull:

Collection frequency: Monthly, released ~22nd.

21. Brazil/Argentina Crop Estimates

What: South American production estimates from local government agencies. Brazil and Argentina are the #1 and #3 global soybean exporters.

Source (simplified approach): [PANEL: Grok 4 Fast]

What to pull:

Collection frequency: Monthly for USDA FAS/Conab; weekly from Argentine exchanges during Nov-Apr growing season.


SECTION E: TREASURIES AND RATES

22. Treasury Auction Results

What: Results of US Treasury debt auctions. The bond market equivalent of earnings reports.

Source: Treasury Fiscal Data API (https://api.fiscaldata.treasury.gov/services/api/fiscal_service/v1/accounting/od/auctions_query) -- free, returns JSON. What to pull:

Collection frequency: On auction days (schedule known months in advance). 2Y/5Y auctions typically Tuesday/Wednesday, 10Y/30Y on Wednesday/Thursday.

23. Treasury Yields Daily

What: Daily yields across the full maturity curve.

Source: FRED API (https://api.stlouisfed.org/fred/series/observations) -- free, API key required. Rate limit: 120 requests/minute. FRED Series IDs:

Calculated metrics:

Collection frequency: Daily. Fallback: FRED CSV download (weekly) if API unavailable; Yahoo Finance for yields (delayed but functional).

24. TIPS Breakeven Inflation Rates

What: The difference between nominal Treasury yields and TIPS (inflation-protected) yields. This IS the market's inflation expectation.

Source: FRED API -- T5YIE, T10YIE, T5YIFR. What to track:

Collection frequency: Daily.

25. Net Liquidity Signal (TGA + RRP)

What: The combined Treasury General Account balance and Reverse Repo Facility usage. Together, these measure how much cash is available to buy assets.

Source:

What to calculate:

Collection frequency: Daily for RRP. Weekly for TGA (WTREGEN). Weekly for balance sheet (WALCL).

26. CME FedWatch Implied Probabilities

What: Market-implied probabilities of Fed rate changes at each upcoming FOMC meeting, derived from Fed Funds futures pricing.

Source: CME Group FedWatch tool (https://www.cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html) -- free, scrapeable. Or calculate directly from 30-Day Fed Funds futures prices on FRED (FF1, FF2, etc.). What to track:

Collection frequency: Daily at market close. Extra snapshot after any major data release (CPI, NFP, GDP).

27. Fed SOMA Holdings

What: The Federal Reserve's System Open Market Account -- what the Fed actually owns on its balance sheet (Treasuries and MBS by maturity).

Source: NY Fed API -- free. Key metrics only: [PANEL: Opus review -- simplify from full maturity distribution]

Collection frequency: Weekly.

28. SOFR Rate

What: Secured Overnight Financing Rate. The benchmark short-term interest rate that replaced LIBOR.

Source: FRED API -- series SOFR (daily). Collection frequency: Daily.


SECTION F: FUTURES CURVES AND TERM STRUCTURE

29. Full Term Structure (Front 6 + 12th Month)

What: Settlement prices for the front 6 contract months + the 12th month endpoint for each major futures contract. Front 6 + 12th month is sufficient for 95% of term structure analysis; contracts 13+ have <1% volume. [PANEL: Grok 4 Fast]

Contracts to track:

Source: Polygon.io ($29/month, shared with Options desk) -- MANDATORY from Week 1, not Month 2. Yahoo Finance scraping will break; ticker formats change monthly and break curve calculations within 3-6 months. [ALL PANEL MEMBERS]

What to store per contract per day:

Open Interest price matrix [PANEL: Opus review -- calculated from already-collected data, $0 additional cost]:

Collection frequency: Daily at market close (6:00 PM ET).

Physical delivery notices [PANEL: Grok 4 Fast]:

30. Contango/Backwardation Classification

What: Daily classification of each commodity's term structure shape and how it is changing.

Calculate from curve data:

31. Roll Calendar

What: When each futures contract expires and rolls to the next month.

Source: CME Group contract specifications pages -- free. What to track:

Collection frequency: Static calendar, updated once per contract month. Volume crossover monitored daily during roll periods.

32. Calendar Spread Values

What: The price difference between specific contract months for the same commodity.

Key spreads to track:

What to store: Spread value, Z-score vs 1/3/5-year history, percentile rank, seasonal pattern. Collection frequency: Daily.


SECTION G: CFTC COMMITMENTS OF TRADERS (COT)

33. Disaggregated Report (Primary)

What: Weekly breakdown of futures positions by trader type across all major contracts. This is the most important positioning dataset in existence.

Trader categories:

Source: CFTC.gov -- Disaggregated Futures-Only report.

Also pull: Futures-and-Options Combined report [PANEL: Gemini 3.1 Pro]:

Also pull: ICE Commitments of Traders for Brent [PANEL: DeepSeek V3.1]:

What to pull per commodity (15 key contracts: CL, BZ, NG, RB, HO, GC, SI, HG, ZC, ZS, ZW, SB, KC, ZN, ES):

Collection frequency: Weekly. Released Friday 3:30 PM ET. Data is as of prior Tuesday close (3-day lag).

34. COT Z-Scores and Extreme Flags

What: Statistical context for raw COT positions. Raw numbers mean nothing without historical context.

Calculate for each trader category, each commodity:

CRITICAL interpretation guidance [PANEL: Sonar Reasoning Pro]: Managed money extremes are NOT automatically reversal signals. Managed money are trend-followers and STAY extreme for weeks/months during strong trends. The correct conviction scoring approach:

Managed money extremes alone:          5 points max (downgraded from primary signal)
Dealer flip:                          20 points (rare and real -- structural flow shift)
Commercial hedger flip:               15 points (physical supply/demand guys know the market)
Dealer-speculator divergence:         12 points (managed money short + dealers net long = fight)
Managed money extreme WITH dealer confirmation: 18 points (both agree = high conviction)

Managed money extreme WITHOUT dealer confirmation = crowded trade, high reversal RISK but not a standalone signal.

35. Dealer Flip Detection

What: Monitor for swap dealer net position changing sign (from net long to net short, or vice versa).

Detection rule: Swap dealer net position changes sign from prior week AND the Z-score of the new position is >0.5 (not just noise around zero). Flag immediately in daily alert.

Backtest requirement [PANEL: Sonar Reasoning Pro]: Before treating dealer flips as Tier-1 signals, validate that they precede major moves (2-6 week lag) in at least 60% of historical cases per commodity. If backtesting shows <60% correlation, demote to Tier 2. Track every dealer flip with its subsequent 4-week return.

Persistence requirement: A flip that reverses within 1 week is noise. Require the flip to persist for 2+ consecutive weeks before upgrading to high-conviction signal.

36. Legacy and TFF Reports (Supplementary)

What: Additional COT report formats.

Collection frequency: Weekly (same release schedule as disaggregated).


SECTION H: WEATHER AND SEASONAL

37. NOAA CPC Temperature and Precipitation Outlooks

What: The Climate Prediction Center's probabilistic temperature and precipitation forecasts for the next 6-10 and 8-14 days.

Source: NOAA CPC (https://www.cpc.ncep.noaa.gov/products/predictions/) Collection frequency: Updated daily by CPC.

Supplement with global ensemble models [PANEL: Grok 4 Fast]:

High-frequency weather alerts [PANEL: GPT-4.1]:

38. Heating/Cooling Degree Days (HDD/CDD)

What: Quantitative measure of energy demand driven by temperature deviation from 65F baseline.

Source: NOAA (https://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/degree_days/) -- free. Also EIA includes degree day data in their weekly reports. What to track:

Collection frequency: Weekly (published with EIA gas storage). Daily forecasts from NWS.

39. US Drought Monitor

What: Weekly assessment of drought conditions across the continental US.

Source: https://droughtmonitor.unl.edu/DmData/DataDownload.aspx -- free CSV/Shapefile, updated weekly (Tuesdays). What to track:

Collection frequency: Weekly.

40. El Nino / La Nina Status

What: ENSO phase determines global weather patterns for 6-18 months.

Source: NOAA CPC ENSO page (https://www.cpc.ncep.noaa.gov/products/analysis_monitoring/enso_advisory/) -- free. What to track:

Collection frequency: Monthly (CPC updates monthly, with weekly SST data).

41. Hurricane Tracking (June-November)

What: Tropical storm and hurricane positions, forecasts, and intensity for the Atlantic basin.

Source: NOAA National Hurricane Center (https://www.nhc.noaa.gov/gis/) What to track:

Collection frequency: Every 6 hours during active storms. Daily scan during hurricane season (June 1 - November 30).

42. Seasonal Patterns with Hit Rates

What: Historical seasonal price patterns for each commodity, with statistical validation.

Calculate from 10-20 years of price data:

IMPORTANT -- Regime-conditional seasonality [PANEL: Sonar Reasoning Pro]: Raw seasonal hit rates are insufficient. Crowded patterns (summer crude driving demand) are already in the curve. Use regime-conditional approach:

interface SeasonalPattern {
  commodity: string;
  season: string;
  regimes: {
    'supply_tight': { hitRate: number; avgReturn: number };
    'supply_ample': { hitRate: number; avgReturn: number };
    'demand_surge': { hitRate: number; avgReturn: number };
  };
}
// Only trade seasonal if current regime historically supports it.
// Seasonal aligned with tight inventory + extreme managed money short = high conviction.
// Ignore seasonal if term structure contradicts it (natgas in contango during winter = demand destruction overrides seasonal).

Collection frequency: Calculate once, update annually.

43. NASA FIRMS Fire Data

What: Near-real-time fire/hotspot detection from satellite thermal sensors.

Source: NASA FIRMS (https://firms.modaps.eosdis.nasa.gov/) -- free. Data available within 3 hours of satellite pass; add 6-hour delay disclaimer for signals. [PANEL: DeepSeek V3.1] What to monitor:

Collection frequency: Daily during relevant seasons (May-Nov). Defer full implementation to v2; use Grok/Sonar web search weekly to query "Brazil fires soybean" as cheaper interim approach. [PANEL: Opus review]

44. River Levels (Army Corps / USGS)

What: Mississippi River water levels at key gauges.

Source: USGS real-time water data (https://waterdata.usgs.gov/nwis/rt) -- free. Key gauges: Memphis (primary alert gauge -- set alert for <0 feet), St. Louis, Vicksburg. Simplified approach [PANEL: Opus review]: Monitor Memphis gauge. Alert if below 0 feet or above flood stage. Check weekly during risk seasons (Sep-Nov for low water, Mar-May for flooding). Do not poll daily year-round -- this is a low-frequency signal.

Collection frequency: Weekly check. Event-driven (hourly) when Memphis gauge is within 20% of alert thresholds.


SECTION I: CROSS-COMMODITY CORRELATIONS

45. Core Ratio Monitors (Start with 6, expand to 12 in v2)

v1 (Week 1-2 build): [PANEL: Opus review, Grok 4 Fast, DeepSeek V3.1 all agree -- start with 6]

# Ratio Calculation What It Means Mean-Reversion Signal
1 Gold/Silver GC / SI Risk sentiment. >80 = recession fear, <60 = growth. Z-score > 2 in either direction
2 Oil/NatGas CL / NG Energy substitution. 10:1 = BTU parity. >25:1 = gas too cheap
3 Copper/Gold HG / GC Global growth proxy. Tracks 10Y yields closely. Divergence from 10Y yield > 2 Z-scores
4 Soybean/Corn ZS / ZC Planting decision driver. >2.5 = plant beans, <2.2 = plant corn. Extreme readings in March signal acreage shift
5 Brent/WTI BZ - CL Logistics/geopolitics. Z-score > 2 from 3-year mean
6 Crack Spread (2xRBx42 + HOx42)/3 - CL Refinery margin. Mean ~$20. >$40 or <$10 = unsustainable

v2 additions (add only after core 6 are battle-tested): | 7 | Platinum/Gold | PL / GC | At historic lows. | Ratio < 0.5 = extreme long platinum reversion | | 8 | Crush Spread | ZM + ZL value - ZS cost | Soybean processing margin. | Z-score > 2 | | 9 | Spark Spread | Electricity - (NG x heat rate) | Power plant profit. | Regional | | 10 | Corn/Wheat | ZC / ZW | Feed grain substitution. | Corn >90% wheat = feed switching | | 11 | Gold/Copper x DXY | (GC/HG) x DXY | Combined risk filter. | Divergence from VIX | | 12 | Natgas/Crude BTU | (NG x 5.8) / CL | Energy-equivalent pricing. Note: use 5.8 (precise BTU factor), not 6. [PANEL: Opus review] | >0.5 = gas expensive |

Source for all: Polygon.io preferred. FRED for DXY (DTWEXBGS). Yahoo Finance as emergency fallback only. What to store: Ratio value, 20/60/200-day MAs, Z-score vs 1/3/5-year history, percentile rank. Collection frequency: Daily at market close.

46. Currency-Commodity Links

Currency Pair Commodity Link Why
AUD/USD Copper, Iron Ore Australia is a major copper/iron ore exporter
USD/CAD WTI Crude Canada's economy is oil-dependent
BRL/USD Soybeans, Coffee Brazil is the #1 exporter of both
NOK/USD Brent Crude Norway's sovereign wealth fund is oil-funded
DXY (broad dollar) Commodity basket Inverse correlation: strong dollar = cheap commodities

Source: FRED API for exchange rates (DEXUSAL, DEXCAUS, DEXBZUS, DEXNOUS, DTWEXBGS). What to calculate: 60-day rolling correlation between each pair. Flag when correlation breaks. Collection frequency: Daily.

47. Supply Chain Links

Chains to monitor:

Collection frequency: Weekly assessment using already-collected data.


SECTION J: CHINA DATA

48. NBS Manufacturing PMI

Source: National Bureau of Statistics (http://www.stats.gov.cn/english/). Also Trading Economics/Investing.com for parsed data. What to track:

Collection frequency: Monthly, last day of month.

49. GACC Customs Data (Chinese Imports/Exports)

Source: GACC (http://english.customs.gov.cn/) -- free, English. Also Trading Economics as backup. What to track:

Collection frequency: Monthly (~15th of following month).

50. SHFE Warehouse Stocks

Source: Investing.com or TradingView (more reliable than SHFE English website). [PANEL: Opus review] Collection frequency: Weekly (published each Friday).

51. Korean Exports (Manufacturing / Chip Cycle Proxy)

What: South Korea publishes export data on the 1st of each month with a 20-day flash estimate. Source: Korea Customs Service (https://unipass.customs.go.kr/ets/index_eng.do) -- free, English. What to track:

Defer to v2 [PANEL: Opus review]: Korean chip exports lead copper/palladium demand through a long causal chain. The direct copper demand signals (SHFE stocks, LME stocks, China PMI, SHFE-LME arb) are stronger. Build Korean tracking only if copper signals prove insufficient.


SECTION K: MACRO/ECONOMIC CALENDAR

52. Economic Release Calendar and State Table

What: One row per trading day with every relevant flag and event marker.

SQLite CREATE statement:

CREATE TABLE IF NOT EXISTS daily_state (
  date                      TEXT PRIMARY KEY,  -- YYYY-MM-DD
  -- Energy calendar
  is_eia_petroleum_day       INTEGER DEFAULT 0,  -- Wednesday
  is_eia_natgas_day          INTEGER DEFAULT 0,  -- Thursday
  is_opec_report_day         INTEGER DEFAULT 0,
  is_baker_hughes_day        INTEGER DEFAULT 0,  -- Friday 1PM
  -- Agriculture calendar
  is_wasde_day               INTEGER DEFAULT 0,
  is_crop_progress_day       INTEGER DEFAULT 0,  -- Monday, growing season
  is_export_sales_day        INTEGER DEFAULT 0,  -- Thursday
  is_prospective_plantings   INTEGER DEFAULT 0,  -- end of March
  is_cattle_on_feed_day      INTEGER DEFAULT 0,
  -- CFTC
  is_cot_release_day         INTEGER DEFAULT 0,  -- Friday 3:30 PM
  -- Fed/Treasury
  is_fomc_day                INTEGER DEFAULT 0,
  is_fomc_eve                INTEGER DEFAULT 0,
  days_to_next_fomc          INTEGER,
  is_treasury_auction        INTEGER DEFAULT 0,
  auction_tenor              TEXT,               -- '10Y', '30Y', etc.
  -- Macro releases
  is_cpi_day                 INTEGER DEFAULT 0,
  is_ppi_day                 INTEGER DEFAULT 0,
  is_nfp_day                 INTEGER DEFAULT 0,  -- first Friday of month
  is_gdp_day                 INTEGER DEFAULT 0,
  is_ism_pmi_day             INTEGER DEFAULT 0,  -- 1st business day of month
  is_pce_day                 INTEGER DEFAULT 0,
  is_jobless_claims_day      INTEGER DEFAULT 0,  -- Thursday
  is_adp_day                 INTEGER DEFAULT 0,  -- Wednesday before NFP
  econ_release_tier          INTEGER,            -- 1=CPI/NFP/FOMC, 2=PPI/GDP/ISM, 3=minor
  -- Contract rolls
  contracts_rolling          TEXT,               -- JSON list of symbols in roll window
  is_goldman_roll            INTEGER DEFAULT 0,  -- 5-9 business days before month end
  days_to_contract_expiry    INTEGER,            -- front-month futures days to expiry
  -- Options/calendar events [PANEL: Opus review]
  is_options_expiry          INTEGER DEFAULT 0,  -- triple/quad witching
  is_end_of_quarter_rebalance INTEGER DEFAULT 0,
  is_us_holiday              INTEGER DEFAULT 0,
  is_uk_holiday              INTEGER DEFAULT 0,  -- LME closed
  is_china_holiday           INTEGER DEFAULT 0,  -- SHFE closed
  vix_regime                 TEXT,               -- 'low' (<20), 'elevated' (20-30), 'high' (>30)
  -- Weather
  hurricane_active           INTEGER DEFAULT 0,
  extreme_weather_flag       TEXT,               -- 'polar_vortex', 'heat_wave', etc.
  -- China
  is_china_pmi_day           INTEGER DEFAULT 0,
  is_china_customs_day       INTEGER DEFAULT 0,
  -- General
  is_month_end               INTEGER DEFAULT 0,
  is_quarter_end             INTEGER DEFAULT 0,
  is_half_day                INTEGER DEFAULT 0,
  created_at                 TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_daily_state_date ON daily_state(date);

Collection frequency: Generated daily at 5:00 AM ET. Calendar data sourced from CME, USDA, EIA, BLS, Treasury release schedules.

53. Economic Release Tier Ranking

Tier 1 (move futures 1%+ regularly):

Tier 2 (move futures 0.3-1%):

Tier 3 (minor, context only):

54. Consensus Estimates and Surprise Calculator [PANEL: Opus review -- MUST-HAVE, was missing from original spec]

What: For every Tier 1 and Tier 2 release, store the median analyst forecast before release and calculate surprise = actual - consensus. Prediction markets price RELATIVE TO CONSENSUS, not the raw number. This is the single most important missing piece in the original spec.

Source:

What to store:

SQL:

CREATE TABLE IF NOT EXISTS economic_surprises (
  id                    INTEGER PRIMARY KEY AUTOINCREMENT,
  release_date          TEXT NOT NULL,          -- YYYY-MM-DD
  release_time          TEXT,                   -- HH:MM ET
  indicator_name        TEXT NOT NULL,          -- 'CPI', 'NFP', 'EIA_crude', etc.
  tier                  INTEGER NOT NULL,       -- 1, 2, or 3
  consensus_estimate    REAL,
  actual_value          REAL,
  prior_value           REAL,
  surprise              REAL,                   -- actual - consensus
  surprise_sigma        REAL,                   -- surprise / historical_std
  is_upward_revision    INTEGER,                -- for WASDE/STEO revisions
  source                TEXT,                   -- 'investing_com', 'tradingeconomics', etc.
  created_at            TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_surprises_date ON economic_surprises(release_date);
CREATE INDEX IF NOT EXISTS idx_surprises_indicator ON economic_surprises(indicator_name);

Collection frequency: Pull consensus pre-release (morning of release day). Pull actual at time of release. Calculate surprise immediately.

55. Fed Dot Plot and Summary of Economic Projections (SEP) [PANEL: Opus review -- MUST-HAVE]

What: Released quarterly at FOMC meetings (March, June, September, December). The dot plot shows each Fed governor's rate forecast. The median dot IS the Fed's forward guidance.

Source: Federal Reserve website (federalreserve.gov/monetarypolicy/fomccalendars.htm). FRED: FEDTARMD for median target rate projections. Parse from PDF tables. What to track:

Collection frequency: Quarterly (4 times/year at FOMC meetings).

56. Weekly Economic Indicators [PANEL: Opus review]

Initial and Continuing Jobless Claims (MUST-HAVE):

Atlanta Fed GDPNow (MUST-HAVE):

Cleveland Fed Inflation Nowcast (MUST-HAVE):

Collection frequency: Jobless claims weekly (Thursday). GDPNow and CPI Nowcast daily when updated.


SECTION L: MACRO LEADING INDICATORS

57. VIX and MOVE Index (Volatility Regime) [PANEL: Opus review -- MUST-HAVE]

What: VIX (equity volatility) and MOVE (bond volatility index). VIX >30 changes the correlation structure of ALL commodities (correlations go to 1 in a crisis). MOVE >150 makes Treasury auction results much more volatile.

Source:

Collection frequency: Daily.

58. Implied Volatility and Skew for Commodities [PANEL: Gemini 3.1 Pro -- MUST-HAVE]

What: Prediction markets are essentially binary options. Pricing Kalshi "WTI > $80" accurately requires knowing the market's implied volatility (OVX) and skew. Futures prices alone only give you the at-the-money expectation.

Source:

Collection frequency: Daily.

59. Credit Spreads as Leading Indicators [PANEL: Sonar Reasoning Pro]

What: HY-IG (high yield minus investment grade) spreads signal risk appetite before commodities move. Wide spreads = demand destruction coming = commodity bearish.

Source: FRED:

Collection frequency: Daily.

60. CPI/PPI Sub-Component History for Nowcasting [PANEL: Opus review -- MUST-HAVE]

What: Shelter CPI is 36% of CPI index and lags real-time rent data by 12-18 months. Tracking leading indicators of shelter allows 6+ month CPI forecasts. Supercore (core services ex-housing) is the Fed's current focus.

Source:

Collection frequency: Monthly at BLS releases. Weekly for ISM sub-components.


SECTION M: OBSCURE / ALTERNATIVE DATA

61. Sentinel-5P Satellite NO2 for Refinery Outages

What: Nitrogen dioxide concentrations over refinery complexes. Drop in NO2 = unplanned outage, 12-48 hours before news.

Source: Copernicus Open Access Hub or Google Earth Engine -- free, requires registration. Refinery complexes: Houston Ship Channel, Rotterdam/Antwerp, Jamnagar, Ras Tanura, Singapore. Collection frequency: Daily satellite pass. Defer to v2 [PANEL: Opus review, Gemini 3.1 Pro, Grok 4.1 -- all agree]: Requires multi-GB NetCDF/HDF5 downloads, geospatial processing, cloud-cover corrections (~40% of passes blocked). Interim approach: EPA CAMD continuous emissions monitoring (free, near-real-time for US facilities) + industry newsletter monitoring for refinery alerts.

62. GridStatus.io Electricity Prices

Source: GridStatus.io -- free tier with real-time locational marginal prices (LMPs). What to track: Real-time LMP by region (ERCOT, PJM, MISO, SPP, CAISO, NYISO). Day-ahead vs real-time price spread. Collection frequency: Hourly during business hours.

63. Ship AIS Tracking (Tanker/Bulk Carrier Flows)

Source: MarineTraffic, VesselFinder, AISHub -- free tiers. Warning [PANEL: Gemini 3.1 Pro]: MarineTraffic and VesselFinder use aggressive Cloudflare/Datadome bot protection. Scraping from a VPS IP is unreliable. Practical approach [PANEL: Opus review]: Use Grok/Sonar web search weekly to query "floating storage VLCC count" and "China crude oil tanker arrivals." Industry analysts publish these counts for free on Twitter/LinkedIn. Build automated AIS pipeline only if manual monitoring proves consistently valuable over 3+ months. Collection frequency: Weekly manual scan. Automated pipeline deferred to v2.

64. EPA RIN Credits

Source: EPA EMTS -- free but significant lag. Defer to v2 [PANEL: Opus review]: Small marginal signal relative to WASDE surprises; data has significant collection lag.

65. AAR Weekly Rail Carloads

Source: AAR (https://www.aar.org/data-research/rail-traffic-data/) -- free weekly summary. What to track: Total carloads by commodity group, YoY change, 4-week MA, grain carloads. Collection frequency: Weekly (released Wednesdays).

66. Baltic Dry Index (Decomposed)

Source: Free delayed data via Yahoo Finance (^BDI); Hellenic Shipping News for decomposition. What to track: BDI composite, Capesize sub-index (iron ore/coal), Panamax sub-index (grain). Collection frequency: Daily.

67. Freightos Baltic Index (FBX) [PANEL: DeepSeek V3.1]

What: Container shipping rates. Leading indicator of consumer goods inflation (affects Fed policy expectations). Source: Freightos.com -- free delayed data. Collection frequency: Weekly.

68. NDVI Crop Health Monitoring [PANEL: Grok 4 Fast]

What: Normalized Difference Vegetation Index from Landsat/Sentinel satellites provides real-time crop health at county level, detecting yield issues 4-6 weeks before USDA reports. Source: NASA Earthdata API (free) -- https://earthdata.nasa.gov/. USGS CropScape -- https://www.nass.usda.gov/Research_and_Science/Cropland/data_products.php. Defer to v2: Requires significant geospatial processing. Interim: use USDA crop progress good/excellent ratings as proxy.


SECTION N: CONVICTION SCORING SYSTEM

69. Market Regime Detection [PANEL: Sonar Reasoning Pro -- MUST-ADD before conviction scoring]

What: The same signal means different things in different market regimes. Classify regime daily before scoring any trade.

enum MarketRegime {
  RISK_ON    = 'equities up, VIX < 15, credit spreads tight, correlations low',
  RISK_OFF   = 'equities down, VIX > 20, credit spreads wide, correlations high',
  SUPPLY_SHOCK = 'backwardation, commercial hedgers extreme, inventory draws',
  DEMAND_SHOCK = 'contango, managed money extreme long, inventory builds',
  NORMAL     = 'everything else'
}

// Regime detection from:
// 1. VIX level + 5-day trend
// 2. HY-IG spread (Section L.59)
// 3. 2s10s curve shape
// 4. Cross-commodity correlation (if all moving together = risk-off)
// 5. Futures curve structure (backwardation vs contango)

// Regime-specific conviction weight adjustments:
const convictionAdjustment: Record<MarketRegime, Record<string, number>> = {
  [MarketRegime.RISK_ON]:      { fundamentals: 0.30, momentum: 0.35, positioning: 0.15, curve: 0.15, seasonal: 0.05 },
  [MarketRegime.RISK_OFF]:     { fundamentals: 0.20, momentum: 0.50, positioning: 0.10, curve: 0.10, seasonal: 0.10 },
  [MarketRegime.SUPPLY_SHOCK]: { fundamentals: 0.50, positioning: 0.25, momentum: 0.10, curve: 0.10, seasonal: 0.05 },
  [MarketRegime.NORMAL]:       { fundamentals: 0.25, momentum: 0.20, positioning: 0.20, curve: 0.15, seasonal: 0.10 }
};

Store daily regime classification in daily_state table (add market_regime TEXT column).

70. Composite Conviction Score (0-100)

What: A systematic scoring framework to separate real signals from noise and traps. Every trade idea gets scored before analysts see it.

Components (apply regime-adjusted weights):

# Component Base Weight Description
1 Fundamental Direction 25 Do government data sources (EIA, USDA, CFTC) agree? Unanimous (25), majority (15), mixed (5), contradictory (0).
2 Positioning Confirmation 20 See COT scoring table in Section 34. Dealer flip + commercial (20), dealer flip alone (15), commercial flip alone (12), managed money extreme + dealer confirmation (18), managed money extreme alone (5), all categories crowded (0).
3 Term Structure Signal 15 Backwardation supports bullish (15), neutral (8), curve contradicts (0).
4 Cross-Commodity Confirmation 15 3+ correlated assets confirming (15), 1-2 (8), contradicting (0).
5 Seasonal Alignment 10 Strong seasonal in correct regime with >65% hit rate (10), moderate (5), fighting seasonal (0).
6 Freshness / Timeliness 15 Data <24h old (15), <1 week (10), >1 week (5), >2 weeks (0). COT data always 3 days stale -- COT freshness = max(10, 15 - days_since_release). [PANEL: Grok 4 Fast]

Initial weights are estimates only [PANEL: Opus review, Sonar Reasoning Pro]: Start with equal weights (16.7% each) if preferred; the weights above are reasonable initial guesses. After 3 months of logged outcomes, run logistic regression to determine empirical weights. Label weights as 'v1_estimate' in config. Target: calibrate after 100 settled predictions.

Backtesting requirement [PANEL: Sonar Reasoning Pro]: Before deploying scoring, run a 3-year backtest on past commodity moves:

// Track outcome by conviction score decile
// Target: decile 9 (80-90 conviction) should be >60% accurate
// If any decile shows <50% accuracy = investigate immediately
// Weekly Brier score calibration after launch

Total possible: 100. Minimum threshold: 55 to pass to analysts. Liquidity-adjusted threshold: For prediction markets, raise minimum threshold to 70 due to execution costs. For liquid futures (CL, GC, ZN), 55 is appropriate.

71. Data Quality Validation Layer [PANEL: Sonar Reasoning Pro -- add before conviction scoring]

interface ValidationResult {
  source: string;
  dataQuality: 'validated' | 'revised' | 'preliminary' | 'delayed' | 'failed';
  revisionRisk: 'low' | 'medium' | 'high';
  confidenceMultiplier: 0.5 | 0.7 | 1.0;
  lastSuccessfulFetch: Date;
  lagVsExpected: number;  // days behind expected release
}

// Validation rules:
// EIA: if data age > 8 days, quality = 'delayed', confidence *= 0.7
// EIA: if marked 'preliminary', revisionRisk = 'high', confidence *= 0.8
// CFTC: verify COT release happened (don't assume Friday 3:30PM always occurs)
// WASDE backward revision: flag as noise, do not fire new signal
// Satellite: reject if cloud cover > 30%
// Any source: alert if > 48 hours stale vs expected release schedule

Fallback sources table [PANEL: Sonar Reasoning Pro]:

Primary Fallback 1 Fallback 2
Kalshi API Parse Kalshi web UI Polymarket only
Polymarket CLOB Polymarket web UI Manual hourly
FRED API FRED CSV download Yahoo Finance (yields)
Polygon (futures) Alpha Vantage free tier Yahoo Finance (emergency)
EIA API EIA CSV download Parse EIA website
USDA WASDE API USDA PDF parser News headlines + consensus
Investing.com calendar TradingEconomics MarketWatch calendar

Operational SLA:

72. Mandatory Rejection Rules (Override Score)

These rules reject a trade idea regardless of conviction score:

  1. Circular Source Chain: If the bullish case for commodity X relies on data derived from commodity X's price, reject. Example: "gold should go up because gold ETF inflows are rising" -- the inflows ARE the price move.

  2. Staleness Window Exceeded: Primary data source has not updated in >2x its normal frequency. Stale data = trading yesterday's signal.

  3. Regime Change Override: If the 60-day rolling correlation between the signal source and the commodity has flipped sign (e.g., was +0.7, now -0.3), reject until new regime is understood.

  4. Narrative vs Data Mismatch: Trade thesis relies on qualitative narrative but quantitative data contradicts it. Data wins.

  5. Single-Source Dependency: Entire trade thesis rests on one data point with no corroboration. Require at least 2 independent data sources.

  6. Data Quality Failure: Primary source has confidenceMultiplier < 0.7 AND no fallback source available.

73. Trap Detection Flags

These don't auto-reject but add a warning flag visible to analysts:


SECTION O: WEEKLY CALIBRATION AND FEEDBACK LOOP [PANEL: Sonar Reasoning Pro]

74. Brier Score Feedback System

Every Friday after market close, calculate calibration metrics:

// Group predictions by conviction score decile (0-10, 10-20, ..., 90-100)
// For each decile where outcome is known:
//   - Calculate Brier score
//   - Calculate win rate
//   - Track vs historical

// Alert conditions:
// - Any decile shows <50% win rate: investigate immediately
// - High conviction (80+) shows <60% win rate: weights are wrong
// - Any single data source correlates with unusually low accuracy: signal may be dead

// Output: weekly calibration report to EDGE TEAM group

Prediction tracking table:

CREATE TABLE IF NOT EXISTS prediction_log (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  prediction_date   TEXT NOT NULL,
  commodity         TEXT NOT NULL,
  direction         TEXT NOT NULL,        -- 'bullish', 'bearish'
  venue             TEXT NOT NULL,        -- 'futures', 'kalshi', 'polymarket'
  conviction_score  REAL NOT NULL,
  regime            TEXT,
  primary_signals   TEXT,                 -- JSON array of signal names
  entry_price       REAL,
  target_price      REAL,
  settlement_date   TEXT,
  settled_price     REAL,
  outcome           TEXT,                 -- 'win', 'loss', 'open'
  brier_score       REAL,
  return_pct        REAL,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pred_date ON prediction_log(prediction_date);
CREATE INDEX IF NOT EXISTS idx_pred_outcome ON prediction_log(outcome);

SQLite CREATE STATEMENTS (Complete Schema)

-- Enable WAL mode for concurrent write performance
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;

-- ============================================================
-- ENERGY DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS eia_petroleum_weekly (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  report_date       TEXT NOT NULL,       -- YYYY-MM-DD (Wednesday)
  padd              TEXT NOT NULL,       -- 'US', 'PADD1'..'PADD5', 'SPR', 'Cushing'
  crude_stocks_kb   REAL,               -- thousand barrels
  gasoline_stocks_kb REAL,
  distillate_stocks_kb REAL,
  crude_imports_kbd REAL,               -- thousand barrels/day
  crude_production_kbd REAL,
  refinery_util_pct REAL,
  refinery_inputs_kbd REAL,
  gasoline_supplied_kbd REAL,           -- demand proxy
  vs_5yr_avg_pct    REAL,               -- current vs 5-year seasonal average
  wow_change_kb     REAL,               -- week-over-week change
  consensus_crude   REAL,               -- pre-release consensus
  surprise_crude    REAL,               -- actual - consensus
  data_type         TEXT DEFAULT 'final', -- 'advance', 'final'
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_eia_pet_date_padd ON eia_petroleum_weekly(report_date, padd);

CREATE TABLE IF NOT EXISTS eia_natgas_storage (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  report_date       TEXT NOT NULL,
  region            TEXT NOT NULL,      -- 'Total', 'East', 'Midwest', 'Mountain', 'Pacific', 'SouthCentral'
  storage_bcf       REAL,
  net_change_bcf    REAL,
  vs_5yr_avg_pct    REAL,
  vs_prior_yr_pct   REAL,
  consensus_bcf     REAL,
  surprise_bcf      REAL,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_eia_gas_date_region ON eia_natgas_storage(report_date, region);

-- ============================================================
-- FUTURES PRICES
-- ============================================================
CREATE TABLE IF NOT EXISTS futures_prices (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  trade_date        TEXT NOT NULL,      -- YYYY-MM-DD
  symbol            TEXT NOT NULL,      -- root symbol: CL, GC, ZC, etc.
  contract_month    TEXT NOT NULL,      -- YYYY-MM (expiry month)
  expiry_date       TEXT,
  months_to_expiry  INTEGER,
  open              REAL,
  high              REAL,
  low               REAL,
  settle            REAL NOT NULL,
  volume            INTEGER,
  open_interest     INTEGER,
  oi_change         INTEGER,            -- daily OI change
  price_change      REAL,              -- daily settle change
  oi_price_signal   TEXT,             -- 'new_longs','short_cover','new_shorts','long_liq'
  spread_vs_front   REAL,             -- price difference vs front month
  source            TEXT DEFAULT 'polygon',
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_futures_date_sym_month ON futures_prices(trade_date, symbol, contract_month);
CREATE INDEX IF NOT EXISTS idx_futures_trade_date ON futures_prices(trade_date);
CREATE INDEX IF NOT EXISTS idx_futures_symbol ON futures_prices(symbol);

-- ============================================================
-- TERM STRUCTURE ANALYTICS
-- ============================================================
CREATE TABLE IF NOT EXISTS term_structure (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  calc_date         TEXT NOT NULL,
  symbol            TEXT NOT NULL,
  structure_type    TEXT NOT NULL,     -- 'contango', 'backwardation', 'mixed'
  front_price       REAL,
  sixth_month_price REAL,
  twelfth_month_price REAL,
  front_back_spread REAL,             -- front minus 6th month
  annualized_roll_yield REAL,         -- %
  curve_slope_percentile REAL,        -- vs 1-year history
  curve_slope_pct_3yr REAL,           -- vs 3-year history
  structure_change  TEXT,             -- 'flattening', 'steepening', 'stable'
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_term_struct_date_sym ON term_structure(calc_date, symbol);

-- ============================================================
-- CFTC COT DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS cot_positions (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  report_date       TEXT NOT NULL,     -- report as-of date (Tuesday)
  release_date      TEXT,              -- actual release date (Friday)
  symbol            TEXT NOT NULL,     -- CL, GC, ZC, etc.
  report_type       TEXT NOT NULL,     -- 'disaggregated', 'tff', 'legacy', 'cit'
  category          TEXT NOT NULL,     -- 'prod_merch','swap_dealer','managed_money','other','commercial','noncommercial','index_trader'
  long_contracts    INTEGER,
  short_contracts   INTEGER,
  spreading_contracts INTEGER,
  net_position      INTEGER,           -- long - short
  long_pct_oi       REAL,
  short_pct_oi      REAL,
  wow_net_change    INTEGER,
  net_z52           REAL,              -- z-score vs 52-week rolling
  net_z3yr          REAL,              -- z-score vs 3-year rolling
  net_percentile    REAL,              -- percentile rank vs 3-year
  consecutive_direction_weeks INTEGER, -- weeks in same direction
  is_extreme        INTEGER DEFAULT 0, -- net_z52 > 2 or < -2
  dealer_flip       INTEGER DEFAULT 0, -- swap dealer net changed sign this week
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_cot_date_sym_type_cat ON cot_positions(report_date, symbol, report_type, category);
CREATE INDEX IF NOT EXISTS idx_cot_dealer_flip ON cot_positions(dealer_flip, report_date);

-- ============================================================
-- PREDICTION MARKET SNAPSHOTS
-- ============================================================
CREATE TABLE IF NOT EXISTS prediction_market_snapshots (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  snapshot_time     TEXT NOT NULL,     -- ISO8601
  platform          TEXT NOT NULL,     -- 'kalshi', 'polymarket'
  market_id         TEXT NOT NULL,
  title             TEXT,
  category          TEXT,              -- 'commodity', 'macro', 'rates', 'crypto'
  expiry_date       TEXT,
  settlement_source TEXT,              -- what data Kalshi uses to resolve
  yes_bid           REAL,
  yes_ask           REAL,
  yes_last          REAL,
  no_bid            REAL,
  no_ask            REAL,
  volume            REAL,
  open_interest     REAL,
  book_depth_json   TEXT,              -- JSON: top 2 levels for all, top 5 for high-volume
  is_high_volume    INTEGER DEFAULT 0, -- volume > 100 contracts/day
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pm_time ON prediction_market_snapshots(snapshot_time);
CREATE INDEX IF NOT EXISTS idx_pm_market ON prediction_market_snapshots(market_id, platform);
-- Store OHLCV daily summaries to manage storage
CREATE TABLE IF NOT EXISTS prediction_market_daily (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  trade_date        TEXT NOT NULL,
  platform          TEXT NOT NULL,
  market_id         TEXT NOT NULL,
  yes_open          REAL,
  yes_high          REAL,
  yes_low           REAL,
  yes_close         REAL,
  volume            REAL,
  eod_book_json     TEXT,             -- end-of-day order book snapshot only
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_pm_daily_date_mkt ON prediction_market_daily(trade_date, market_id, platform);

-- ============================================================
-- MACRO / FRED DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS macro_series (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  obs_date          TEXT NOT NULL,     -- YYYY-MM-DD
  series_id         TEXT NOT NULL,     -- FRED series ID or custom
  series_name       TEXT,
  value             REAL,
  wow_change        REAL,              -- week-over-week change
  mom_change        REAL,              -- month-over-month change
  z52               REAL,              -- z-score vs 52-week
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_macro_date_series ON macro_series(obs_date, series_id);
-- Key series: DGS2, DGS10, DGS30, T10YIE, T5YIFR, DFII10, SOFR, RRPONTSYD, WALCL,
--            WTREGEN, WLRRAFORL, VIXCLS, BAMLH0A0HYM2, BAMLC0A0CM, DTWEXBGS,
--            DEXUSAL, DEXCAUS, DEXBZUS, ICSA, CCSA

-- ============================================================
-- CROSS-COMMODITY RATIOS
-- ============================================================
CREATE TABLE IF NOT EXISTS commodity_ratios (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  calc_date         TEXT NOT NULL,
  ratio_name        TEXT NOT NULL,     -- 'gold_silver', 'oil_natgas', 'copper_gold', etc.
  raw_value         REAL NOT NULL,
  ma20              REAL,
  ma60              REAL,
  ma200             REAL,
  z52               REAL,              -- z-score vs 52-week
  z3yr              REAL,              -- z-score vs 3-year
  percentile_1yr    REAL,
  percentile_3yr    REAL,
  percentile_5yr    REAL,
  is_extreme        INTEGER DEFAULT 0, -- percentile > 90 or < 10
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_ratios_date_name ON commodity_ratios(calc_date, ratio_name);

-- ============================================================
-- USDA DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS wasde_estimates (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  release_date      TEXT NOT NULL,
  commodity         TEXT NOT NULL,     -- 'corn', 'soybeans', 'wheat', etc.
  region            TEXT NOT NULL,     -- 'US', 'World'
  production_mbu    REAL,              -- million bushels (or metric tons for world)
  ending_stocks_mbu REAL,
  stocks_to_use_pct REAL,
  exports_mbu       REAL,
  yield_bu_acre     REAL,
  area_harvested_mA REAL,              -- million acres
  mom_revision      REAL,              -- current minus prior month ending stocks
  consensus_stocks  REAL,             -- pre-release trade estimate
  surprise          REAL,             -- actual - consensus
  revision_type     TEXT,             -- 'upward', 'downward', 'backward', 'no_change'
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_wasde_date_com_region ON wasde_estimates(release_date, commodity, region);

-- ============================================================
-- TREASURY AUCTIONS
-- ============================================================
CREATE TABLE IF NOT EXISTS treasury_auctions (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  auction_date      TEXT NOT NULL,
  security_type     TEXT NOT NULL,     -- '2Y', '5Y', '10Y', '30Y', 'TIPS'
  high_yield        REAL,              -- stop-out rate %
  bid_to_cover      REAL,
  tail_bps          REAL,              -- high yield minus when-issued, basis points
  indirect_pct      REAL,             -- % of award to indirect bidders
  direct_pct        REAL,
  dealer_pct        REAL,
  btc_z1yr          REAL,             -- bid-to-cover z-score vs 1-year same tenor
  tail_z1yr         REAL,             -- tail z-score vs 1-year same tenor
  indirect_z1yr     REAL,
  is_weak           INTEGER DEFAULT 0, -- tail > 1bps AND btc < 1yr avg AND indirect < 1yr avg
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_auctions_date_type ON treasury_auctions(auction_date, security_type);

-- ============================================================
-- METALS WAREHOUSE STOCKS
-- ============================================================
CREATE TABLE IF NOT EXISTS metals_warehouse (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  report_date       TEXT NOT NULL,
  exchange          TEXT NOT NULL,    -- 'COMEX', 'LME', 'SHFE'
  metal             TEXT NOT NULL,    -- 'gold', 'silver', 'copper', 'aluminum', etc.
  total_stocks      REAL,             -- in appropriate unit (troy oz, lots, tonnes)
  registered_stocks REAL,             -- COMEX: available for delivery
  eligible_stocks   REAL,
  on_warrant        REAL,             -- LME: available
  cancelled_warrants REAL,           -- LME: being withdrawn (bullish demand signal)
  cancelled_pct     REAL,            -- cancelled / total * 100
  largest_location_pct REAL,         -- squeeze risk indicator
  wow_change        REAL,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_metals_wh_date_exch_metal ON metals_warehouse(report_date, exchange, metal);

-- ============================================================
-- CONVICTION SCORES
-- ============================================================
CREATE TABLE IF NOT EXISTS conviction_scores (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  scored_at         TEXT NOT NULL,
  commodity         TEXT NOT NULL,
  direction         TEXT NOT NULL,   -- 'bullish', 'bearish'
  venue             TEXT,            -- 'futures', 'kalshi', 'polymarket'
  market_regime     TEXT,
  -- Component scores
  fundamental_score REAL,
  positioning_score REAL,
  curve_score       REAL,
  cross_commodity_score REAL,
  seasonal_score    REAL,
  freshness_score   REAL,
  -- Totals
  total_score       REAL NOT NULL,
  passes_threshold  INTEGER DEFAULT 0,
  -- Flags
  trap_flags_json   TEXT,            -- JSON array of triggered trap flags
  rejection_reason  TEXT,            -- if rejected, why
  data_quality_json TEXT,            -- JSON: ValidationResult for each source
  passed_to_analysts INTEGER DEFAULT 0,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_conviction_scored ON conviction_scores(scored_at);
CREATE INDEX IF NOT EXISTS idx_conviction_passes ON conviction_scores(passes_threshold);

-- ============================================================
-- ECONOMIC SURPRISES
-- ============================================================
CREATE TABLE IF NOT EXISTS economic_surprises (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  release_date      TEXT NOT NULL,
  release_time      TEXT,
  indicator_name    TEXT NOT NULL,
  tier              INTEGER NOT NULL,
  consensus_estimate REAL,
  actual_value      REAL,
  prior_value       REAL,
  surprise          REAL,
  surprise_sigma    REAL,
  is_backward_revision INTEGER DEFAULT 0,
  source            TEXT,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_surprises_date_ind ON economic_surprises(release_date, indicator_name);

-- ============================================================
-- PREDICTION LOG (accountability)
-- ============================================================
CREATE TABLE IF NOT EXISTS prediction_log (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  prediction_date   TEXT NOT NULL,
  commodity         TEXT NOT NULL,
  direction         TEXT NOT NULL,
  venue             TEXT NOT NULL,
  conviction_score  REAL NOT NULL,
  regime            TEXT,
  primary_signals   TEXT,            -- JSON
  entry_price       REAL,
  target_price      REAL,
  settlement_date   TEXT,
  settled_price     REAL,
  outcome           TEXT,            -- 'win', 'loss', 'open', 'hold'
  brier_score       REAL,
  return_pct        REAL,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pred_date ON prediction_log(prediction_date);
CREATE INDEX IF NOT EXISTS idx_pred_outcome ON prediction_log(outcome, settlement_date);

-- ============================================================
-- WEATHER DATA
-- ============================================================
CREATE TABLE IF NOT EXISTS hdd_cdd (
  id                INTEGER PRIMARY KEY AUTOINCREMENT,
  week_ending       TEXT NOT NULL,
  region            TEXT NOT NULL,
  hdd_actual        REAL,
  hdd_normal        REAL,
  hdd_vs_normal     REAL,
  cdd_actual        REAL,
  cdd_normal        REAL,
  cdd_vs_normal     REAL,
  ytd_hdd_vs_normal REAL,
  created_at        TEXT DEFAULT (datetime('now'))
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_hdd_week_region ON hdd_cdd(week_ending, region);

-- ============================================================
-- DAILY STATE TABLE (see Section 52 for full schema)
-- ============================================================
-- Already defined above with all fields including panel additions.

KEY SQL QUERIES

-- Daily signal dashboard: commodities with extreme COT positioning + supporting fundamentals
SELECT
  c.symbol,
  c.category,
  c.net_z52,
  c.net_percentile,
  c.dealer_flip,
  c.consecutive_direction_weeks,
  t.structure_type,
  t.annualized_roll_yield,
  r.raw_value AS ratio_value,
  r.z3yr AS ratio_z3yr
FROM cot_positions c
LEFT JOIN term_structure t ON t.symbol = c.symbol AND t.calc_date = (SELECT MAX(calc_date) FROM term_structure WHERE symbol = c.symbol)
LEFT JOIN commodity_ratios r ON r.calc_date = (SELECT MAX(calc_date) FROM commodity_ratios)
WHERE c.report_date = (SELECT MAX(report_date) FROM cot_positions)
  AND c.category = 'swap_dealer'
  AND (c.dealer_flip = 1 OR ABS(c.net_z52) > 2)
ORDER BY ABS(c.net_z52) DESC;

-- Prediction market mispricing vs Cleveland Fed CPI nowcast
SELECT
  p.market_id,
  p.title,
  p.yes_last AS kalshi_yes_price,
  m.value AS cleveland_fed_nowcast,
  (p.yes_last - m.value) AS implied_edge
FROM prediction_market_snapshots p
CROSS JOIN macro_series m
WHERE p.platform = 'kalshi'
  AND p.category = 'macro'
  AND p.title LIKE '%CPI%'
  AND m.series_id = 'ClevelandFedCPINowcast'
  AND p.snapshot_time > datetime('now', '-1 hour')
  AND m.obs_date = (SELECT MAX(obs_date) FROM macro_series WHERE series_id = 'ClevelandFedCPINowcast')
ORDER BY ABS(implied_edge) DESC;

-- Weekly Brier score calibration by conviction decile
SELECT
  CAST(conviction_score / 10 AS INTEGER) * 10 AS decile_min,
  COUNT(*) AS predictions,
  AVG(brier_score) AS avg_brier,
  SUM(CASE WHEN outcome = 'win' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS win_rate
FROM prediction_log
WHERE outcome IN ('win', 'loss')
  AND prediction_date > date('now', '-90 days')
GROUP BY 1
ORDER BY 1;

-- EIA report surprise history vs next-day futures price move (validate surprise signal)
SELECT
  s.release_date,
  s.indicator_name,
  s.surprise,
  s.surprise_sigma,
  f.price_change AS next_day_cl_move
FROM economic_surprises s
LEFT JOIN futures_prices f ON f.trade_date = date(s.release_date, '+1 day')
  AND f.symbol = 'CL' AND f.contract_month = (
    SELECT MIN(contract_month) FROM futures_prices WHERE symbol = 'CL' AND trade_date = date(s.release_date, '+1 day'))
WHERE s.indicator_name = 'EIA_crude_inventories'
ORDER BY s.release_date DESC
LIMIT 52;

-- COT dealer flip tracker with subsequent returns
SELECT
  c.report_date,
  c.symbol,
  c.net_position,
  c.net_z52,
  f4.settle AS price_4wk_later,
  f0.settle AS price_at_flip,
  ((f4.settle - f0.settle) / f0.settle * 100) AS return_4wk_pct
FROM cot_positions c
LEFT JOIN futures_prices f0 ON f0.symbol = c.symbol
  AND f0.trade_date = c.report_date
  AND f0.months_to_expiry = 1
LEFT JOIN futures_prices f4 ON f4.symbol = c.symbol
  AND f4.trade_date = date(c.report_date, '+28 days')
  AND f4.months_to_expiry = 1
WHERE c.category = 'swap_dealer'
  AND c.dealer_flip = 1
ORDER BY c.report_date DESC;

-- Cross-venue Kalshi vs Polymarket divergence alerts
SELECT
  k.market_id AS kalshi_id,
  p.market_id AS polymarket_id,
  k.title,
  k.yes_last AS kalshi_yes,
  p.yes_last AS polymarket_yes,
  (p.yes_last - k.yes_last) AS spread
FROM prediction_market_snapshots k
JOIN prediction_market_snapshots p ON p.title LIKE '%' || SUBSTR(k.title, 1, 20) || '%'
  AND p.platform = 'polymarket'
  AND p.snapshot_time > datetime('now', '-2 hours')
WHERE k.platform = 'kalshi'
  AND k.snapshot_time > datetime('now', '-2 hours')
  AND ABS(p.yes_last - k.yes_last) > 0.10
ORDER BY ABS(p.yes_last - k.yes_last) DESC;

API ENDPOINTS SUMMARY

Data Source Endpoint Auth Rate Limit
EIA petroleum EIA API GET /v2/petroleum/stoc/wstk/data/ Free API key 1000/hr
EIA natgas EIA API GET /v2/natural-gas/stor/wkly/data/ Free API key 1000/hr
FRED series FRED GET /fred/series/observations?series_id={ID} Free API key 120/min
Treasury auctions Fiscal Data GET /v1/accounting/od/auctions_query None Generous
CFTC COT CFTC.gov Download: /dea/newcot/f_disagg.txt None None
CFTC CIT CFTC.gov Download: /dea/newcot/deacit.txt None None
USDA PSD USDA FAS GET /PSDOnlineV2/api/ None (~100/hr) Low -- cache monthly
USDA QuickStats USDA NASS GET /quickstats/api/api_GET/ Free API key Moderate
USDA exports USDA FAS https://apps.fas.usda.gov/export-sales/ None None
Kalshi markets Kalshi GET /trade-api/v2/markets RSA-PSS signed Undocumented -- use backoff
Polymarket Polymarket CLOB GET https://clob.polymarket.com/markets None (read) Unknown -- use backoff
Futures prices Polygon.io GET /v2/aggs/ticker/{ticker}/range/{mult}/{span}/{from}/{to} API key ($29/mo) Tiered
NY Fed SOMA NY Fed GET /api/soma/summary.json None Generous
NY Fed RRP NY Fed GET /api/rp/reverserepo/search.json None Generous
CME FedWatch CME Scrape: cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html None Scraping
NOAA CPC NOAA https://www.cpc.ncep.noaa.gov/products/predictions/ None None
Drought Monitor NDMC https://droughtmonitor.unl.edu/DmData/DataDownload.aspx None None
NHC Hurricane NOAA GET https://api.weather.gov/alerts None None
GACC Customs China Customs http://english.customs.gov.cn/ None Scraping
NBS PMI China NBS http://www.stats.gov.cn/english/ None Scraping
SHFE stocks Investing.com/TV Scraping None Scraping
BLS CPI subs BLS API POST /publicAPI/v2/timeseries/data/ Free API key 500/day
GDPNow Atlanta Fed https://www.atlantafed.org/cqer/research/gdpnow None Scraping
CPI Nowcast Cleveland Fed https://www.clevelandfed.org/indicators-and-data/inflation-nowcasting None Scraping
Jobless claims FRED series ICSA, CCSA Free API key 120/min
ISM sub-indices FRED NAPMPI, NAPMNOI, NAPMEI Free API key 120/min
OVX (oil vol) FRED or CBOE CBOE OVX, GVZ None None
Credit spreads FRED BAMLH0A0HYM2, BAMLC0A0CM Free API key 120/min
ICE Brent COT ICE https://www.theice.com/marketdata/reports/82 None Weekly PDF
GridStatus GridStatus.io REST API Free tier 100/day free
AAR Rail AAR https://www.aar.org/data-research/rail-traffic-data/ None Scraping
BDI Yahoo Finance ^BDI None Scraping
FBX Freight Freightos https://fbx.freightos.com/api/ Free delayed None

COLLECTION CADENCE (CRON SCHEDULE)

# Daily -- 5:00 AM ET: Build daily_state table for today
0 5 * * 1-5   node collectors/daily-state-builder.js

# Daily -- 6:00 PM ET: After futures market close -- prices, ratios, Z-scores
0 18 * * 1-5  node collectors/futures-daily.js
0 18 * * 1-5  node collectors/ratios-calculator.js
0 18 * * 1-5  node collectors/fred-daily.js

# Prediction market -- 15 min during market hours for active markets, hourly otherwise
*/15 9-16 * * 1-5  node collectors/prediction-markets-active.js
0 * * * *          node collectors/prediction-markets-hourly.js
0 8 * * 1-5        node collectors/prediction-markets-premarket.js

# Wednesday 10:30 AM ET: EIA petroleum report
30 10 * * 3   node collectors/eia-petroleum.js

# Thursday 10:30 AM ET: EIA natgas storage
30 10 * * 4   node collectors/eia-natgas.js

# Thursday 8:30 AM ET: Jobless claims + ADP Wednesday release
30 8 * * 4    node collectors/jobless-claims.js
# ADP: Wednesday 8:15 AM ET (2 days before NFP)
15 8 * * 3    node collectors/adp-employment.js   # run only on NFP week

# Friday 3:30 PM ET: CFTC COT release
30 15 * * 5   node collectors/cftc-cot.js

# Friday 1:00 PM ET: Baker Hughes rig count
0 13 * * 5    node collectors/baker-hughes.js

# Monday 4:00 PM ET: USDA crop progress (growing season Apr-Nov)
0 16 * * 1    node collectors/usda-crop-progress.js   # seasonal

# Monthly ~10th: WASDE release (12:00 PM ET)
30 12 8-15 * * node collectors/usda-wasde.js  # check daily state for is_wasde_day

# Monthly last day: China NBS PMI
0 9 28-31 * *  node collectors/china-pmi.js   # check daily state for is_china_pmi_day

# Weekly Sunday: Season pattern and Z-score recalculation
0 20 * * 0    node analytics/recalculate-zscores.js

# Weekly Friday: Brier score calibration
0 17 * * 5    node analytics/calibration-weekly.js

# On auction days: Treasury auction results (check daily state)
0 14 * * 1-5  node collectors/treasury-auction.js   # runs, checks if auction day

# Monthly: EIA STEO, DPR, OPEC MOMR, USDA exports, central bank gold
0 12 1-20 * * node collectors/monthly-data.js  # checks daily_state for each flag

STORAGE ESTIMATES (REVISED)

Data Category Daily Size Annual Size Notes
Futures prices + curves (20 contracts, 7 months each) ~150KB ~40MB
EIA petroleum + natgas + production ~50KB/week ~3MB
USDA WASDE + crop progress + exports ~100KB/week ~5MB
CFTC COT (15 contracts, 3 report types + CIT) ~250KB/week ~13MB
Treasury yields + auctions + Fed data ~50KB/day ~12MB
Spreads + ratios + Z-scores ~100KB/day ~25MB
Kalshi + Polymarket (OHLCV daily + intraday for active markets only) ~500KB/day ~125MB Revised down from 500MB -- store daily OHLCV for all, intraday only for high-conviction
Economic surprises + consensus ~10KB/week ~0.5MB
Weather outlooks + HDD/CDD ~20KB/day ~5MB
China data + Korean exports ~50KB/month ~1MB
Daily state table ~5KB/day ~1MB
Metals warehouse stocks ~20KB/day ~5MB
Macro series (FRED, credit, vol) ~50KB/day ~12MB
Alt data (AIS manual, rail, BDI, FBX) ~20KB/day ~5MB
Conviction scores + prediction log ~20KB/day ~5MB
Total ~258MB/year Comfortably within SQLite/VPS capacity

SQLite settings for production:

PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA cache_size=-64000;   -- 64MB cache
PRAGMA temp_store=MEMORY;

COST SUMMARY

Period Cost What You Get
Weeks 1-3 $0 All government data (EIA, USDA, CFTC, FRED, BLS, Treasury, NOAA), prediction market scrapers, full analytical engine, conviction scoring, data quality validation
Month 2+ $29/month Polygon.io (shared with Options desk) -- MANDATORY from Week 1, not Month 2
Month 3+ $0-49/month Quandl/Nasdaq Data Link for clean CFTC COT API if DIY parsing proves fragile
Year 1 Total ~$290-500 Complete Futures desk data infrastructure

The key insight: 90%+ of data that moves commodity markets is published free by US government agencies. Start with government data, use Polygon for price reliability, use Grok/Sonar web search to fill gaps while automated pipelines are being built.


BUILD PRIORITY (REVISED -- 16 Steps)

Step Component Effort Why First
1 SQLite schema + FRED API client 2-3 days Single API key unlocks yields, breakevens, SOFR, RRP, VIX, credit spreads, exchange rates, commodity spots, jobless claims. Highest-leverage single integration. Schema must be right on day 1.
2 Polygon.io futures price client 2-3 days Reliable prices are the foundation. Yahoo Finance WILL break within months. $29/month from day 1, not month 2.
3 EIA API client (petroleum + natgas) 3-4 days Wednesday petroleum and Thursday natgas are the two most market-moving weekly events. First edge signals. Add consensus collection to calculate surprise.
4 Kalshi + Polymarket API clients 1-2 days Execution venues. Reuse existing Kalshi Edge Bot code. Implement rate limiting and fallbacks from day 1.
5 Economic consensus collector 2-3 days MISSING from original spec. Consensus vs actual is the core signal for CPI/NFP/GDP Kalshi contracts. Scrape Investing.com calendar.
6 Daily state table + cron skeleton 2-3 days Traffic cop for the entire system. 5AM ET build, then triggers for Wed/Thu/Fri weekly events, monthly WASDE, etc.
7 CFTC COT parser + Z-score engine 3-5 days Positioning backbone. Use Quandl free tier for clean data. Calculate net positions, Z-scores, extreme flags, dealer flip detection for 15 contracts.
8 Cross-commodity ratios (core 6) 2-3 days Pure math from collected prices. Gold/silver, oil/natgas, copper/gold, soybean/corn, Brent-WTI, crack spread. Immediate signals.
9 Term structure builder 3-4 days Full curve for each commodity. Daily contango/backwardation classification, annualized roll yield, calendar spread percentile ranking.
10 USDA WASDE + crop progress parsers 3-4 days Agriculture equivalent of EIA weekly. Surprise calculator vs trade consensus.
11 Treasury auction scorer + Fed data 3-4 days Z-score auction results, FedWatch probability tracking, Atlanta Fed GDPNow, Cleveland CPI Nowcast, jobless claims, dot plot.
12 Data quality validation layer 2-3 days Add before deploying conviction scoring. ValidationResult for each source, fallback routing, staleness alerts.
13 Regime detection + conviction scoring 3-5 days Regime classifier first, then 6-component conviction scorer with regime-adjusted weights. Mandatory rejection rules. Trap flags.
14 China data collectors 2-3 days NBS PMI, GACC customs, SHFE stocks (via Investing.com/TradingView). High value for metals and energy demand signals.
15 Weather integration 2-3 days NOAA CPC, HDD/CDD, drought monitor, El Nino status, hurricane season tracking.
16 Calibration + backtesting setup 2-3 days Weekly Brier score calculation. Prediction log queries. Conviction score calibration against historical moves.

Defer to v2 (after core pipeline is stable):


CROSS-DESK SYNERGIES

Shared With Weather Desk

Shared With Options Desk

Shared With Stocks Desk

Shared With Forex Desk

Shared With Crypto Desk

Universal Infrastructure (All Desks)


TODO: Upgrade Kalshi REST to WebSocket Feed

Status: NOT BUILT — add when this desk goes live for execution

Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.

Why WebSocket matters:

Note: This desk primarily trades on other exchanges, but Kalshi has event markets (rate decisions, index levels, crypto milestones) that overlap with this desk. WebSocket upgrade applies when trading any Kalshi markets.

Added 2026-03-29 — upgrade REST to WS when desk moves to live execution

Source: ~/edgeclaw/results/spec-panel/futures-desk/spec-final.md