Grade: A-
Exceptionally thorough spec covering the full options data stack from raw chains to cross-market arbitrage. The main risks are operational — Yahoo Finance scraping fragility, SQLite write contention under 15-min polling for 30+ tickers, and the GEX calculation making a critical assumption about dealer directionality that is often wrong. The Form 424B2 NLP idea is genuinely novel but underestimates parsing difficulty by 5-10x.
Primary: Polygon.io (pay $29/month from Day 1 — do not attempt Yahoo Finance scraping for production use) Secondary/Fallback: OCC Daily Files (free, T+1, official) Tertiary: CBOE Delayed Quotes (15-min delay, more stable than Yahoo)
CRITICAL FLAW FIXED: Yahoo Finance has no official free API. The yfinance library and similar scrapers use Yahoo's internal endpoints which change without notice. Yahoo actively rate-limits and blocks scrapers. With 30+ tickers at 15-minute intervals, you will hit rate limits within days. Pay for Polygon.io from Week 1. The time spent debugging Yahoo scraping failures costs more than $29/month in lost productivity. Polygon's historical data also lets you backfill the IV rank database immediately rather than waiting 252 days.
Tier 1 — 15-minute intraday snapshots:
Tier 2 — Daily end-of-day snapshots:
[PANEL: DeepSeek V3.1] Sector rotation note: Options dealers hedge sector ETFs differently than the index. Knowing sector flows improves GEX accuracy for individual names. However, tracking full chains for XLF/XLE/XLK is redundant with SPY for most signals. MVP: pull sector ETF chains only for OI and P/C ratio, not full intraday snapshots.
ticker, expiration_date, strike, option_type (C/P),
bid, ask, mid, last_price, volume, open_interest,
implied_volatility, delta, gamma, theta, vega, rho,
underlying_price, timestamp, spread_pct
Calculated field: spread_pct = (ask - bid) / mid
MUST-HAVE addition (Opus): Track bid-ask spread as percentage of mid. Options with >20% spread are not practically tradeable. Flag illiquid contracts before routing to any analysis. This is especially important on Robinhood where execution quality is worse than direct exchange access.
[PANEL: Opus — MUST-HAVE] 0DTE options now represent 40-50% of total SPY/QQQ volume on many days. Their gamma impact is disproportionate because gamma explodes as expiration approaches. The GEX calculation will be wildly wrong if it treats 0DTE and 30DTE gamma equally. Separate 0DTE gamma from total GEX and track the ratio over time.
Filter: WHERE DTE = 0
Aggregation: separate pass on chain data, stored in options_chain_0dte table.
Calculate locally using Black-Scholes (calls/puts with continuous dividend yield q):
d1 = (ln(S/K) + (r - q + 0.5 * σ²) * T) / (σ * sqrt(T))
d2 = d1 - σ * sqrt(T)
Delta (call) = N(d1)
Delta (put) = N(d1) - 1
Gamma = N'(d1) / (S * σ * sqrt(T))
Theta (call) = -(S * N'(d1) * σ) / (2 * sqrt(T)) - r * K * exp(-r*T) * N(d2) + q * S * exp(-q*T) * N(d1)
Vega = S * exp(-q*T) * N'(d1) * sqrt(T)
Rho (call) = K * T * exp(-r*T) * N(d2)
Dividend adjustment: Always include q (continuous dividend yield). SPY and QQQ pay dividends. Ignoring this makes your forward price wrong and your Kalshi/Polymarket probability bridge mathematically incorrect.
[PANEL: Gemini 3.1 Pro — MUST-HAVE] Collect dividend yield and ex-dividend dates for all tracked tickers. Source: Polygon.io Reference Data API or Financial Modeling Prep free tier.
[PANEL: Gemini 3.1 Pro — MUST-HAVE] Collect corporate actions (splits, mergers, special dividends). Options chains adjust for corporate actions — if a stock splits, historical GEX and strike tracking will break instantly if not adjusted. Source: Polygon.io Reference Data API or OCC Information Memos.
Standard formula:
GEX = Sum over all strikes and expirations:
OI * Gamma * 100 * Spot * sign(option_type)
Where:
Call options contribute: +OI * Gamma * 100 * Spot (dealers short calls = short gamma on calls)
Put options contribute: -OI * Gamma * 100 * Spot (dealers short puts = long gamma on puts)
Net GEX positive = dealers long gamma (they buy dips, sell rips — market stabilizing)
Net GEX negative = dealers short gamma (they amplify moves — market destabilizing)
CRITICAL FLAW (Opus): The assumption "Call OI = dealers sold calls" is the standard retail assumption but is frequently wrong. Dealers are not always the seller. When a hedge fund sells a call to another hedge fund, the dealer may not be involved. When retail buys puts on Robinhood, the market maker (Citadel/Wolverine) is indeed short those puts — but institutional flow is bilateral.
Fix:
[PANEL: Grok 4.1 Fast] Sign convention: Net GEX = (Put OI * Gamma_put * 100 * Spot) - (Call OI * Gamma_call * 100 * Spot) for standard dealer short-gamma assumption. Validate explicitly against SqueezeMetrics or SpotGamma free samples.
Gamma flip level: Strike where Net GEX crosses zero. Below this level, dealers are short gamma (amplifying). Above it, dealers are long gamma (stabilizing).
0DTE GEX: Track separately. Calculate 0DTE GEX / Total GEX ratio. Days where 0DTE > 40% of GEX behave differently — gamma pin effects are compressed to single-session timeframe.
Vanna = dDelta/dVol = dVega/dS
Vanna = -N'(d1) * d2 / σ
Interpretation: When IV drops, delta changes. Vanna flow = dealer delta adjustments driven by vol changes. Positive vanna + falling VIX = dealers buy stock (supportive).
Charm = dDelta/dTime (delta decay)
Charm = -N'(d1) * (r - q + d1 * σ / (2 * sqrt(T))) / sqrt(T)
Interpretation: Delta changes as time passes even with no price move. Charm flow = systematic daily rehedging. End-of-day/end-of-week charm creates predictable directional pressure.
[PANEL: Sonar Reasoning Pro — MUST-HAVE] Calculate vega flip levels analogous to gamma flip levels. Vega flip = strike where aggregate dealer vega exposure crosses zero. When a dealer is long vega in front month and short vega in back months, their hedging creates a specific pattern. Track vega exposure across expirations, not just current IV surface.
Defer recommendation (Opus/Grok 4.1/DeepSeek): Vanna and Charm calculations are second-order signals. Build after core GEX and IV surface are stable and validated. Effort to build and maintain (daily recalculations, intraday triggers) outweighs predictive value for initial launch. However, include the table schema now so data is captured.
[PANEL: GPT-4.1 — MUST-HAVE] Implement a periodic validation routine to catch outliers or impossible values in chain data:
Run validation on every chain snapshot before inserting to database. Log all flagged records to data_quality_log table.
| Index | Description | Cadence |
|---|---|---|
| VIX | 30-day implied vol of S&P 500 | Every 15 min during market hours |
| VVIX | Vol of VIX (vol-of-vol) | Daily |
| SKEW | Tail risk (OTM put demand) | Daily |
| VIX3M | 3-month VIX | Daily |
| VIX6M | 6-month VIX | Daily |
VIX term structure: Track M1, M2, and the VIX futures month containing the next FOMC/CPI event. Do not track all 8-9 listed months (over-engineered, per Opus).
Contango/backwardation: M2 / M1 - 1. Positive = contango (normal, carry positive for short vol). Negative = backwardation (stress, hedging demand elevated).
Endpoints:
https://www.cboe.com/tradable_products/vix/vix_historical_data/ (delayed CSV)https://cdn.cboe.com/products/us/futures/data/ (daily settlement)https://www.cboe.com/tradable_products/vix/vvix_historical_data/https://www.cboe.com/tradable_products/vix/skew_historical_data/Critical gap in original spec: IV percentile alone is meaningless without knowing what vol actually realized. Build all three estimators:
Close-to-Close (simplest):
RV_cc = sqrt(252 / n * Sum(log(Ci / Ci-1)²))
Parkinson (High-Low):
RV_park = sqrt(252 / (4 * ln(2) * n) * Sum(log(Hi/Li)²))
Yang-Zhang (full OHLC, most accurate):
RV_yz = sqrt(252 * (σ_open² + k * σ_close² + (1-k) * σ_rs²))
Where k = 0.34 / (1.34 + (n+1)/(n-1)), using Rogers-Satchell variance
Windows: Calculate all three estimators at 5, 10, 20, 30, 60 trading day windows.
Source: Calculate from OHLC price data (Yahoo Finance historical is fine for daily OHLC — historical data is stable, intraday chain data is where Yahoo fails). Polygon.io when subscribed.
Variance Risk Premium (VRP):
VRP = VIX² / 252 - RV_20d² (annualized, in vol units: VRP = VIX - RV_20d)
Positive VRP = selling vol is profitable on average. VRP regime shifts predict when vol selling stops working. This is the single most robust predictor of short-term option returns in academic literature.
Build incrementally from daily chain snapshots. Need at least 252 trading days of IV history per ticker to calculate:
(Current IV - 52w Low IV) / (52w High IV - 52w Low IV) — where current IV sits within the year range% of days in past 252 where IV was lower than todayStore daily ATM IV per ticker in iv_history table. Start collecting on Day 1 — this table self-populates over time.
Backfill: CBOE historical data (free for VIX, paid for individual stocks) or ivolatility.com (limited free). Polygon.io has historical options data — use to backfill on subscription.
Simplified approach (Opus recommendation): For prediction market arbitrage, you only need:
Full IV surface at 7 delta points × 9 expirations is a quant desk project. Build the simplified version first.
IV Surface Storage: Store IV at 5-delta points (10d, 25d, 50d ATM, 75d, 90d) per expiration per ticker. Interpolate with cubic spline when needed for specific strikes.
Purpose: Isolate the vol priced specifically for an upcoming event (earnings, FOMC, CPI) by comparing IV across expirations that bracket the event.
Formula:
Forward vol for event window [T1, T2]:
σ_fwd = sqrt((σ_T2² * T2 - σ_T1² * T1) / (T2 - T1))
Where T1 is the expiration just before the event and T2 is the first expiration after.
Event vol = σ_fwd minus expected non-event vol (estimated from non-event windows of similar length).
Pull at 8 AM ET pre-market. Tracks European equity vol — leads VIX when European markets react to overnight news before US opens.
Caveat (Opus): VSTOXX leads VIX primarily due to time zone mechanics, not predictive alpha. By 8 AM ET, US futures have already incorporated the same overnight news. The actual lead time where VSTOXX contains information not yet in US futures is 0-15 minutes, not 30-90. Use VSTOXX/VIX spread instead: when European vol is unusually high relative to US vol, it signals a European-specific risk that hasn't fully transmitted.
Data source: Eurex delayed data (30-min delay on free feeds). Alternative: use VIX futures that trade 24/5 on CBOE — highly correlated to VSTOXX and available real-time for free.
Endpoint: https://www.stoxx.com/indices-data?isin=EU0009658152 (scrape) or Eurex delayed data feed.
[PANEL: Opus — NICE-TO-HAVE] CBOE implied correlation index (COR1M) for SPX constituent correlation. When implied correlation is high, single-stock options are cheap relative to index options (dispersion trade opportunity). Realized correlation calculated from price data already collected.
Source: https://www.cboe.com/tradable_products/vix/implied_correlation/ (free)
Base URL: https://api.stlouisfed.org/fred/series/observations
Auth: Free API key from fred.stlouisfed.org
Cadence: Daily at 6 PM ET (most series update by then)
| Series ID | Description | Frequency |
|---|---|---|
| SOFR | Secured Overnight Financing Rate | Daily |
| DGS2 | 2-Year Treasury Yield | Daily |
| DGS10 | 10-Year Treasury Yield | Daily |
| DGS30 | 30-Year Treasury Yield | Daily |
| T10Y2Y | 10Y-2Y Yield Curve Spread | Daily |
| BAMLH0A0HYM2 | HY Credit Spread (ICE BofA) | Daily (T+1 lag) |
| BAMLC0A0CM | IG Credit Spread (ICE BofA) | Daily (T+1 lag) |
| RRPONTSYD | Fed Reverse Repo (RRP) | Daily |
| WTREGEN | Treasury General Account (TGA) | Weekly |
| WALCL | Fed Balance Sheet Total Assets | Weekly |
FRED credit spread lag caveat (Sonar Reasoning Pro): FRED publishes at 5 PM ET the following day. Use bond ETF prices (HYG, JNK for HY; LQD for IG) for intraday real-time spread approximation. Bond ETFs update intraday and are free from Yahoo Finance historical or Polygon.io.
Real-time HY spread proxy:
HY_spread_intraday ≈ JNK_yield - DGS10_current
JNK_yield = (annual coupon / JNK_price) * 100 (simplified; use OAS when available)
Critical gap: The spec uses Kalshi for Fed rate probabilities but ignores the deepest, most liquid source: CME Fed Funds futures. Every institutional trader uses these as the benchmark. Comparing CME implied probabilities to Kalshi prices is a more robust arbitrage signal than using options-derived probabilities.
Source: CME FedWatch tool (free website scrape) at https://www.cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html
Alternative: FRED Fed Funds futures data (series FFXXX where XXX is contract month)
Cadence: Every 15 minutes during market hours (FedWatch updates continuously)
Calculation:
Implied prob of rate change = (100 - Fed Funds futures price) / 100
Implied prob of hold = 1 - prob of change
Arbitrage signal: CME_implied_hold_prob - Kalshi_hold_price. If >5c, Kalshi is mispriced. If Polymarket has the same contract, compare all three.
Source: https://www.treasurydirect.gov/TA_WS/securities/search (JSON API, free)
Cadence: Pull daily, filter for auctions in past 7 days
Fields: auction_date, security_type, term, bid_to_cover, indirect_bidders_pct, high_yield, when_issued_spread
Signal: Weak bid-to-cover (<2.3 for 10Y) or high tail (high_yield > when_issued by >1bp) signals Treasury demand concern → rates vol spikes → impacts rate-sensitive options.
Source: ICE (paid) or scrape from financial data providers Cadence: Daily
Opus recommendation: Replace MOVE tracking with 2Y/10Y yield spread (already in FRED pull). Treasury vol (MOVE) leads equity options less strongly than yield curve slope does. Keep it simple: pull T10Y2Y from FRED, calculate daily slope change. Drop MOVE unless you find a free reliable source.
[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Treasury futures-to-bond basis (TYX spread). When Treasury futures and cash bonds disconnect, it reprices the entire SOFR curve, which cascades into rate-sensitive options. Calculate from 10Y futures (free CME delayed) vs 10Y cash yield from FRED. Signal: basis widening → rate vol about to spike.
Source: https://www.cftc.gov/dea/options/deaoiall.htm (free weekly CSV)
Cadence: Every Friday after 3:30 PM ET (data as of Tuesday)
Fields: market_name, noncomm_positions_long, noncomm_positions_short, comm_positions_long, comm_positions_short, nonrept_positions_long, nonrept_positions_short
Signal: Extreme non-commercial (speculative) positioning in VIX futures or equity index options predicts mean reversion.
Source: https://www.theocc.com/market-data/market-data-reports/volume-and-open-interest/ (free daily CSV)
Cadence: Pull at 8 AM ET (data from prior day's close, usually published 7-8 AM)
Fields: exchange, underlying, expiration, strike, call_put, volume, open_interest
Timing note (Opus): OCC data is T+1. Use yesterday's OCC OI as the base. Estimate today's OI changes from intraday volume data (volume where volume > prior OI suggests new positions). Update GEX with confirmed OCC data next morning.
Calculate from chain data or OCC files:
SELECT
underlying,
date,
SUM(CASE WHEN option_type = 'P' THEN volume ELSE 0 END) AS put_volume,
SUM(CASE WHEN option_type = 'C' THEN volume ELSE 0 END) AS call_volume,
CAST(SUM(CASE WHEN option_type = 'P' THEN volume ELSE 0 END) AS REAL) /
NULLIF(SUM(CASE WHEN option_type = 'C' THEN volume ELSE 0 END), 0) AS put_call_ratio,
SUM(CASE WHEN option_type = 'P' THEN open_interest ELSE 0 END) AS put_oi,
SUM(CASE WHEN option_type = 'C' THEN open_interest ELSE 0 END) AS call_oi,
CAST(SUM(CASE WHEN option_type = 'P' THEN open_interest ELSE 0 END) AS REAL) /
NULLIF(SUM(CASE WHEN option_type = 'C' THEN open_interest ELSE 0 END), 0) AS put_call_oi_ratio
FROM options_chain_daily
GROUP BY underlying, date;
Signals:
Criteria for flagging a contract as unusual:
volume_to_oi_ratio > 5.0 AND volume > 1000 AND option_type in ('C', 'P')
For real-time sweep detection (requires Polygon.io):
Sweep fields to store:
ticker, strike, expiration, option_type, print_size_contracts, print_price,
ask_at_time, bid_at_time, aggressor (buyer/seller), timestamp, is_sweep (bool)
Purpose: Detect when large OI positions are closed all at once — signals forced liquidation or major position exit.
Detection rule (improved per Sonar Reasoning Pro): The original rule (|OI change| > 2x volume = liquidation) breaks during big moves. The improvement: track OI change separately for calls vs puts. Only flag as liquidation if BOTH calls and puts are dropping in the same direction. If calls drop but puts rise, that is a rehedge, not liquidation.
SELECT
ticker, date,
call_oi_change,
put_oi_change,
CASE
WHEN call_oi_change < -0.15 AND put_oi_change < -0.15 THEN 'LIQUIDATION'
WHEN call_oi_change < -0.15 AND put_oi_change > 0.05 THEN 'CALL_REHEDGE'
WHEN put_oi_change < -0.15 AND call_oi_change > 0.05 THEN 'PUT_REHEDGE'
ELSE 'NORMAL'
END AS position_change_type
FROM (
SELECT
ticker,
date,
(today_call_oi - yesterday_call_oi) * 1.0 / NULLIF(yesterday_call_oi, 0) AS call_oi_change,
(today_put_oi - yesterday_put_oi) * 1.0 / NULLIF(yesterday_put_oi, 0) AS put_oi_change
FROM oi_daily_summary
);
Source: https://www.finra.org/sites/default/files/short-sale-volume-files/ (free daily CSV)
Cadence: Pull at 6 PM ET (data published T+1, so today's file has yesterday's data)
Format: FINRA_<EXCHANGE>_<DATE>_ShortVolume.txt
Fields: Date, Symbol, ShortVolume, ShortExemptVolume, TotalVolume
Signal: When short volume exceeds 50% of total volume on a stock with rising call OI, it signals delta hedging by market makers (they short stock to hedge long delta from sold puts). This directly validates or contradicts the GEX model.
[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Listed Short Volume Reporting from SEC/exchange websites gives the same signal 1-2 days earlier than FINRA ATS (which lags 2-3 days for threshold list). Source: sec.gov or exchange daily files.
Source: https://efts.sec.gov/LATEST/search-index?q=%22Form+4%22&dateRange=custom&startdt={date}&enddt={date}&forms=4
Cadence: Daily at 6 PM ET
Parse fields: issuer_name, ticker, reporting_person, transaction_date, shares, price_per_share, transaction_code (P=purchase, S=sale), ownership_type
Signal: Cluster of insider buys at specific price levels can identify where smart money sees support. Form 4 is well-structured XML — easy to parse.
Purpose: Reveals exactly where banks are hedged via structured notes (autocallables, barrier reverse convertibles). These barrier levels create dealer hedging flows that produce gamma walls.
Source: https://efts.sec.gov/LATEST/search-index?q=%22424B2%22&forms=424B2
Cadence: Daily — new filings appear continuously during business hours
MVP approach (Sonar Reasoning Pro recommendation):
Regex extraction for MVP (Opus guidance):
patterns = {
'barrier': r'barrier.*?(\d+\.?\d*)\s*%',
'notional': r'notional.*?\$(\d+(?:,\d+)*(?:\.\d+)?)\s*(million|billion)',
'underlying': r'linked to.*?(SPY|SPX|QQQ|NDX|[A-Z]{1,4})',
'maturity': r'maturity.*?(\d{1,2}/\d{1,2}/\d{4})',
'issuer': r'(Goldman Sachs|JPMorgan|Morgan Stanley|Bank of America|Barclays)',
}
Long-term (Month 2-3): Full NLP parsing. These filings are 50-200 pages of dense legal text. Regex covers ~60% of cases. Full automation requires fine-tuned models or substantial engineering. Start with regex, iterate.
Source: https://www.nasdaqtrader.com/trader.aspx?id=regsho (free daily)
Cadence: Daily at close
Signal: Stocks appearing on REG SHO threshold list (failed to deliver for 5+ consecutive days) have short squeeze potential. Combine with put/call OI to identify squeeze setups.
Volume-synchronized probability of informed trading. Detects informed vs uninformed trading in options/equities — can predict short-term vol spikes or reversals, enhancing phantom liquidation and sweep detection.
Calculation: Requires intraday volume and price bars (Polygon.io). VPIN = |Buy volume - Sell volume| / Total volume, measured in equal-volume buckets.
Defer to v2 until Polygon.io subscription is active and intraday bars are being collected.
Base URL: https://api.elections.kalshi.com/trade-api/v2
Auth: RSA-PSS signing with API key (existing code from weather desk — adapt)
Cadence: Every 15 minutes during market hours; every 60 minutes overnight
Markets to track (options-relevant):
Fields per market:
market_ticker, event_ticker, title, status, yes_bid, yes_ask, no_bid, no_ask,
last_price, volume, open_interest, close_time, expiration_time, result, timestamp
[PANEL: Sonar Reasoning Pro — MUST-HAVE] Capture full order book depth, not just midpoint prices. A Kalshi market might trade at 45 cents, but if there's only $500 at 45, it is illiquid and the price is a lie. Kalshi provides full order book via API (free). Store: bid_price_1..5, bid_size_1..5, ask_price_1..5, ask_size_1..5 for the top 5 levels.
-- Kalshi liquidity calculation
SELECT
market_ticker,
timestamp,
yes_ask - yes_bid AS spread_cents,
bid_size_1 + bid_size_2 + bid_size_3 AS top3_bid_depth_usd,
ask_size_1 + ask_size_2 + ask_size_3 AS top3_ask_depth_usd,
CASE WHEN (bid_size_1 + bid_size_2 + bid_size_3) < 500 THEN 1 ELSE 0 END AS illiquid_flag
FROM kalshi_orderbook
WHERE timestamp > datetime('now', '-15 minutes');
GraphQL endpoint: https://clob.polymarket.com/ (REST) and TheGraph subgraph
Cadence: Every 15 minutes
Fields: condition_id, question, end_time, outcome_prices (yes/no), volume_24h, liquidity_usd
[PANEL: Sonar Reasoning Pro — MUST-HAVE] Polymarket uses an AMM (automated market maker) with USDC collateral. AMM slippage is not captured by price alone. Monitor:
USDC depeg risk (Opus fix): Better leading indicators than Deribit for Polymarket liquidity:
https://api.coingecko.com/api/v3/simple/price?ids=usd-coin,tether&vs_currencies=usd[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Cross-listing arbitrage: SPY vs SPX, QQQ vs NDX. When SPY calls are 2% cheaper than the synthetic SPX equivalent, that is detectable alpha. Calculate parity between chain data from both, flag deviations >1%.
Purpose: Convert options chain data to a probability of the underlying finishing above/below a specific strike by a specific date. Compare to Kalshi/Polymarket prices on the same event.
Standard approach (d2 from Black-Scholes):
P(S_T > K) = N(d2)
Where d2 = (ln(S/K) + (r - q - 0.5*σ²) * T) / (σ * sqrt(T))
CRITICAL FLAW (Opus + Sonar Reasoning Pro): Standard d2 assumes European options and no early exercise. US equity options are American (can exercise early). Use the following instead:
Improved approach — Breeden-Litzenberger (model-free):
Risk-neutral density: f(K) = exp(r*T) * d²C/dK²
Binary probability: P(S_T > K) = -exp(r*T) * dC/dK
Compute numerically from the options chain by taking finite differences between strikes.
Alternative — Bjerksund-Stensland (closed-form for American options, Sonar Reasoning Pro): Use Bjerksund-Stensland approximation for American exercise and dividend adjustment. The difference vs standard d2 can be 2-5% on your arbitrage signal — which is your entire edge vs Kalshi.
Test protocol: Pick any live SPY contract. Calculate probability using standard d2. Calculate using Bjerksund-Stensland. Compare both to actual Kalshi contract price. The one that better explains the Kalshi price is your truth test.
Use skew-adjusted IV (Opus): Do not use flat ATM IV in d2. Use the actual IV at each specific strike from the IV surface. This gives the risk-neutral probability that already incorporates skew.
Arbitrage threshold: Flag when |options_probability - kalshi_price| > 5c AND kalshi_liquidity_top3 > $300. Do the same comparison for Polymarket.
Central lookup table queried by all downstream analytics. Build this first — everything else plugs into it.
CREATE TABLE IF NOT EXISTS daily_state (
date TEXT PRIMARY KEY,
-- Market structure
is_trading_day INTEGER DEFAULT 1,
is_monthly_opex INTEGER DEFAULT 0, -- 3rd Friday of month
is_quarterly_opex INTEGER DEFAULT 0, -- 3rd Friday of Mar/Jun/Sep/Dec
is_weekly_opex INTEGER DEFAULT 0,
is_0dte_day INTEGER DEFAULT 0,
days_to_monthly_opex INTEGER,
-- Fed calendar
is_fomc_day INTEGER DEFAULT 0,
is_fomc_week INTEGER DEFAULT 0,
days_to_fomc INTEGER,
fomc_expected_action TEXT,
-- Economic calendar
is_cpi_day INTEGER DEFAULT 0,
is_ppi_day INTEGER DEFAULT 0,
is_nfp_day INTEGER DEFAULT 0,
is_gdp_day INTEGER DEFAULT 0,
is_retail_sales_day INTEGER DEFAULT 0,
is_pce_day INTEGER DEFAULT 0,
days_to_cpi INTEGER,
days_to_nfp INTEGER,
-- Treasury calendar
is_2y_auction INTEGER DEFAULT 0,
is_10y_auction INTEGER DEFAULT 0,
is_30y_auction INTEGER DEFAULT 0,
is_quarter_end INTEGER DEFAULT 0,
is_month_end INTEGER DEFAULT 0,
-- Vol state (filled after market close)
vix_close REAL,
vix_regime TEXT, -- 'low' <15, 'normal' 15-20, 'elevated' 20-30, 'stress' >30
vix_pct_change_1d REAL,
vvix_close REAL,
skew_close REAL,
vrp_20d REAL,
-- Market state
spy_close REAL,
spy_pct_change_1d REAL,
spy_pct_change_5d REAL,
net_gex REAL,
gex_regime TEXT, -- 'positive' or 'negative'
gamma_flip_level REAL,
-- Macro state
sofr REAL,
yield_2y REAL,
yield_10y REAL,
yield_curve_2y10y REAL,
hy_spread REAL,
ig_spread REAL,
rrp_balance_bn REAL,
tga_balance_bn REAL,
-- Prediction market state
kalshi_fed_hold_prob REAL,
cme_fedwatch_hold_prob REAL,
kalshi_cme_divergence REAL,
-- Derived flags
convergence_pattern TEXT, -- e.g., 'CPI+OPEX', 'FOMC+QUAD_WITCH'
buyback_blackout_pct REAL, -- estimated % of S&P500 in blackout window (±10% uncertainty)
-- Updated timestamps
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_daily_state_date ON daily_state(date);
CREATE INDEX IF NOT EXISTS idx_daily_state_opex ON daily_state(is_monthly_opex, is_quarterly_opex);
CREATE INDEX IF NOT EXISTS idx_daily_state_fomc ON daily_state(is_fomc_day, is_fomc_week);
Buyback blackout estimate (Opus): Assume all companies enter blackout 14 calendar days before their earnings date and exit 2 days after. Calculate market-cap-weighted percentage. Label as estimate with ±10% uncertainty.
[PANEL: Sonar Reasoning Pro — NICE-TO-HAVE] Systematic rebalancer timing: Track known rebalance dates (FTSE quarterly, Russell annual in June, S&P 500 quarterly in March/June/September/December), factor rotation dates, and ETF creation/destruction. These create predictable volatility spikes 2-5 days before and 1-2 days after. Add flags to daily_state for these events.
MVP (Opus/Sonar Reasoning Pro): Hardcode alerts for the top 2 patterns that have the most documented precedent. Run Month 2 analysis on which patterns actually matter, then generalize.
Top 2 convergence patterns:
Additional patterns to track (lower priority):
UPDATE daily_state
SET convergence_pattern = CASE
WHEN is_cpi_day = 1 AND is_weekly_opex = 1 THEN 'CPI+OPEX'
WHEN is_quarterly_opex = 1 AND (SELECT COUNT(*) FROM earnings_calendar WHERE date BETWEEN date('now', '-3 days') AND date('now', '+3 days')) > 20 THEN 'QUAD_WITCH+EARNINGS'
WHEN is_fomc_week = 1 AND buyback_blackout_pct > 0.40 THEN 'FOMC+BLACKOUT'
WHEN vix_regime = 'stress' AND gex_regime = 'negative' THEN 'STRESS+SHORT_GAMMA'
ELSE NULL
END
WHERE date = date('now');
Source: Yahoo Finance earnings calendar (scrape), or Nasdaq.com, or Polygon.io (paid) Cadence: Update weekly; refresh daily for any changes
CREATE TABLE IF NOT EXISTS earnings_calendar (
ticker TEXT NOT NULL,
earnings_date TEXT NOT NULL,
time_of_day TEXT, -- 'BMO' (before market open), 'AMC' (after market close), 'DMH' (during market hours)
expected_eps REAL,
expected_revenue_bn REAL,
actual_eps REAL,
actual_revenue_bn REAL,
options_implied_move_pct REAL, -- calculated from straddle price / stock price
historical_avg_move_pct REAL,
historical_move_beat_rate REAL, -- % of times actual move > implied move
post_earnings_direction TEXT, -- 'up', 'down', 'flat' (filled after event)
PRIMARY KEY (ticker, earnings_date)
);
CREATE INDEX IF NOT EXISTS idx_earnings_ticker ON earnings_calendar(ticker);
CREATE INDEX IF NOT EXISTS idx_earnings_date ON earnings_calendar(earnings_date);
Source: Reddit PRAW API (free, rate-limited) Subreddits: r/options (primary), r/wallstreetbets (secondary, noisy) Cadence: Daily at market close — aggregate daily, not real-time
Simplified approach (Opus): Only r/options (more serious). Only flag when a single ticker's mention count exceeds 3 standard deviations from its 30-day average. Drop the YOLO frequency metric from r/wallstreetbets entirely.
[PANEL: Sonar Reasoning Pro] Reddit sentiment + options flow correlation matters more than sentiment alone. Build a confusion matrix:
Only use as contrarian when Reddit and options flow DIVERGE.
CREATE TABLE IF NOT EXISTS reddit_sentiment_daily (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
subreddit TEXT NOT NULL,
mention_count INTEGER,
bullish_count INTEGER,
bearish_count INTEGER,
sentiment_score REAL, -- (bullish - bearish) / total
is_3sigma_spike INTEGER DEFAULT 0,
avg_30d_mentions REAL,
std_30d_mentions REAL,
PRIMARY KEY (date, ticker, subreddit)
);
Source: X API (paid at meaningful scale) or Grok search Cadence: Once daily at market open — consensus themes only, not individual trade alerts
Simplified approach (Opus): 3x daily scans is over-engineered and these accounts post hundreds of alerts daily, mostly noise. The actual large institutional flows are better captured by volume/OI analysis. Simplify to: one daily scan for consensus themes (e.g., "everyone is talking about NVDA put buying"), not individual trade alerts.
[PANEL: Gemini 3.1 Pro] Remove X/Twitter flow account aggregation from v1. The signal is noisier than r/options, and the API cost at meaningful scale is prohibitive. Add back in v2 if there is evidence of signal.
Recommended (Opus): Remove SGX Nifty from tracking. India's Nifty has near-zero predictive value for US options markets. The correlation is driven entirely by shared global risk factors better captured by S&P 500 futures (which trade 23 hours/day) and VIX futures.
Keep:
JPY and CHF as safe-haven flows:
Source: FRED (DEXJPUS — free daily), or forex data providers for intraday Cadence: Every 15 minutes during market hours for intraday; daily for historical
[PANEL: Sonar Reasoning Pro — cross-desk shared] Carry trade indicator is reusable for Forex desk (directly) and Crypto desk (BTC/ETH safe havens). Build as shared utility.
Source: Deribit WebSocket API (free, real-time)
Endpoint: wss://www.deribit.com/ws/api/v2
Subscriptions: BTC and ETH options — ATM IV, 25-delta skew, term structure
Cadence: Real-time WebSocket updates; store hourly snapshots
Purpose for options desk:
Validation approach: Compare your Black-Scholes Greeks calculation against Deribit's published Greeks for BTC/ETH options. If your delta is within 0.5% of Deribit's, the math is correct. Note (Sonar Reasoning Pro): Deribit pre-calculates Greeks using their own model assumptions — validate against multiple sources including academic formulas and community tools (OptionStrat), not Deribit alone.
[PANEL: Grok 4.1] Historical options chain data for backtesting is a critical gap. Without historical data, model validation is limited to forward testing, delaying learning cycles. Deribit provides free historical data download for BTC/ETH — use this to backtest GEX and binary probability calculations before applying to equities.
Source: ETF.com (free scrape), ICI weekly flow reports (free), SPY/QQQ shares outstanding from Yahoo Finance Cadence: Weekly (ICI), daily for shares outstanding
[PANEL: Opus/DeepSeek/GPT-4.1 — NICE-TO-HAVE] ETF creation/redemption flows interact with options gamma. If SPY is near a gamma wall AND seeing large redemptions, the wall may not hold. ICI publishes weekly; ETF.com has more timely data.
Large creation = inflows = buying pressure. Large redemption = outflows.
Signal: SPY shares outstanding decrease > 1% in one day = significant redemption = potential gamma wall failure signal.
CREATE TABLE IF NOT EXISTS etf_flows (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
shares_outstanding INTEGER,
shares_change INTEGER,
flow_direction TEXT, -- 'creation' or 'redemption'
estimated_flow_usd_mm REAL,
PRIMARY KEY (date, ticker)
);
Source: Yahoo Finance (add dividend_date and dividend_amount fields), Nasdaq.com dividend calendar Cadence: Weekly update
[PANEL: Opus — MUST-HAVE] Early exercise of American calls is rational only when the dividend exceeds the remaining time value. GEX model assumes OI persists, but large OI in ITM calls will vanish the day before ex-div due to early exercise. Also, put-call parity requires dividend adjustment — without it, the binary probability calculator will be systematically biased for dividend-paying stocks.
CREATE TABLE IF NOT EXISTS dividend_calendar (
ticker TEXT NOT NULL,
declaration_date TEXT,
ex_dividend_date TEXT NOT NULL,
record_date TEXT,
payment_date TEXT,
dividend_amount REAL,
frequency TEXT, -- 'quarterly', 'monthly', 'annual', 'special'
PRIMARY KEY (ticker, ex_dividend_date)
);
CREATE INDEX IF NOT EXISTS idx_div_ex_date ON dividend_calendar(ex_dividend_date);
Flag in daily_state: Add tickers_with_exdiv_tomorrow TEXT (comma-separated) — any tracked ticker going ex-div tomorrow needs GEX recalculation warning.
Implement explicit data validation and cleaning before inserting to any table. This is critical — bad data will silently poison all downstream analytics.
CREATE TABLE IF NOT EXISTS data_quality_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source TEXT NOT NULL, -- 'polygon', 'occ', 'kalshi', etc.
table_name TEXT NOT NULL,
record_key TEXT, -- ticker + date + strike, etc.
issue_type TEXT NOT NULL, -- 'impossible_delta', 'negative_iv', 'stale_data', etc.
issue_description TEXT,
raw_value TEXT,
flagged_at TEXT DEFAULT (datetime('now')),
resolved INTEGER DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_dq_source ON data_quality_log(source, flagged_at);
CREATE INDEX IF NOT EXISTS idx_dq_unresolved ON data_quality_log(resolved) WHERE resolved = 0;
Validation rules per source:
[PANEL: Sonar Reasoning Pro] Explicit data quality SLAs per source. Flag immediately when data is stale:
CREATE TABLE IF NOT EXISTS data_freshness (
source TEXT PRIMARY KEY,
last_successful_update TEXT,
expected_update_cadence_minutes INTEGER,
max_staleness_minutes INTEGER,
is_stale INTEGER DEFAULT 0,
consecutive_failures INTEGER DEFAULT 0
);
Staleness rules:
If SPY chain data is >30 minutes stale: Flag GEX calculation as unreliable rather than silently using stale data.
Each data collector must implement:
[PANEL: GPT-4.1] If the VPS fails, data is lost. Add regular off-server backups.
Archival policy (Opus):
Backup:
All tables use WAL mode. Multiple databases to eliminate cross-domain write contention (Opus recommendation). Stagger scraper schedules by 1-2 minutes to avoid simultaneous writes.
PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -64000; -- 64MB cache
PRAGMA temp_store = MEMORY;
-- Intraday chain snapshots (15-minute)
CREATE TABLE IF NOT EXISTS options_chain_intraday (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ticker TEXT NOT NULL,
timestamp TEXT NOT NULL,
expiration_date TEXT NOT NULL,
strike REAL NOT NULL,
option_type TEXT NOT NULL CHECK (option_type IN ('C', 'P')),
bid REAL,
ask REAL,
mid REAL,
last_price REAL,
volume INTEGER DEFAULT 0,
open_interest INTEGER DEFAULT 0,
implied_volatility REAL,
delta REAL,
gamma REAL,
theta REAL,
vega REAL,
rho REAL,
underlying_price REAL,
spread_pct REAL, -- (ask - bid) / mid
dte INTEGER, -- days to expiration
is_0dte INTEGER DEFAULT 0,
data_source TEXT DEFAULT 'polygon'
);
CREATE INDEX IF NOT EXISTS idx_chain_intraday_ticker_ts ON options_chain_intraday(ticker, timestamp);
CREATE INDEX IF NOT EXISTS idx_chain_intraday_expiry ON options_chain_intraday(ticker, expiration_date);
CREATE INDEX IF NOT EXISTS idx_chain_intraday_0dte ON options_chain_intraday(is_0dte, timestamp);
-- Daily end-of-day chain snapshots (from OCC or Polygon EOD)
CREATE TABLE IF NOT EXISTS options_chain_daily (
id INTEGER PRIMARY KEY AUTOINCREMENT,
date TEXT NOT NULL,
ticker TEXT NOT NULL,
expiration_date TEXT NOT NULL,
strike REAL NOT NULL,
option_type TEXT NOT NULL CHECK (option_type IN ('C', 'P')),
volume INTEGER DEFAULT 0,
open_interest INTEGER DEFAULT 0,
implied_volatility REAL,
delta REAL,
gamma REAL,
theta REAL,
vega REAL,
rho REAL,
bid_close REAL,
ask_close REAL,
last_price REAL,
underlying_close REAL,
spread_pct REAL,
dte INTEGER,
data_source TEXT DEFAULT 'occ'
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_chain_daily_unique ON options_chain_daily(date, ticker, expiration_date, strike, option_type);
CREATE INDEX IF NOT EXISTS idx_chain_daily_ticker ON options_chain_daily(ticker, date);
-- GEX calculations (stored after each computation)
CREATE TABLE IF NOT EXISTS gex_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ticker TEXT NOT NULL,
timestamp TEXT NOT NULL,
net_gex REAL,
call_gex REAL,
put_gex REAL,
gamma_flip_level REAL,
gamma_wall_upper REAL,
gamma_wall_lower REAL,
zero_gamma_level REAL,
gex_0dte REAL,
gex_0dte_pct REAL, -- 0DTE as % of total GEX
customer_origin_confidence REAL, -- 0-1, how confident we are in dealer directionality
gex_regime TEXT, -- 'positive', 'negative'
underlying_price REAL
);
CREATE INDEX IF NOT EXISTS idx_gex_ticker_ts ON gex_snapshots(ticker, timestamp);
-- IV history for IV rank and percentile calculation
CREATE TABLE IF NOT EXISTS iv_history (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
atm_iv REAL,
iv_25d_call REAL, -- 25-delta call IV
iv_25d_put REAL, -- 25-delta put IV
iv_10d_call REAL,
iv_10d_put REAL,
skew_25d REAL, -- iv_25d_put - iv_25d_call
iv_term_1m REAL, -- front month ATM IV
iv_term_2m REAL,
iv_term_3m REAL,
iv_term_6m REAL,
iv_rank_52w REAL, -- (current - 52w low) / (52w high - 52w low)
iv_pct_252d REAL, -- % of last 252 days with lower IV
PRIMARY KEY (date, ticker)
);
-- Vanna and Charm aggregate flows
CREATE TABLE IF NOT EXISTS second_order_greeks (
id INTEGER PRIMARY KEY AUTOINCREMENT,
ticker TEXT NOT NULL,
timestamp TEXT NOT NULL,
aggregate_vanna REAL, -- sum across all strikes/expirations
aggregate_charm REAL,
vanna_flip_level REAL, -- strike where aggregate vanna crosses zero
charm_daily_impact REAL, -- estimated delta change from charm over next 24h
vega_flip_level REAL, -- [PANEL: Sonar] strike where aggregate vega crosses zero
aggregate_vega_front REAL, -- [PANEL: Sonar] front-month total vega exposure
aggregate_vega_back REAL -- [PANEL: Sonar] back-month total vega exposure
);
-- Put/call ratios
CREATE TABLE IF NOT EXISTS put_call_ratios (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
put_volume INTEGER,
call_volume INTEGER,
put_call_volume_ratio REAL,
put_oi INTEGER,
call_oi INTEGER,
put_call_oi_ratio REAL,
is_volume_extreme INTEGER DEFAULT 0, -- 1 if ratio > 1.2 or < 0.6
PRIMARY KEY (date, ticker)
);
-- Sweep/unusual activity log
CREATE TABLE IF NOT EXISTS options_sweeps (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
ticker TEXT NOT NULL,
expiration_date TEXT NOT NULL,
strike REAL NOT NULL,
option_type TEXT NOT NULL,
print_size_contracts INTEGER,
print_price REAL,
ask_at_time REAL,
bid_at_time REAL,
aggressor TEXT, -- 'buyer', 'seller', 'unknown'
is_sweep INTEGER DEFAULT 0,
volume_to_oi REAL,
sentiment TEXT -- 'bullish', 'bearish'
);
CREATE INDEX IF NOT EXISTS idx_sweeps_ticker_ts ON options_sweeps(ticker, timestamp);
-- Phantom liquidation events
CREATE TABLE IF NOT EXISTS phantom_liquidation_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
date TEXT NOT NULL,
ticker TEXT NOT NULL,
call_oi_change_pct REAL,
put_oi_change_pct REAL,
event_type TEXT, -- 'LIQUIDATION', 'CALL_REHEDGE', 'PUT_REHEDGE', 'NORMAL'
underlying_move_pct REAL,
notes TEXT
);
-- VIX family daily data
CREATE TABLE IF NOT EXISTS vix_daily (
date TEXT PRIMARY KEY,
vix_open REAL,
vix_high REAL,
vix_low REAL,
vix_close REAL,
vvix_close REAL,
skew_close REAL,
vix3m_close REAL,
vix6m_close REAL,
vix_m1_futures REAL, -- front month VIX future
vix_m2_futures REAL, -- second month VIX future
vix_event_month REAL, -- month containing next FOMC or CPI
contango_m1_m2 REAL, -- (m2/m1 - 1) * 100
vix_regime TEXT
);
-- Realized volatility calculations
CREATE TABLE IF NOT EXISTS realized_vol (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
rv_cc_5d REAL, -- close-to-close, 5 day
rv_cc_10d REAL,
rv_cc_20d REAL,
rv_cc_30d REAL,
rv_cc_60d REAL,
rv_parkinson_20d REAL, -- high-low estimator
rv_yang_zhang_20d REAL, -- OHLC estimator
vrp_20d REAL, -- VIX - rv_cc_20d (variance risk premium)
iv_rv_spread_20d REAL, -- atm_iv - rv_cc_20d
PRIMARY KEY (date, ticker)
);
-- FRED macro series
CREATE TABLE IF NOT EXISTS fred_macro (
date TEXT NOT NULL,
series_id TEXT NOT NULL,
value REAL,
PRIMARY KEY (date, series_id)
);
CREATE INDEX IF NOT EXISTS idx_fred_series ON fred_macro(series_id, date);
-- Derived macro signals (calculated from fred_macro)
CREATE TABLE IF NOT EXISTS macro_signals (
date TEXT PRIMARY KEY,
sofr REAL,
yield_2y REAL,
yield_10y REAL,
yield_curve_slope REAL, -- 10y - 2y
yield_curve_slope_chg_5d REAL,
hy_spread REAL,
ig_spread REAL,
hy_spread_chg_5d REAL,
rrp_balance_bn REAL,
tga_balance_bn REAL,
rrp_tga_combined_bn REAL, -- proxy for system liquidity
fed_balance_sheet_bn REAL,
-- Real-time bond ETF proxies
jnk_price REAL, -- HY ETF
lqd_price REAL, -- IG ETF
hy_spread_intraday REAL, -- approximated from JNK
-- CME FedWatch
cme_fedwatch_hold_prob REAL,
cme_fedwatch_hike_prob REAL,
cme_fedwatch_cut_prob REAL,
-- VSTOXX
vstoxx_8am REAL,
vstoxx_vix_spread REAL,
-- Carry trade
usdjpy REAL,
usdchf REAL,
jpy_chg_1d_pct REAL,
carry_stress_flag INTEGER DEFAULT 0 -- 1 if JPY/CHF moves >0.5% intraday
);
-- CFTC COT options positions
CREATE TABLE IF NOT EXISTS cftc_cot (
report_date TEXT NOT NULL,
market_name TEXT NOT NULL,
noncomm_long INTEGER,
noncomm_short INTEGER,
noncomm_net INTEGER,
comm_long INTEGER,
comm_short INTEGER,
comm_net INTEGER,
nonrept_long INTEGER,
nonrept_short INTEGER,
PRIMARY KEY (report_date, market_name)
);
-- Kalshi market snapshots
CREATE TABLE IF NOT EXISTS kalshi_markets (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
market_ticker TEXT NOT NULL,
event_ticker TEXT,
title TEXT,
category TEXT, -- 'fed', 'cpi', 'equity_threshold', 'macro'
yes_bid REAL,
yes_ask REAL,
no_bid REAL,
no_ask REAL,
mid_price REAL, -- (yes_bid + yes_ask) / 2
volume_24h REAL,
open_interest_usd REAL,
close_time TEXT,
expiration_time TEXT,
days_to_expiry INTEGER,
-- [PANEL: Sonar] Order book depth (top 5 levels)
bid_price_1 REAL, bid_size_1 REAL,
bid_price_2 REAL, bid_size_2 REAL,
bid_price_3 REAL, bid_size_3 REAL,
ask_price_1 REAL, ask_size_1 REAL,
ask_price_2 REAL, ask_size_2 REAL,
ask_price_3 REAL, ask_size_3 REAL,
top3_bid_depth_usd REAL,
top3_ask_depth_usd REAL,
spread_cents REAL,
illiquid_flag INTEGER DEFAULT 0 -- 1 if top3 depth < $500
);
CREATE INDEX IF NOT EXISTS idx_kalshi_ticker_ts ON kalshi_markets(market_ticker, timestamp);
CREATE INDEX IF NOT EXISTS idx_kalshi_expiry ON kalshi_markets(expiration_time);
-- Polymarket snapshots
CREATE TABLE IF NOT EXISTS polymarket_markets (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL,
condition_id TEXT NOT NULL,
question TEXT,
category TEXT,
yes_price REAL,
no_price REAL,
volume_24h_usd REAL,
liquidity_usd REAL,
end_time TEXT,
amm_slippage_500 REAL, -- estimated slippage for $500 position
usdc_usdt_rate REAL, -- [PANEL: Sonar] USDC/USDT peg health
illiquid_flag INTEGER DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_poly_condition_ts ON polymarket_markets(condition_id, timestamp);
-- Arbitrage signals — cross-venue and options bridge
CREATE TABLE IF NOT EXISTS arbitrage_signals (
id INTEGER PRIMARY KEY AUTOINCREMENT,
detected_at TEXT NOT NULL,
signal_type TEXT NOT NULL, -- 'kalshi_options_bridge', 'poly_options_bridge', 'cross_venue', 'fedwatch_kalshi'
underlying TEXT,
event_description TEXT,
-- Prediction market side
venue_1 TEXT,
venue_1_price REAL,
venue_1_side TEXT,
venue_1_liquidity_usd REAL,
-- Options or CME side
venue_2 TEXT,
venue_2_price REAL, -- options-implied probability or CME FedWatch
venue_2_method TEXT, -- 'breeden_litzenberger', 'bjerksund_stensland', 'fedwatch', 'kalshi'
-- Edge
edge_cents REAL,
edge_after_fees_cents REAL,
confidence_pct INTEGER,
-- Context
vix_at_detection REAL,
gex_regime TEXT,
-- Lifecycle
expiration_time TEXT,
status TEXT DEFAULT 'open', -- 'open', 'traded', 'expired', 'invalidated'
outcome TEXT,
brier_score REAL,
notes TEXT
);
CREATE INDEX IF NOT EXISTS idx_arb_signals_status ON arbitrage_signals(status, detected_at);
CREATE INDEX IF NOT EXISTS idx_arb_signals_type ON arbitrage_signals(signal_type, detected_at);
-- USDC peg monitoring (Polymarket liquidity proxy)
CREATE TABLE IF NOT EXISTS usdc_peg_monitor (
timestamp TEXT PRIMARY KEY,
usdc_usd_rate REAL,
usdt_usd_rate REAL,
usdc_usdt_spread_bps REAL,
is_depeg_risk INTEGER DEFAULT 0 -- 1 if spread > 50 bps
);
-- OCC daily open interest
CREATE TABLE IF NOT EXISTS occ_daily_oi (
date TEXT NOT NULL,
exchange TEXT,
underlying TEXT NOT NULL,
expiration_date TEXT NOT NULL,
strike REAL NOT NULL,
option_type TEXT NOT NULL,
volume INTEGER DEFAULT 0,
open_interest INTEGER DEFAULT 0,
PRIMARY KEY (date, underlying, expiration_date, strike, option_type)
);
CREATE INDEX IF NOT EXISTS idx_occ_underlying ON occ_daily_oi(underlying, date);
-- FINRA short volume
CREATE TABLE IF NOT EXISTS finra_short_volume (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
short_volume INTEGER,
short_exempt_volume INTEGER,
total_volume INTEGER,
short_pct REAL, -- short_volume / total_volume
PRIMARY KEY (date, ticker)
);
-- SEC Form 4 insider trades
CREATE TABLE IF NOT EXISTS sec_form4 (
id INTEGER PRIMARY KEY AUTOINCREMENT,
filing_date TEXT NOT NULL,
ticker TEXT NOT NULL,
reporting_person TEXT,
transaction_date TEXT,
shares INTEGER,
price_per_share REAL,
transaction_code TEXT, -- 'P'=purchase, 'S'=sale, 'A'=award, etc.
ownership_type TEXT, -- 'D'=direct, 'I'=indirect
is_buy INTEGER,
is_sell INTEGER
);
CREATE INDEX IF NOT EXISTS idx_form4_ticker ON sec_form4(ticker, filing_date);
-- REG SHO threshold list
CREATE TABLE IF NOT EXISTS regsho_threshold (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
exchange TEXT,
consecutive_days INTEGER,
PRIMARY KEY (date, ticker)
);
-- SEC Form 424B2 structured products (barrier levels)
CREATE TABLE IF NOT EXISTS structured_products_424b2 (
id INTEGER PRIMARY KEY AUTOINCREMENT,
filing_date TEXT NOT NULL,
issuer TEXT NOT NULL,
underlying TEXT NOT NULL,
barrier_pct REAL, -- barrier as % of initial price
barrier_price REAL, -- barrier in absolute price terms
initial_price REAL, -- stock price at issuance
notional_usd_mm REAL,
maturity_date TEXT,
product_type TEXT, -- 'autocallable', 'barrier_reverse_convertible', 'digital'
is_active INTEGER DEFAULT 1,
source TEXT DEFAULT 'manual_mvp', -- 'manual_mvp', 'regex_auto', 'nlp_auto'
accession_number TEXT,
notes TEXT
);
CREATE INDEX IF NOT EXISTS idx_424b2_underlying ON structured_products_424b2(underlying, is_active);
-- FINRA ATS dark pool data (weekly aggregate)
CREATE TABLE IF NOT EXISTS finra_ats_dark_pool (
week_ending TEXT NOT NULL,
ticker TEXT NOT NULL,
ats_share_quantity INTEGER,
total_share_quantity INTEGER,
ats_pct REAL,
PRIMARY KEY (week_ending, ticker)
);
-- ETF flows
CREATE TABLE IF NOT EXISTS etf_flows (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
shares_outstanding INTEGER,
shares_change INTEGER,
flow_direction TEXT, -- 'creation' or 'redemption'
estimated_flow_usd_mm REAL,
PRIMARY KEY (date, ticker)
);
-- Dividend calendar
CREATE TABLE IF NOT EXISTS dividend_calendar (
ticker TEXT NOT NULL,
declaration_date TEXT,
ex_dividend_date TEXT NOT NULL,
record_date TEXT,
payment_date TEXT,
dividend_amount REAL,
frequency TEXT,
PRIMARY KEY (ticker, ex_dividend_date)
);
CREATE INDEX IF NOT EXISTS idx_div_ex_date ON dividend_calendar(ex_dividend_date);
-- Daily state (from Section G.1 above — duplicated here for reference)
-- See Section G.1 for full CREATE TABLE statement
-- Earnings calendar (from Section G.3 above)
-- See Section G.3 for full CREATE TABLE statement
-- Treasury auction calendar
CREATE TABLE IF NOT EXISTS treasury_auctions (
auction_date TEXT NOT NULL,
security_type TEXT NOT NULL, -- 'Bill', 'Note', 'Bond', 'TIPS', 'FRN'
term TEXT, -- '2Y', '5Y', '10Y', '30Y'
cusip TEXT,
bid_to_cover REAL,
indirect_bidders_pct REAL,
high_yield REAL,
when_issued_yield REAL,
tail_bps REAL, -- high_yield - when_issued_yield (positive = weak auction)
is_weak_auction INTEGER DEFAULT 0, -- tail > 1bp or btc < 2.3
PRIMARY KEY (auction_date, security_type, term)
);
-- Implied correlation index
CREATE TABLE IF NOT EXISTS implied_correlation (
date TEXT NOT NULL,
index_name TEXT NOT NULL, -- 'COR1M', 'COR3M'
value REAL,
PRIMARY KEY (date, index_name)
);
-- Data freshness tracking (from K.2)
CREATE TABLE IF NOT EXISTS data_freshness (
source TEXT PRIMARY KEY,
last_successful_update TEXT,
expected_update_cadence_minutes INTEGER,
max_staleness_minutes INTEGER,
is_stale INTEGER DEFAULT 0,
consecutive_failures INTEGER DEFAULT 0
);
-- Data quality log (from K.1)
CREATE TABLE IF NOT EXISTS data_quality_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source TEXT NOT NULL,
table_name TEXT NOT NULL,
record_key TEXT,
issue_type TEXT NOT NULL,
issue_description TEXT,
raw_value TEXT,
flagged_at TEXT DEFAULT (datetime('now')),
resolved INTEGER DEFAULT 0
);
-- Reddit sentiment
CREATE TABLE IF NOT EXISTS reddit_sentiment_daily (
date TEXT NOT NULL,
ticker TEXT NOT NULL,
subreddit TEXT NOT NULL,
mention_count INTEGER,
bullish_count INTEGER,
bearish_count INTEGER,
sentiment_score REAL,
is_3sigma_spike INTEGER DEFAULT 0,
avg_30d_mentions REAL,
std_30d_mentions REAL,
PRIMARY KEY (date, ticker, subreddit)
);
| Source | Endpoint | Auth | Rate Limit | Notes |
|---|---|---|---|---|
| Polygon.io chains | https://api.polygon.io/v3/snapshot/options/{ticker} |
API key (header) | 5 req/min (free), 100/min ($29) | Use paid plan |
| Polygon.io trades | https://api.polygon.io/v2/last/trade/{ticker} |
API key | Same | For intraday |
| OCC daily files | https://www.theocc.com/market-data/market-data-reports/volume-and-open-interest/ |
None | No stated limit | Pull 1x/day |
| CBOE VIX data | https://cdn.cboe.com/products/us/indices/data/ |
None | Rate-limit conservatively | Rotate user agents |
| CBOE VIX futures | https://cdn.cboe.com/products/us/futures/data/VX/ |
None | Rate-limit | Daily settlement CSVs |
| CBOE SKEW | https://www.cboe.com/tradable_products/vix/skew_historical_data/ |
None | Rate-limit | Daily |
| CBOE Implied Corr | https://www.cboe.com/tradable_products/vix/implied_correlation/ |
None | Rate-limit | Daily |
| FRED API | https://api.stlouisfed.org/fred/series/observations?series_id={id}&api_key={key}&file_type=json |
Free API key | 120 req/min | Generous limits |
| CME FedWatch | https://www.cmegroup.com/markets/interest-rates/cme-fedwatch-tool.html |
None (scrape) | Rate-limit | Parse JSON embedded in page |
| TreasuryDirect | https://www.treasurydirect.gov/TA_WS/securities/search?type=Note&startDate={date}&endDate={date} |
None | No stated limit | Official gov API |
| Kalshi | https://api.elections.kalshi.com/trade-api/v2/markets |
RSA-PSS auth | 10 req/sec | Existing code from weather desk |
| Kalshi orderbook | https://api.elections.kalshi.com/trade-api/v2/markets/{ticker}/orderbook |
RSA-PSS auth | 10 req/sec | [PANEL: Sonar] Add this |
| Polymarket REST | https://clob.polymarket.com/markets |
None | Rate-limit | Paginated |
| Polymarket TheGraph | https://api.thegraph.com/subgraphs/name/polymarket/matic-markets |
None | Rate-limit | GraphQL |
| CoinGecko USDC | https://api.coingecko.com/api/v3/simple/price?ids=usd-coin,tether&vs_currencies=usd |
None (50 req/min) | 50/min free | USDC peg monitoring |
| Deribit WebSocket | wss://www.deribit.com/ws/api/v2 |
None for public | No stated limit | Subscribe to instruments |
| SEC EDGAR search | https://efts.sec.gov/LATEST/search-index?forms=4&dateRange=custom&startdt={date} |
None | Rate-limit (10 req/10s) | Official free API |
| SEC EDGAR full-text | https://efts.sec.gov/LATEST/search-index?forms=424B2 |
None | Same | For structured products |
| FINRA short volume | https://www.finra.org/sites/default/files/short-sale-volume-files/ |
None | None stated | Daily CSV files |
| FINRA ATS | https://www.finra.org/rules-guidance/transparency/finra-ats-data/ats-weekly-summary |
None | None stated | Weekly CSV |
| OCC CBOE customer/firm | https://www.cboe.com/market-data/options-volume/ |
None | Rate-limit | Free daily data |
| CFTC COT options | https://www.cftc.gov/dea/options/deaoiall.htm |
None | None stated | Weekly release |
| REG SHO list | https://www.nasdaqtrader.com/trader.aspx?id=regsho |
None | None | Daily |
| Nasdaq earnings | https://api.nasdaq.com/api/calendar/earnings?date={date} |
None | Rate-limit | Scrape |
| Yahoo Finance hist | https://finance.yahoo.com/quote/{ticker}/history/ (via yfinance) |
None | Rate-limit | Historical OHLCV only — do NOT use for options chains |
| ETF.com flows | https://www.etf.com/etfanalytics/etf-fund-flows-tool |
None (scrape) | Rate-limit | Weekly shares outstanding |
| Nasdaq dividends | https://api.nasdaq.com/api/calendar/dividends?date={date} |
None | Rate-limit | Scrape |
| VSTOXX delayed | https://www.stoxx.com/indices-data?isin=EU0009658152 |
None (scrape) | Rate-limit | 30-min delay on free |
| Eurex VIX futures | https://www.eurex.com/ex-en/market-data/statistics/market-statistics-download |
None | Rate-limit | Alternative to VSTOXX scrape |
| Data Source | Frequency | Time | DB Table |
|---|---|---|---|
| Polygon.io options chain (SPY, QQQ) | Every 15 min | 9:30 AM – 4:00 PM ET | options_chain_intraday |
| Polygon.io options chain (earnings tickers) | Every 15 min | 9:30 AM – 4:00 PM ET | options_chain_intraday |
| GEX calculation | Every 15 min (after chain update) | 9:30 AM – 4:00 PM ET | gex_snapshots |
| VIX real-time | Every 15 min | 9:30 AM – 4:15 PM ET | (used directly in daily_state) |
| Kalshi markets + orderbook | Every 15 min | 9:00 AM – 11:55 PM ET | kalshi_markets |
| Polymarket markets | Every 15 min | 24/7 | polymarket_markets |
| CME FedWatch | Every 15 min | 9:00 AM – 5:00 PM ET | macro_signals |
| USDC/USDT peg | Every 15 min | 24/7 | usdc_peg_monitor |
| JPY/CHF spot | Every 15 min | 24/7 | macro_signals |
| Carry stress check | Every 15 min | 24/7 | macro_signals |
| OCC daily files | Daily 8:00 AM ET | T+1 | occ_daily_oi, options_chain_daily |
| FRED macro series | Daily 6:00 PM ET | After FRED publish | fred_macro, macro_signals |
| VIX suite daily | Daily 4:15 PM ET | After close | vix_daily |
| Realized volatility | Daily 4:30 PM ET | After OHLCV available | realized_vol |
| IV history snapshot | Daily 4:15 PM ET | After close | iv_history |
| Daily state table | Daily 5:00 PM ET | Compile from day's data | daily_state |
| FINRA short volume | Daily 6:00 PM ET | T+1 data | finra_short_volume |
| SEC Form 4 | Daily 6:00 PM ET | Filed throughout day | sec_form4 |
| SEC Form 424B2 | Daily 6:00 PM ET | Filed throughout day | structured_products_424b2 |
| REG SHO threshold | Daily 6:00 PM ET | After market close | regsho_threshold |
| Earnings calendar | Weekly (Friday) | Update for next 14 days | earnings_calendar |
| CFTC COT | Weekly (Friday 3:30 PM ET) | Data as of Tuesday | cftc_cot |
| FINRA ATS dark pool | Weekly (Friday) | Week prior data | finra_ats_dark_pool |
| VSTOXX | Daily 8:00 AM ET | Pre-market | macro_signals |
| ETF flows (shares outstanding) | Daily 5:00 PM ET | From ETF issuer | etf_flows |
| Dividend calendar | Weekly (Monday) | For next 14 days | dividend_calendar |
| Treasury auctions | Daily 1:00 PM ET | Results published same day | treasury_auctions |
| Implied correlation | Daily 4:30 PM ET | CBOE published EOD | implied_correlation |
| Deribit BTC/ETH options | Hourly snapshot + real-time WebSocket | 24/7 | (separate deribit tables) |
| Reddit sentiment | Daily 4:30 PM ET | After market close | reddit_sentiment_daily |
Corrected estimates (Opus flaw fix):
SPY alone has ~5,000 active contracts across all strikes and expirations.
Realistic estimates:
| Data Type | Daily Volume | Annual (raw) |
|---|---|---|
| SPY + QQQ intraday chains | ~50 MB/day | ~18 GB |
| Earnings tickers (average) | ~5 MB/day | ~1.8 GB |
| GEX snapshots | ~1 MB/day | ~365 MB |
| Kalshi/Polymarket | ~5 MB/day | ~1.8 GB |
| FRED/macro (tiny) | <0.1 MB/day | ~35 MB |
| OCC daily | ~2 MB/day | ~730 MB |
| Flows (Form 4, FINRA, etc.) | ~0.5 MB/day | ~180 MB |
| Total (raw) | ~65 MB/day | ~23 GB/year |
After archival policy:
VPS headroom: 15 GB free on EdgeClaw VPS. Implement archival after 30 days to stay within budget. Consider separate storage for cold chain data.
SELECT
g.ticker,
g.timestamp,
g.net_gex,
g.gex_regime,
g.gamma_flip_level,
g.gamma_wall_upper,
g.gamma_wall_lower,
g.gex_0dte_pct,
ds.spy_close,
ds.vix_close
FROM gex_snapshots g
JOIN daily_state ds ON date(g.timestamp) = ds.date
WHERE g.ticker = 'SPY'
AND g.timestamp >= datetime('now', '-30 minutes')
ORDER BY g.timestamp DESC
LIMIT 1;
SELECT
signal_type,
underlying,
event_description,
venue_1, venue_1_price,
venue_2, venue_2_price, venue_2_method,
edge_cents,
edge_after_fees_cents,
confidence_pct,
venue_1_liquidity_usd,
expiration_time,
detected_at
FROM arbitrage_signals
WHERE status = 'open'
AND edge_after_fees_cents >= 3
AND venue_1_liquidity_usd >= 300
AND expiration_time > datetime('now')
ORDER BY edge_after_fees_cents DESC;
SELECT
iv.ticker,
iv.date,
iv.atm_iv,
iv.iv_rank_52w,
iv.iv_pct_252d,
iv.skew_25d,
rv.rv_cc_20d,
rv.vrp_20d,
rv.iv_rv_spread_20d,
CASE
WHEN rv.vrp_20d > 8 THEN 'EXPENSIVE_VOL_SELL'
WHEN rv.vrp_20d < -3 THEN 'CHEAP_VOL_BUY'
ELSE 'NEUTRAL'
END AS vol_regime_signal
FROM iv_history iv
JOIN realized_vol rv ON iv.ticker = rv.ticker AND iv.date = rv.date
WHERE iv.date = date('now', '-1 day')
ORDER BY ABS(rv.vrp_20d) DESC;
SELECT
ec.ticker,
ec.earnings_date,
ec.time_of_day,
ec.options_implied_move_pct,
ec.historical_avg_move_pct,
(ec.options_implied_move_pct - ec.historical_avg_move_pct) AS vol_premium_pct,
iv.atm_iv,
iv.iv_rank_52w,
CASE
WHEN (ec.options_implied_move_pct - ec.historical_avg_move_pct) > 3.0
AND iv.iv_rank_52w > 0.70 THEN 'IV_CRUSH_CANDIDATE'
ELSE 'NORMAL'
END AS setup_type
FROM earnings_calendar ec
JOIN iv_history iv ON ec.ticker = iv.ticker AND iv.date = date('now', '-1 day')
WHERE ec.earnings_date BETWEEN date('now') AND date('now', '+14 days')
AND ec.actual_eps IS NULL -- not yet reported
ORDER BY ec.earnings_date, vol_premium_pct DESC;
SELECT
ticker,
timestamp,
strike,
expiration_date,
option_type,
print_size_contracts,
volume_to_oi,
aggressor,
sentiment,
-- Join to get underlying context
(SELECT spy_close FROM daily_state WHERE date = date(timestamp)) AS spy_context,
(SELECT vix_close FROM daily_state WHERE date = date(timestamp)) AS vix_context
FROM options_sweeps
WHERE timestamp >= datetime('now', '-1 day')
AND volume_to_oi > 10
AND print_size_contracts > 500
ORDER BY print_size_contracts DESC
LIMIT 50;
SELECT
a.signal_type,
a.underlying,
a.venue_1 AS prediction_market,
a.venue_1_price AS pm_price,
a.venue_2_price AS options_implied_prob,
ROUND((a.venue_2_price - a.venue_1_price) * 100, 1) AS edge_cents_raw,
a.edge_after_fees_cents,
a.venue_1_liquidity_usd,
k.spread_cents AS kalshi_spread,
ds.vix_regime,
ds.gex_regime,
ds.convergence_pattern
FROM arbitrage_signals a
LEFT JOIN kalshi_markets k ON a.venue_1 = 'Kalshi'
AND k.market_ticker LIKE '%' || a.underlying || '%'
AND k.timestamp > datetime('now', '-20 minutes')
JOIN daily_state ds ON ds.date = date('now')
WHERE a.status = 'open'
AND a.edge_after_fees_cents >= 4
AND a.detected_at > datetime('now', '-4 hours')
ORDER BY a.edge_after_fees_cents DESC;
SELECT
ms.date,
ms.cme_fedwatch_hold_prob,
k.mid_price AS kalshi_hold_price,
ROUND((ms.cme_fedwatch_hold_prob - k.mid_price) * 100, 1) AS divergence_cents,
k.top3_bid_depth_usd,
k.illiquid_flag
FROM macro_signals ms
JOIN kalshi_markets k ON date(k.timestamp) = ms.date
AND k.market_ticker LIKE '%FED%HOLD%'
AND k.timestamp = (
SELECT MAX(timestamp) FROM kalshi_markets k2
WHERE k2.market_ticker = k.market_ticker
AND date(k2.timestamp) = ms.date
)
WHERE ms.date = date('now')
AND ABS(ms.cme_fedwatch_hold_prob - k.mid_price) > 0.05 -- >5 cent divergence
AND k.illiquid_flag = 0;
SELECT
g1.ticker,
g1.timestamp AS current_time,
g1.net_gex AS current_gex,
g0.net_gex AS morning_gex,
g1.net_gex - g0.net_gex AS gex_change,
g1.gamma_flip_level,
g1.gex_0dte_pct,
CASE
WHEN g0.gex_regime != g1.gex_regime THEN 'REGIME_CHANGE'
WHEN ABS(g1.net_gex - g0.net_gex) > ABS(g0.net_gex) * 0.30 THEN 'LARGE_MOVE'
ELSE 'STABLE'
END AS gex_change_type
FROM gex_snapshots g1
JOIN gex_snapshots g0 ON g0.ticker = g1.ticker
AND g0.timestamp = (
SELECT MIN(timestamp) FROM gex_snapshots
WHERE ticker = g1.ticker
AND date(timestamp) = date(g1.timestamp)
)
WHERE g1.ticker = 'SPY'
AND g1.timestamp >= datetime('now', '-30 minutes')
ORDER BY g1.timestamp DESC
LIMIT 1;
This section documents all identified flaws from the full panel review and their resolutions. Each is tracked for implementation.
| # | Flaw | Severity | Fix Status | Implementation Note |
|---|---|---|---|---|
| 1 | GEX dealer directionality oversimplified | CRITICAL | Addressed in B.2 | Use CBOE customer/firm volume; add confidence band |
| 2 | Yahoo Finance scraping for intraday chains | CRITICAL | Addressed in A.1 | Polygon.io from Day 1, $29/month |
| 3 | SQLite write contention on 15-min polling | HIGH | Addressed in L (WAL mode) | WAL mode + separate DBs + staggered schedules |
| 4 | Binary probability uses naive d2 (wrong for American options) | HIGH | Addressed in F.3 | Use Bjerksund-Stensland; validate against Kalshi prices |
| 5 | VSTOXX lead time overstated (30-90 min claimed, 0-15 min actual) | MEDIUM | Addressed in C.6 | Track VSTOXX/VIX spread instead; VIX futures as alternative |
| 6 | Storage estimate 4-5x too low | MEDIUM | Addressed in O | Updated to ~23 GB/year raw, ~4-6 GB with archival |
| 7 | OCC data is T+1 — spec conflated timing | MEDIUM | Addressed in E.1 | Use OCC as base; estimate intraday changes from volume |
| 8 | Polymarket USDC depeg — Deribit not a leading indicator | MEDIUM | Addressed in F.2 | Use USDC/USDT on DEXes (CoinGecko) as actual indicator |
| 9 | Buyback blackout % hard to calculate | MEDIUM | Addressed in G.1 | 14-day pre-earnings heuristic; label as ±10% estimate |
| 10 | FRED credit spread has 24h lag | MEDIUM | Addressed in D.1 | Bond ETF real-time proxies (JNK, LQD) for intraday |
| 11 | Phantom liquidation detection broken during big moves | MEDIUM | Addressed in E.4 | Track calls and puts separately; only flag when both drop |
| 12 | Deribit validation creates false confidence | MEDIUM | Addressed in B.4 | Validate against multiple sources, not Deribit alone |
| 13 | No explicit data quality layer | CRITICAL (GPT-4.1) | Addressed in K.1 | data_quality_log table; validation rules per source |
| 14 | No backup/archival plan | HIGH (GPT-4.1) | Addressed in K.4 | Daily SQLite dump; weekly off-server sync |
| 15 | Form 424B2 deferred to Month 2 | CRITICAL (Sonar) | Addressed in E.7 | MVP in Week 1: manual tracking of top 5 products |
| 16 | Dividend data missing from Greeks calculation | MUST-HAVE (Opus/Gemini) | Addressed in B.1, J.2 | Add q to all BS calculations; dividend_calendar table |
| 17 | Corporate actions not tracked | MUST-HAVE (Gemini) | Addressed in B.1 | Pull from Polygon.io Reference Data API |
| 18 | Realized volatility not defined | MUST-HAVE (Opus) | Addressed in C.2 | Three estimators at 5 windows each |
| 19 | IV rank database not built | MUST-HAVE (Opus) | Addressed in C.3 | iv_history table; collect daily from Day 1 |
| 20 | CME FedWatch not included | MUST-HAVE (Opus) | Addressed in D.2 | Every 15 min during market hours |
| 21 | FINRA short volume not included | MUST-HAVE (Opus) | Addressed in E.5 | Daily CSV pull at 6 PM ET |
| 22 | Bid-ask spread not tracked | MUST-HAVE (Opus) | Addressed in A.3 | spread_pct = (ask-bid)/mid stored on every contract |
| 23 | Kalshi orderbook depth missing | MUST-HAVE (Sonar) | Addressed in F.1 | Top 5 levels; illiquid_flag when depth < $500 |
| 24 | Vega flip levels not calculated | MUST-HAVE (Sonar) | Addressed in B.3 | Added to second_order_greeks table |
| 25 | Implied borrow rate not tracked | MUST-HAVE (Sonar) | Not yet in schema | Future: track put/call parity deviations to back-calculate implied borrow |
| Item | Action | Reason |
|---|---|---|
| SGX Nifty as pre-market signal | Remove | Near-zero predictive value for US options; S&P futures cover the same information |
| CME CVOL for commodities/FX | Defer (v2) | Kalshi commodity markets have negligible volume; revisit in 6 months |
| Reddit r/wallstreetbets YOLO frequency | Simplify | SNR degraded since 2021; replace with r/options only + 3-sigma spike alerts |
| OCC Exercise and Assignment Stats (monthly) | Defer | Monthly granularity too coarse; only daily exercise data would be actionable, and OCC doesn't publish that for free |
| Full IV surface at 7 delta points × 9 expirations | Simplify | Use ATM IV, 25-delta skew, and IV at prediction market strikes. Covers 95% of use cases with 30% of complexity |
| All 8-9 VIX futures months | Simplify | M1, M2, and event month only |
| X/Twitter flow account aggregation 3x daily | Simplify | 1x daily for consensus themes only; actual flows captured better by volume/OI analysis |
| Full sector ETF chain tracking (XLF/XLE/XLK intraday) | Simplify | SPY already captures sector rotation; pull OI and P/C only for sector ETFs, not full intraday chains |
| MOVE Index tracking | Replace | Use T10Y2Y from FRED instead (already collected); stronger signal for options |
| CME CVOL for Kalshi commodity markets | Defer | Thin markets, low priority |
The following components built for the Options Desk should be shared across all desks. Build as shared utilities, not desk-specific code.
| Component | Options Desk Use | Other Desks |
|---|---|---|
| FRED macro pipeline | Rates, credit spreads, liquidity | Forex (yield differentials), Futures (Treasury data), Crypto (liquidity proxy) |
| Daily state table / calendar engine | FOMC, CPI, OpEx, earnings flags | ALL desks — this is the shared event calendar |
| Kalshi/Polymarket scraper framework | Options-linked markets | Weather (existing), Sports, Crypto — all prediction market desks |
| VIX suite and vol regime classification | Core options signal | Crypto (risk appetite), Forex (VIX/JPY correlation), Futures (roll costs) |
| Deribit crypto options feed | Greeks validation + Poly liquidity | Crypto desk (direct trading signal) |
| SEC EDGAR scraping infrastructure | Form 4, 424B2 | Stocks desk (insider signals), any desk needing corporate filing data |
| Binary probability calculator (Breeden-Litzenberger / Bjerksund-Stensland) | Options-to-prediction-market bridge | Any desk comparing model probabilities to prediction market prices (same math regardless of underlying event) |
| Credit spread monitoring (HY/IG from FRED + bond ETFs) | Risk-off signal for options | Forex (USD funding markets), Crypto (credit tightening leads crypto selloffs 1-5 days) |
| Carry trade indicator (JPY/CHF) | Risk-off leading signal | Forex desk (directly), Crypto desk |
| Data quality layer + circuit breaker | All options data sources | All desks — standardize the validation and alerting pattern |
Day 1-2: Prediction Markets (Execution Venues)
prediction_markets.db with WAL modeDay 2-3: Macro Foundation
volatility.db with WAL modeDay 3-4: Vol Foundation
Day 4-5: Calendar and State
calendar.db with WAL modeDay 5: Form 424B2 MVP (DO NOT DEFER)
structured_products_424b2 tableDay 6-7: Polygon.io Integration
options.db with WAL modeDay 7-8: Greeks Engine
Day 8-9: IV Surface and RV
Day 9-10: Binary Probability Bridge
Day 11: OCC Daily Files
Day 12: FINRA and REG SHO
Day 13: SEC EDGAR
Day 14: Sweep Detection and Flow Analytics
Day 15: CFTC COT
| Step | Component | Effort | Dependencies |
|---|---|---|---|
| 1 | Daily state table + calendar engine | 2-3 days | None |
| 2 | FRED API integration | 1-2 days | None |
| 3 | VIX suite from CBOE | 2-3 days | None |
| 4 | Kalshi scraper (adapt from weather desk) + orderbook depth | 1-2 days | Existing Kalshi code |
| 5 | Polymarket scraper + USDC monitoring | 2-3 days | None |
| 6 | Polygon.io subscription + chain data pipeline | 3-5 days | $29/month budget |
| 7 | Realized volatility calculators (close-to-close, Parkinson, Yang-Zhang) + VRP | 1-2 days | OHLCV data (Step 6) |
| 8 | Dividend calendar + corporate actions scraper | 1 day | None |
| 9 | GEX calculator (with CBOE customer/firm volume weighting) | 3-5 days | Chain data (Step 6) |
| 10 | Binary probability calculator (Bjerksund-Stensland + Breeden-Litzenberger) | 2-3 days | Chain + IV surface |
| 11 | Form 424B2 MVP (manual: 5 products × 3 issuers) | 4 hours | None — do first week |
| 12 | Earnings calendar + historical earnings database | 2-3 days | None |
| 13 | OCC daily files scraper + put/call ratios | 1-2 days | None |
| 14 | Phantom liquidation + unusual volume scanner + flow analytics | 2-3 days | Chain data (Step 6) |
| 15 | SEC EDGAR (Form 4 insider trades, REG SHO) | 2-3 days | None |
| 16 | CFTC COT options-only report parser | 1 day | None |
| 17 | FINRA short volume daily files | 1 day | None |
| 18 | CME FedWatch implied probabilities | 1-2 days | None |
| 19 | Convergence pattern detector | 1 day | Daily state (Step 1) |
| 20 | Reddit sentiment (simplified, r/options only) | 1-2 days | Low priority |
| 21 | Vanna/Charm calculator | 1-2 days | Chain data (Step 6) |
| 22 | Deribit crypto options feed | 1 day | Step 6 for validation context |
| 23 | Form 424B2 EDGAR full automation + regex | 2-4 weeks | Step 11 first |
| 24 | VSTOXX + international pre-market + JPY/CHF carry composite | 2-3 days | When core is stable |
Realistic to build: Yes.
Biggest risk: Data pipeline reliability. You are pulling from 15+ sources on different schedules with different failure modes. CBOE might change their CSV format. OCC might delay publication. Polygon might have an outage. EDGAR might throttle you. Each individual source is manageable, but the combinatorial failure space is large. On any given day, at least one source will be broken. The system needs graceful degradation — every component must handle missing upstream data without crashing. Build circuit breakers and staleness alerts from Day 1. If SPY chain data is >30 minutes stale, the GEX calculation must flag itself as unreliable rather than silently using stale data.
First to break: Yahoo Finance scraping (if you do not use Polygon from Day 1). The second thing to break will be the 15-minute polling schedule — when multiple scrapers fire simultaneously, SQLite write contention will cause silent data loss (writes that timeout and get dropped). You will not notice until you see gaps in your time series days later.
Top change: Pay for Polygon.io ($29/month) from Week 1. Reliable options chain data is the FOUNDATION of this entire desk — GEX, vanna, charm, IV surface, binary probability calculator, flow detection, and earnings analysis all depend on it. Building on Yahoo Finance scraping is building on sand. $29/month is $0.97/day. One missed trading signal due to stale/missing chain data will cost more than a year of Polygon subscription. Additionally, Polygon's historical data lets you backfill the IV rank database immediately rather than waiting 252 days.
[PANEL: Sonar Reasoning Pro] What would make this A+:
Merged final complete. Opus 4.6 is the unmodified base. All panel additions are tagged with [PANEL: Model Name]. All flaw fixes are resolved inline. Sections K-V are implementation additions not in the original Opus review but required by the spec rules.
Status: NOT BUILT — add when this desk goes live for execution
Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.
Why WebSocket matters:
Note: This desk primarily trades on other exchanges, but Kalshi has event markets (rate decisions, index levels, crypto milestones) that overlap with this desk. WebSocket upgrade applies when trading any Kalshi markets.
Added 2026-03-29 — upgrade REST to WS when desk moves to live execution