This is the complete data collection specification for the Options desk in the research pipeline. It covers equity options traded on Robinhood AND event contracts traded on Kalshi/Polymarket. An AI builder should be able to read this and know exactly what data to collect, from where, how often, and why.
Unlike sports desks that only trade prediction markets, the Options desk trades on THREE platforms: Robinhood (equity options), Kalshi, and Polymarket (event contracts).
The edge: We collect data that lets us find mispricing in ALL three venues, plus cross-market arbitrage when the same event is priced differently across them. Example: if SPY options imply a 35% chance of a Fed cut but Kalshi prices it at 42 cents, one side is wrong.
An AI builder must understand these structural differences:
Continuous vs Binary Payoffs — Sports bets pay $1 or $0. Options pay varying amounts depending on how far the stock moves past the strike. This means Greeks (delta, gamma, vanna, charm) matter — they tell you how the option price changes as things move.
Time Decay Is Continuous: Options lose value every day just from time passing (theta), and the loss accelerates as expiration approaches. Sports bets don't decay until the game happens. This means timing of entry matters more.
Volatility Is Tradeable — You can bet on how MUCH a stock will move without caring about direction. The VIX measures this for the whole market. When VIX is cheap relative to realized moves, buy volatility. When expensive, sell it.
Market Makers Control Flow — Big banks and market makers hold massive options positions. Their hedging activity (gamma exposure) creates predictable price movements at certain levels. If we can estimate their positioning, we can predict where the stock will be pushed.
Cross-Market Bridge: To compare Robinhood options to Kalshi/Polymarket, we need a "binary translation layer": math that converts an options chain's implied volatility into the risk-neutral probability of a specific event (e.g., "SPY above $500 by Friday"). In the Black-Scholes formula this probability is N(d2), the standard normal CDF evaluated at d2.
Crypto Liquidity Risk (Polymarket) — Polymarket runs on USDC/Polygon. If crypto volatility spikes or USDC depegs, liquidity gets pulled from Polymarket instantly. Deribit crypto options data is a leading indicator for this.
What: Complete options chain for tracked tickers — every strike, every expiration, calls and puts.
Tickers to track:
What to pull per contract:
Collection frequency:
Primary source: OCC (theocc.com) — free daily CSVs for OI and volume. This is the definitive source (it's the actual clearinghouse). Real-time source: Yahoo Finance API (free, 15-min delay) for intraday snapshots. Rotate user agents, add delays, respect rate limits. Upgrade path: Polygon.io ($29/month) when budget allows — real-time, reliable, no scraping risk.
What: Build the implied volatility surface — IV at specific deltas across multiple expirations. This shows how the market prices risk at different levels.
Deltas to interpolate: 5, 10, 15, 25, 30, 40, 50 (for both calls and puts)
Expirations to track: Next 4 weekly, next 3 monthly, next 2 quarterly
Calculate from chain data:
Frequency: Daily at close, plus intraday for SPY/QQQ (every 15 min)
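The interpolation step can be sketched as follows. This is a minimal illustration, assuming each expiration's chain has already been reduced to (absolute delta, IV) pairs; the function name `iv_at_delta` and the choice of linear interpolation are assumptions, not part of the spec.

```python
from bisect import bisect_left

def iv_at_delta(points, target_delta):
    """Linearly interpolate IV at target_delta.

    points: iterable of (abs_delta, iv) pairs for ONE expiration and ONE side
    (calls or puts). Returns None when the target lies outside the quoted
    delta range, because extrapolating the wings is unreliable.
    """
    pts = sorted(points)
    deltas = [d for d, _ in pts]
    if not pts or not deltas[0] <= target_delta <= deltas[-1]:
        return None
    i = bisect_left(deltas, target_delta)
    if deltas[i] == target_delta:
        return pts[i][1]
    (d0, v0), (d1, v1) = pts[i - 1], pts[i]
    weight = (target_delta - d0) / (d1 - d0)
    return v0 + weight * (v1 - v0)

# Hypothetical put-side quotes for one expiration: (abs delta, IV)
put_points = [(0.10, 0.30), (0.30, 0.22), (0.50, 0.18)]
iv_25d_put = iv_at_delta(put_points, 0.25)
```

Running the same function over each tracked expiration and each target delta yields one row of the surface per snapshot. A production version might interpolate in variance or use a spline instead of straight lines.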
What: Isolate the implied move for a SPECIFIC event (earnings, FOMC, CPI) from the surrounding term structure.
How: Compare IV of expirations bracketing the event. The difference = the market's expected event move.
Use case: Compare this "event vol premium" to Kalshi's yes/no pricing for the same event. If options say CPI will move SPY 1.2% but Kalshi prices the "SPY down 1%+ on CPI day" contract at 15 cents, there may be mispricing.
Frequency: Daily for any expiration bracketing a known event. Start 10 trading days before event.
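One common way to isolate the event move from two bracketing expirations is to difference their total variances. The sketch below assumes variance is additive in trading-day time and that non-event days in the window carry the same daily variance as the pre-event expiration; both are simplifying assumptions, and the function name is hypothetical.

```python
import math

def event_implied_move(iv_before, days_before, iv_after, days_after,
                       trading_days=252):
    """Estimate the one-sigma move priced for a single event day.

    iv_before / days_before: annualized IV and trading days to the expiration
        that falls BEFORE the event.
    iv_after / days_after: same for the expiration that falls AFTER the event.
    Returns the implied event-day move as a fraction (0.02 = 2%), or None
    when the inputs imply no positive event variance.
    """
    total_var_before = iv_before ** 2 * days_before / trading_days
    total_var_after = iv_after ** 2 * days_after / trading_days
    forward_var = total_var_after - total_var_before

    # Assumption: ordinary days between the two expirations have the same
    # daily variance as the pre-event expiration implies.
    baseline_daily_var = iv_before ** 2 / trading_days
    ordinary_days = (days_after - days_before) - 1
    event_var = forward_var - baseline_daily_var * ordinary_days
    if event_var <= 0:
        return None
    return math.sqrt(event_var)

# Example: 16% IV expiring in 3 days, 20% IV expiring in 5 days
move = event_implied_move(0.16, 3, 0.20, 5)
```

The result is a one-sigma move, so it still has to pass through a distributional assumption before it can be compared with a binary contract such as "SPY down 1%+ on CPI day".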
What: Track RSI (14-day) divergence on the underlying asset (SPY, QQQ, single stocks). Options prices are derived from the underlying — detecting underlying reversals/continuations before they happen gives the options desk an edge on timing entries and exits.
4 divergence types (identical logic to forex spec):
What to store per divergence event:
- ticker: underlying symbol
- timeframe: daily (primary), weekly (confirmation)
- divergence_type: "regular_bullish", "regular_bearish", "hidden_bullish", "hidden_bearish"
- timestamp: detection candle
- price_swing_1, price_swing_2: the two price swing points (level + timestamp)
- rsi_swing_1, rsi_swing_2: RSI values at each swing
- strength: slope difference between price and RSI swing lines (bigger = stronger)
- iv_at_detection: current ATM IV when divergence forms (tracks whether divergence is already priced into options)
- skew_at_detection: 25-delta put skew at detection (regular bearish + steep skew = market already nervous, less edge)
- gex_regime: long gamma or short gamma when divergence forms (short gamma + regular divergence = amplified reversal move)
- earnings_within_5d: boolean; divergence near earnings is unreliable (event vol dominates)

Outcome tracking:

- outcome_5d, outcome_10d, outcome_20d: underlying price change after detection
- iv_change_5d: how much did ATM IV move after divergence? (tracks whether divergence predicts vol expansion)
- reversal_occurred: for regular divergence, did price reverse? (boolean + magnitude)
- continuation_occurred: for hidden divergence, did trend continue? (boolean + magnitude)

Storage: SQLite table rsi_divergences (shared schema with forex/stocks).
Collection frequency: Calculated daily at close. Intraday for SPY/QQQ (every 15 min, same as chain snapshots).
Source: Same underlying price data used for chain snapshots (Yahoo Finance / Polygon.io).
Why it matters for options specifically: Regular bearish divergence on SPY + short gamma regime = potential crash setup. The divergence signals the reversal, short gamma amplifies it. This combination is one of the highest-edge signals for put buying.
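A minimal sketch of the two building blocks: a 14-period RSI using Wilder smoothing, and a classifier that labels a pair of swing points with one of the four `divergence_type` values. The divergence rules below are the standard textbook definitions and are an assumption here, since the forex spec is not reproduced in this document; swing-point detection itself is left out.

```python
def _rsi_value(avg_gain, avg_loss):
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def wilder_rsi(closes, period=14):
    """RSI aligned with closes; entries are None until `period` changes exist."""
    if len(closes) <= period:
        return [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    rsi = [None] * period + [_rsi_value(avg_gain, avg_loss)]
    for gain, loss in zip(gains[period:], losses[period:]):
        # Wilder smoothing: exponential average with alpha = 1 / period
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        rsi.append(_rsi_value(avg_gain, avg_loss))
    return rsi

def classify_divergence(swing_kind, price_1, price_2, rsi_1, rsi_2):
    """Label two consecutive swing points (1 = earlier, 2 = later).

    swing_kind: "low" for swing lows, "high" for swing highs.
    Returns a divergence_type string or None when there is no divergence.
    """
    if swing_kind == "low":
        if price_2 < price_1 and rsi_2 > rsi_1:
            return "regular_bullish"   # lower low in price, higher low in RSI
        if price_2 > price_1 and rsi_2 < rsi_1:
            return "hidden_bullish"    # higher low in price, lower low in RSI
    elif swing_kind == "high":
        if price_2 > price_1 and rsi_2 < rsi_1:
            return "regular_bearish"   # higher high in price, lower high in RSI
        if price_2 < price_1 and rsi_2 > rsi_1:
            return "hidden_bearish"    # lower high in price, higher high in RSI
    return None
```

For example, `classify_divergence("high", 500, 510, 70, 62)` returns `"regular_bearish"`, which is the case the crash-setup signal above combines with a short gamma regime.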
What: Estimate how much hedging activity market makers will do at each price level. When dealers are "long gamma," they buy dips and sell rips (stabilizing). When "short gamma," they chase moves (amplifying).
How to calculate:
Key outputs:
Frequency: Daily at close for all tracked tickers. Every 15 min for SPY/QQQ.
Validation: Test the math first using Deribit API (free, real-time crypto options with pre-calculated Greeks). If your DIY Greeks match Deribit's published Greeks, the math is correct. Then apply to equity chains.
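A minimal sketch of a per-strike GEX aggregation, under a widely used but unverifiable convention: dealers are assumed long calls and short puts, so call gamma counts as positive and put gamma as negative. The dictionary keys and the "dollars per 1% move" scaling are assumptions for illustration, not values taken from this spec.

```python
def gamma_exposure_by_strike(chain, spot, multiplier=100):
    """Aggregate dollar gamma per 1% spot move, keyed by strike.

    chain: iterable of dicts with hypothetical keys
        "type" ("call" or "put"), "strike", "gamma", "open_interest".
    Sign convention (assumption): dealers long calls, short puts.
    """
    gex = {}
    for contract in chain:
        sign = 1.0 if contract["type"] == "call" else -1.0
        dollars = (sign * contract["gamma"] * contract["open_interest"]
                   * multiplier * spot ** 2 * 0.01)
        strike = contract["strike"]
        gex[strike] = gex.get(strike, 0.0) + dollars
    return gex

def gamma_regime(gex_by_strike):
    """Return the net exposure and a regime label for storage (gex_regime)."""
    net = sum(gex_by_strike.values())
    return net, ("long_gamma" if net > 0 else "short_gamma")

chain = [
    {"type": "call", "strike": 500, "gamma": 0.02, "open_interest": 1000},
    {"type": "put", "strike": 500, "gamma": 0.02, "open_interest": 400},
    {"type": "put", "strike": 490, "gamma": 0.01, "open_interest": 2000},
]
by_strike = gamma_exposure_by_strike(chain, spot=500.0)
net, regime = gamma_regime(by_strike)
```

The sign convention is the weakest part of any DIY estimate, because actual dealer positioning is not observable from open interest alone.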
What: Second-order Greeks (such as vanna and charm) that drive predictable, mechanical hedging flows.
How to calculate: Closed-form Black-Scholes partial derivatives. Standard formulas, no paid service needed.
Key outputs:
Frequency: Daily at close. Extra calculation after any VIX move > 2 points intraday.
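The closed-form expressions can be sketched as below for vanna (sensitivity of delta to volatility) and charm (decay of delta as time passes). This version assumes no dividends, in which case calls and puts share the same vanna and charm; the function names and argument order are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def d1_d2(spot, strike, t, vol, rate=0.0):
    """Black-Scholes d1 and d2 (t in years, vol and rate annualized)."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt_t)
    return d1, d1 - vol * sqrt_t

def call_delta(spot, strike, t, vol, rate=0.0):
    d1, _ = d1_d2(spot, strike, t, vol, rate)
    return _STD_NORMAL.cdf(d1)

def vanna(spot, strike, t, vol, rate=0.0):
    """d(delta)/d(vol): change in delta per 1.00 (100 vol points) of IV."""
    d1, d2 = d1_d2(spot, strike, t, vol, rate)
    return -_STD_NORMAL.pdf(d1) * d2 / vol

def charm(spot, strike, t, vol, rate=0.0):
    """-d(delta)/d(t): change in delta per YEAR of time passing.

    Divide by 365 for an approximate per-calendar-day figure.
    """
    d1, d2 = d1_d2(spot, strike, t, vol, rate)
    sqrt_t = math.sqrt(t)
    return (-_STD_NORMAL.pdf(d1) * (2 * rate * t - d2 * vol * sqrt_t)
            / (2 * t * vol * sqrt_t))
```

A quick sanity check is to compare each function against a finite difference of `call_delta`, which is also a cheap way to validate the formulas before comparing against Deribit.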
What: When a stock is hard to borrow (high short interest), put/call parity breaks. Put prices get artificially inflated. This will make your GEX calculation wrong if you don't account for it.
Source: OCC (theocc.com), which publishes daily short borrow rates.
What to pull: Borrow rate for every tracked ticker.
Flag: Any ticker with borrow rate > 5% needs a GEX adjustment.
Frequency: Daily.
What: The family of volatility indices that measure market fear and positioning.
| Metric | What It Measures | Source | Frequency |
|---|---|---|---|
| VIX | 30-day expected S&P 500 volatility | CBOE (free) | Every 15 min |
| VVIX | Volatility of VIX itself (vol-of-vol) | CBOE (free) | Daily |
| SKEW | Tail risk pricing (how much more puts cost vs calls) | CBOE (free) | Daily |
| VIX futures (all months) | Term structure of expected future volatility | CBOE (free) | Every 15 min |
| VIX9D | 9-day VIX (short-term fear) | CBOE (free) | Daily |
| VIX3M | 3-month VIX | CBOE (free) | Daily |
| VIX6M | 6-month VIX | CBOE (free) | Daily |
Calculated metrics:
Frequency: VIX and futures every 15 min. Everything else daily at close.
What: The VIX equivalent for Treasury bonds. Leads equity volatility by hours/days.
Source: FRED API (free), series ID: MOVE
What to track:
Frequency: Daily.
What: European volatility index. Trades from 2:00 AM ET. Leads US options by 30-90 minutes because European markets open first and react to overnight news before US markets can.
Source: Eurex (free delayed data)
What to track:
Frequency: Single snapshot at 8:00 AM ET (pre-US-market). That's all we need — it's a leading signal.
What: CME's volatility indices for non-equity assets. More relevant to Kalshi macro markets than equity VIX.
Source: CME Group website (free)
What to track: CVOL for Treasuries, EUR/USD, Gold, Crude Oil
Frequency: Daily.
What: When banks (Goldman, JPMorgan, etc.) issue structured products (autocallables, barrier notes), they file Form 424B2 with the SEC. These filings reveal EXACTLY what strikes and barriers the bank needs to hedge — which means you know their gamma positioning.
Why it matters: This is the single data source that every panel member flagged. Banks sell billions in structured products. Their hedging creates massive, predictable flows at specific price levels. Nobody scrapes this systematically.
Source: SEC EDGAR (free), full text search for "424B2" + issuer name
What to extract:
Complexity note: These filings are dense legal/financial text. Requires NLP parsing. This is a Month 2-3 build item, not Week 1. Start with the top 5 issuers (Goldman, JPMorgan, Morgan Stanley, Citi, BofA) and SPY/QQQ underlyings only.
Frequency: Daily scan of new filings.
What: Corporate insiders (CEO, CFO, directors) must report stock purchases/sales within 2 business days. Insider buying before earnings + unusual call activity = strong signal.
Source: SEC EDGAR (free)
What to extract:
Key signal: Insider buying (not exercise-and-sell, actual open-market purchases) in tracked tickers. Cross-reference with unusual call volume.
Frequency: Daily scan, 6:30 PM ET (filings drop after market close).
What: The CFTC publishes weekly positioning data showing how different trader types (dealers, leveraged funds, asset managers) are positioned. The OPTIONS-ONLY report is separate from the futures report and shows options positioning directly.
Source: CFTC.gov (free CSV downloads)
What to track:
Lag warning: Released Friday at 3:30 PM ET, but data is from Tuesday. 3 days stale. Useful for positioning trends, NOT intraday signals.
Frequency: Weekly (Friday evening).
What: SEC publishes daily list of stocks with excessive failed-to-deliver (FTD) shares. These stocks have forced buy-in risk — if too many shares fail to deliver, brokers MUST buy them back. This creates short squeeze setups.
Source: SEC.gov + individual exchange websites (free)
What to track:
Frequency: Daily.
What: The Options Clearing Corporation publishes monthly exercise vs expiration data. Unusual exercise patterns (exercising calls early, or unusually high exercise rates) signal informed activity.
Source: OCC (theocc.com), free monthly reports
Frequency: Monthly.
What: Federal Reserve Economic Data — free API for macro data that drives options pricing.
Source: FRED API (free; API key required, free registration)
Series to pull:
Key signals:
Frequency: Daily for daily series, weekly for weekly series.
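A minimal sketch of pulling one series from the FRED `series/observations` endpoint. The helper names are hypothetical, `SOFR` is used only because it is the series named elsewhere in this spec, and the API key is a placeholder. Parsing is separated from fetching so it can be tested without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"

def build_fred_url(series_id, api_key, start_date):
    """Build the request URL for one series, starting at start_date (YYYY-MM-DD)."""
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start_date,
    }
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"

def parse_observations(payload):
    """Convert a decoded JSON payload into (date, value) pairs.

    FRED reports missing values as the string ".", which are skipped here.
    """
    rows = []
    for obs in payload.get("observations", []):
        if obs["value"] == ".":
            continue
        rows.append((obs["date"], float(obs["value"])))
    return rows

def fetch_series(series_id, api_key, start_date):
    """Fetch and parse one series (performs a network call)."""
    with urlopen(build_fred_url(series_id, api_key, start_date), timeout=30) as resp:
        return parse_observations(json.load(resp))
```

Usage would look like `fetch_series("SOFR", "YOUR_API_KEY", "2024-01-01")`, run once per day for daily series and once per week for weekly series.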
What: Large orders (blocks) and aggressive orders that sweep multiple exchanges (sweeps) indicate institutional/informed activity.
DIY detection from chain data:
Enhanced source (if Polygon.io subscription): Individual trade data with exchange, size, and conditions. Enables true sweep detection (same order hitting multiple exchanges within seconds).
Free alternative: Unusual Whales free tier (limited data, delayed). Reddit r/options and r/wallstreetbets for crowd-sourced flow alerts.
Frequency: Daily at close (DIY). Real-time if Polygon subscription.
What: Ratio of put volume to call volume. Extreme readings = contrarian signal.
Calculate from chain data:
Frequency: Daily at close.
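The ratio itself is a one-line aggregation over a chain snapshot. The sketch below assumes each contract row is a dict with hypothetical keys `type`, `volume`, and `open_interest`; switching `field` gives an open-interest-based ratio instead of a volume-based one.

```python
def put_call_ratio(chain, field="volume"):
    """Total put `field` divided by total call `field` for one snapshot.

    Returns None when there is no call activity to divide by.
    """
    puts = sum(c[field] for c in chain if c["type"] == "put")
    calls = sum(c[field] for c in chain if c["type"] == "call")
    return puts / calls if calls else None

snapshot = [
    {"type": "put", "volume": 300, "open_interest": 1200},
    {"type": "put", "volume": 200, "open_interest": 800},
    {"type": "call", "volume": 400, "open_interest": 1000},
    {"type": "call", "volume": 600, "open_interest": 3000},
]
volume_ratio = put_call_ratio(snapshot)
oi_ratio = put_call_ratio(snapshot, field="open_interest")
```

What counts as an "extreme" reading is left to the analyst; a percentile rank against the ticker's own history is one reasonable way to define it.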
What: When Open Interest drops significantly but daily volume doesn't spike proportionally, someone quietly closed a large position (likely rolled or let expire). This reveals stealth positioning changes that don't show up in flow scanners.
Calculate from chain data:
Frequency: Daily at close.
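One way to sketch the detection rule: flag contracts whose open interest fell by a large fraction while the day's volume stayed close to its recent average. The row keys and both thresholds (20% OI drop, volume at most 1.5x average) are hypothetical placeholders to be tuned, not values from this spec.

```python
def find_stealth_closes(rows, min_oi_drop=0.20, max_volume_spike=1.5):
    """Return (contract, oi_drop_fraction, volume_vs_average) for flagged rows.

    rows: iterable of dicts with hypothetical keys
        "contract", "oi_prev", "oi_curr", "volume", "avg_volume".
    """
    flagged = []
    for row in rows:
        if row["oi_prev"] <= 0:
            continue
        drop = (row["oi_prev"] - row["oi_curr"]) / row["oi_prev"]
        if row["avg_volume"]:
            spike = row["volume"] / row["avg_volume"]
        else:
            spike = float("inf")  # no volume history: never flag as "quiet"
        if drop >= min_oi_drop and spike <= max_volume_spike:
            flagged.append((row["contract"], drop, spike))
    return flagged

rows = [
    # Large OI drop on ordinary volume: flagged
    {"contract": "SPY_C500", "oi_prev": 10000, "oi_curr": 7000,
     "volume": 900, "avg_volume": 1000},
    # Large OI drop but volume spiked 5x: visible in flow scanners, not flagged
    {"contract": "SPY_P480", "oi_prev": 10000, "oi_curr": 7000,
     "volume": 5000, "avg_volume": 1000},
    # Small OI drop: not flagged
    {"contract": "QQQ_C450", "oi_prev": 5000, "oi_curr": 4900,
     "volume": 300, "avg_volume": 1000},
]
flagged = find_stealth_closes(rows)
```

Contracts at or past expiration should be excluded upstream, since their open interest disappears without signalling anything.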
What: One row per trading day with every relevant boolean flag and event marker. The analysts can query "what kind of day is today?" and get every factor at once.
Schema:
daily_state:
date DATE PRIMARY KEY
-- OpEx flags
is_monthly_opex BOOLEAN (3rd Friday)
is_weekly_opex BOOLEAN (every Friday)
is_quarterly_opex BOOLEAN (quad witching: 3rd Friday of Mar/Jun/Sep/Dec)
days_to_next_opex INTEGER
-- Fed flags
is_fomc_day BOOLEAN
is_fomc_eve BOOLEAN (day before FOMC)
days_to_next_fomc INTEGER
-- Economic data
is_cpi_day BOOLEAN
is_nfp_day BOOLEAN (Non-Farm Payroll, first Friday of month)
is_ppi_day BOOLEAN
is_pce_day BOOLEAN (Fed's preferred inflation gauge)
econ_release_tier INTEGER (1=CPI/NFP/FOMC, 2=PPI/PCE/GDP, 3=everything else)
-- Treasury
is_treasury_auction BOOLEAN
auction_tenor TEXT (e.g., "10Y", "30Y")
-- VIX settlement
is_vix_settlement BOOLEAN (Wednesday AM, monthly)
-- Earnings
num_sp500_reporting INTEGER (how many S&P 500 companies report today)
is_peak_earnings_week BOOLEAN (>50 S&P 500 companies reporting this week)
-- Buyback
pct_sp500_in_blackout REAL (0-100, estimated % of S&P 500 market cap in buyback blackout)
-- 0DTE windows
is_0dte_day BOOLEAN (SPY/QQQ 0DTE available)
-- Month/quarter boundaries
is_month_end BOOLEAN (last trading day of month — rebalancing flows)
is_quarter_end BOOLEAN
-- General
is_half_day BOOLEAN (early close days)
days_to_next_holiday INTEGER
Frequency: Generated daily at 6:00 AM ET (before market open). Updated if new events announced.
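The schema above maps directly onto a SQLite table. The sketch below creates it and fills in only the OpEx flags, which can be computed from the calendar alone; the helper names are hypothetical, and exchange holidays (which can move an expiration off its Friday) are deliberately ignored.

```python
import sqlite3
from datetime import date

DAILY_STATE_DDL = """
CREATE TABLE IF NOT EXISTS daily_state (
    date DATE PRIMARY KEY,
    is_monthly_opex BOOLEAN, is_weekly_opex BOOLEAN, is_quarterly_opex BOOLEAN,
    days_to_next_opex INTEGER,
    is_fomc_day BOOLEAN, is_fomc_eve BOOLEAN, days_to_next_fomc INTEGER,
    is_cpi_day BOOLEAN, is_nfp_day BOOLEAN, is_ppi_day BOOLEAN, is_pce_day BOOLEAN,
    econ_release_tier INTEGER,
    is_treasury_auction BOOLEAN, auction_tenor TEXT,
    is_vix_settlement BOOLEAN,
    num_sp500_reporting INTEGER, is_peak_earnings_week BOOLEAN,
    pct_sp500_in_blackout REAL,
    is_0dte_day BOOLEAN,
    is_month_end BOOLEAN, is_quarter_end BOOLEAN,
    is_half_day BOOLEAN, days_to_next_holiday INTEGER
)
"""

def third_friday(year, month):
    """Date of the third Friday of the given month."""
    first = date(year, month, 1)
    days_to_first_friday = (4 - first.weekday()) % 7  # Monday=0 ... Friday=4
    return date(year, month, 1 + days_to_first_friday + 14)

def opex_flags(d):
    """OpEx flags for one date (assumption: holidays are not handled)."""
    monthly = d == third_friday(d.year, d.month)
    return {
        "is_monthly_opex": monthly,
        "is_weekly_opex": d.weekday() == 4,
        "is_quarterly_opex": monthly and d.month in (3, 6, 9, 12),
    }

def init_db(conn):
    conn.execute(DAILY_STATE_DDL)

def upsert_opex_flags(conn, d):
    """Insert or update the OpEx columns without touching other columns."""
    f = opex_flags(d)
    conn.execute(
        """
        INSERT INTO daily_state
            (date, is_monthly_opex, is_weekly_opex, is_quarterly_opex)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(date) DO UPDATE SET
            is_monthly_opex = excluded.is_monthly_opex,
            is_weekly_opex = excluded.is_weekly_opex,
            is_quarterly_opex = excluded.is_quarterly_opex
        """,
        (d.isoformat(), int(f["is_monthly_opex"]),
         int(f["is_weekly_opex"]), int(f["is_quarterly_opex"])),
    )
```

SQLite has no native boolean or date type, so the flags are stored as 0/1 integers and the date as an ISO string. Using an upsert rather than a plain replace means a later update of the Fed or earnings columns is not wiped when the OpEx flags are regenerated.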
What: Upcoming earnings dates for all tracked tickers, plus historical earnings performance data.
Upcoming earnings table:
Historical earnings table (per ticker):
Source: Free earnings calendars (Earnings Whispers, Yahoo Finance)
Frequency: Daily update of upcoming dates. Historical data populated once, updated after each earnings.
What: Rare multi-factor convergence events that create outsized opportunities. Track when multiple calendar flags align.
Patterns to detect:
Frequency: Calculated from daily state table. Flag when any convergence detected.
What: Pull all options-relevant Kalshi markets. These are binary contracts on events we can also price using traditional options.
Markets to track:
What to pull per market:
Frequency: Every 15 minutes during market hours. Snapshot at 8 AM ET (pre-market).
What: Same as Kalshi but on Polymarket. Different liquidity pool (crypto-native traders), so prices often diverge.
Additional Polymarket-specific data:
Frequency: Every 15 minutes during market hours.
What: The math that converts traditional options chain data into Kalshi/Polymarket-comparable probabilities.
How: Use N(d2) from Black-Scholes (the standard normal CDF evaluated at d2) to calculate the risk-neutral probability that the underlying will be above a specific strike by a specific date; the probability of finishing below is 1 - N(d2). Compare this probability to Kalshi/Polymarket contract prices.
Example: Options chain implies 30% probability SPY > $520 by Friday. Kalshi "SPY above $520 Friday" contract trades at 22 cents. The contract is underpriced by 8 cents → buy on Kalshi.
Inputs: IV surface (Section A), risk-free rate (SOFR from FRED), time to expiration, current price.
Output: Probability per strike per expiration, compared to prediction market prices.
Frequency: Recalculated every time chain data or prediction market prices update.
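A minimal sketch of the translation layer, assuming no dividends and using the IV quoted at the strike in question. The function names are hypothetical. N(d2) is a risk-neutral probability and ignores the slope of the skew, so treat the output as a first approximation rather than an exact fair value.

```python
import math
from statistics import NormalDist

def prob_above(spot, strike, iv, t_years, rate=0.0):
    """Risk-neutral probability that the underlying finishes above strike.

    iv and rate are annualized decimals (0.20 = 20%); t_years is time to
    expiration in years.
    """
    d2 = ((math.log(spot / strike) + (rate - 0.5 * iv * iv) * t_years)
          / (iv * math.sqrt(t_years)))
    return NormalDist().cdf(d2)

def edge_vs_market(model_prob, market_price_cents):
    """Positive means the YES contract looks cheap relative to the options chain."""
    return model_prob - market_price_cents / 100.0

# Hypothetical inputs: SPY at 515, strike 520, 18% IV, 3 trading days left
p = prob_above(spot=515.0, strike=520.0, iv=0.18, t_years=3 / 252, rate=0.05)
edge = edge_vs_market(p, market_price_cents=22)
```

With the example from the text, `edge_vs_market(0.30, 22)` evaluates to roughly 0.08, the 8-cent gap. Any real comparison should also subtract fees and the bid/ask spread on both venues before calling it an edge.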
What: Free, real-time crypto options data. Two purposes: (1) validate our DIY Greeks math, (2) leading indicator for Polymarket liquidity.
Source: Deribit API (docs.deribit.com), 100% free, real-time
What to track:
Frequency: Every 30 minutes. Alert on IV spikes > 10% intraday.
What: Investment grade and high yield corporate bond spreads. These lead equity volatility by 1-3 days because credit markets tend to reprice risk before equity options markets do.
Source: FRED API (free)
Series:
Frequency: Daily.
What: When carry trades unwind (JPY and CHF strengthen), risk assets sell off and options volatility spikes. This is a 1-4 hour leading indicator.
Source: Free FX data feeds (Forex desk cross-reference)
What to track:
Frequency: Hourly during market hours. Alert on > 1% JPY/CHF move in 4 hours.
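The alert rule can be sketched as a rolling comparison over hourly closes. The function name and input shape are assumptions; the 4-hour window and 1% threshold come from the frequency line above. For a USD/JPY or USD/CHF series, a negative change means the yen or franc strengthened, which is the carry-unwind direction.

```python
def fx_move_alerts(hourly_closes, window_hours=4, threshold=0.01):
    """Return (index, fractional_change) wherever the move over the window
    exceeds the threshold in either direction.

    hourly_closes: list of hourly closing prices, oldest first.
    """
    alerts = []
    for i in range(window_hours, len(hourly_closes)):
        start = hourly_closes[i - window_hours]
        change = (hourly_closes[i] - start) / start
        if abs(change) > threshold:
            alerts.append((i, change))
    return alerts

# Hypothetical USD/JPY hourly closes: yen strengthens about 1.2% in 4 hours
usdjpy = [150.0, 149.8, 149.5, 149.0, 148.2, 148.0]
alerts = fx_move_alerts(usdjpy)
```

Each alert carries the signed change, so downstream logic can treat yen strength and yen weakness differently.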
What: 10Y and 30Y Treasury auction results move the SOFR curve, which reprices Kalshi rate markets and options on rate-sensitive stocks.
Source: Treasury Direct (free)
What to track:
Frequency: On auction days only (schedule known in advance, in daily state table).
What: Overnight moves in non-US markets that predict US open direction.
Sources: Free delayed data from exchanges
Frequency: Single snapshot at 8:00 AM ET.
What: r/options and r/wallstreetbets posts and comments. Useful as contrarian indicator at extremes (euphoria = top, despair = bottom).
Source: Reddit API / PRAW (free)
What to track:
Frequency: Daily aggregation at 10 PM ET.
What: Accounts that post real-time options flow alerts on X. Free proxy for paid flow services.
Source: Grok's built-in X search (free)
What to track:
Frequency: 3x daily scans (pre-market, midday, after close).