Stocks Desk Final Ruling — Complete Data Pipeline
Including Penny Stocks, Prediction Markets, and Advanced Metrics
Panel: Opus, Sonnet, Grok 3, Gemini 2.5 Pro (full 4/4)
Judge: Opus (via synthesis)
Date: 2026-03-25
Grade: A (strong consensus on all core components, excellent penny stock coverage)
The Vision
A complete stock intelligence engine covering ALL US equities — from mega-caps to penny stocks. Not just Kalshi binary bets. We want to:
- Buy/sell actual stocks on Robinhood
- Catch penny stock runners before they explode
- Exploit prediction market mispricing on stock events
- Front-run institutional moves using insider + 13F data
- Hunt short squeezes
1. Stock Price Data — Store EVERYTHING (ALL 4 AGREED)
Collection: Polygon Grouped Daily
- Endpoint: /v2/aggs/grouped/locale/us/market/stocks/{date} (ONE API call = ALL US stocks)
- Coverage: 8,000+ tickers (NYSE, NASDAQ, AMEX, OTC; OTC rows are returned only with include_otc=true)
- Fields: Open, High, Low, Close, Volume, VWAP
- Frequency: 2x daily (market open snapshot + after close)
- Cost: Included in $29/mo Polygon plan
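Here is a minimal sketch of the Grouped Daily collection step. It assumes Polygon's aggregate field letters (`T`, `o`, `h`, `l`, `c`, `v`, `vw`, `t`). The `DailyBar` type and both function names are hypothetical. The HTTP call and storage are left out so the mapping can be checked offline.

```typescript
// One row of the Grouped Daily "results" array (Polygon aggregate field letters).
interface GroupedDailyRow {
  T: string;   // ticker
  o: number;   // open
  h: number;   // high
  l: number;   // low
  c: number;   // close
  v: number;   // volume
  vw?: number; // VWAP (may be absent on some rows)
  t: number;   // bar start, Unix ms
}

// Hypothetical storage shape for Tier 1.
interface DailyBar {
  ticker: string;
  date: string;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
  vwap: number | null;
}

// Build the request; OTC tickers are only included when include_otc=true.
function groupedDailyUrl(date: string, apiKey: string, includeOtc = true): string {
  const params = new URLSearchParams({
    adjusted: "true",
    include_otc: String(includeOtc),
    apiKey,
  });
  return `https://api.polygon.io/v2/aggs/grouped/locale/us/market/stocks/${date}?${params}`;
}

// Map raw rows to stored bars, keeping a missing VWAP as null rather than 0.
function toDailyBars(date: string, rows: GroupedDailyRow[]): DailyBar[] {
  return rows.map((r) => ({
    ticker: r.T,
    date,
    open: r.o,
    high: r.h,
    low: r.l,
    close: r.c,
    volume: r.v,
    vwap: r.vw ?? null,
  }));
}
```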
Storage Tiers (Sonnet + Gemini):
- Tier 1 (all fields, daily): Everything. Store it all. Disk is cheap.
- Tier 2 (promote to intraday): Any ticker that triggers a scanner alert (RVOL >3, price move >10%) — collect 1-min bars for 5 trading days via Polygon aggregate endpoint
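Tier 2 promotion could be sketched as below. The ruling lists both triggers without saying whether both must fire, so this sketch assumes EITHER one promotes. The price move is measured close-to-close in absolute terms. `Promotion`, `shouldPromote`, `promote`, `minuteBarsUrl`, and `tickPromotions` are hypothetical names. The URL follows Polygon's `/v2/aggs/ticker/{ticker}/range/{multiplier}/{timespan}/{from}/{to}` aggregates pattern.

```typescript
interface Promotion {
  ticker: string;
  tradingDaysLeft: number; // decremented once per collected trading day
}

// Assumption: EITHER trigger promotes (RVOL > 3 or |close-to-close move| > 10%).
function shouldPromote(rvol: number, close: number, prevClose: number): boolean {
  const move = Math.abs(close / prevClose - 1);
  return rvol > 3 || move > 0.10;
}

function promote(ticker: string): Promotion {
  return { ticker, tradingDaysLeft: 5 };
}

// Polygon aggregates request for one day of 1-minute bars.
function minuteBarsUrl(ticker: string, day: string, apiKey: string): string {
  const path = `/v2/aggs/ticker/${encodeURIComponent(ticker)}/range/1/minute/${day}/${day}`;
  return `https://api.polygon.io${path}?adjusted=true&sort=asc&limit=50000&apiKey=${apiKey}`;
}

// Run after each trading day's intraday pull; drops promotions that are used up.
function tickPromotions(active: Promotion[]): Promotion[] {
  return active
    .map((p) => ({ ...p, tradingDaysLeft: p.tradingDaysLeft - 1 }))
    .filter((p) => p.tradingDaysLeft > 0);
}
```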
Penny Stock Definition:
- Price < $5 OR market cap < $300M
- Flag and tag in database for separate scanning
- Track "graduation" events (crossing above $5)
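The penny stock tag and the graduation event reduce to two predicates. This sketch makes two assumptions. First, market cap may be missing from reference data, in which case only the price test applies. Second, graduation means a prior close under $5 followed by a close at or above $5. The function names are hypothetical.

```typescript
const PENNY_PRICE = 5;
const PENNY_MCAP = 300_000_000;

// Penny = price < $5 OR market cap < $300M; a null market cap falls back to price alone.
function isPenny(price: number, marketCap: number | null): boolean {
  return price < PENNY_PRICE || (marketCap !== null && marketCap < PENNY_MCAP);
}

// Graduation event: previous close under $5, current close at or above $5.
function isGraduation(prevClose: number, close: number): boolean {
  return prevClose < PENNY_PRICE && close >= PENNY_PRICE;
}
```

A graduated ticker can still carry the penny tag if its market cap stays under $300M, which is why the two checks are kept separate.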
2. Earnings & Corporate Actions (ALL 4 AGREED)
From Polygon (included in $29/mo):
Earnings Calendar — /v3/reference/tickers/{ticker}/events
- Next 90 days rolling window
- Store: date, EPS estimate, EPS actual, revenue estimate, revenue actual, surprise %
- Run daily morning pull
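The ruling does not define the stored surprise %. This sketch assumes the common convention (actual - estimate) / |estimate|. Dividing by the absolute value keeps a beat on a negative EPS estimate positive. `EarningsRow`, `surprisePct`, `surprises`, and `withinNextDays` are hypothetical helpers for the surprise fields and the 90-day window.

```typescript
interface EarningsRow {
  ticker: string;
  date: string; // YYYY-MM-DD
  epsEstimate: number | null;
  epsActual: number | null;
  revenueEstimate: number | null;
  revenueActual: number | null;
}

// Surprise % = (actual - estimate) / |estimate| * 100; null if either side is missing or estimate is 0.
function surprisePct(actual: number | null, estimate: number | null): number | null {
  if (actual === null || estimate === null || estimate === 0) return null;
  return ((actual - estimate) / Math.abs(estimate)) * 100;
}

function surprises(row: EarningsRow): { epsSurprisePct: number | null; revenueSurprisePct: number | null } {
  return {
    epsSurprisePct: surprisePct(row.epsActual, row.epsEstimate),
    revenueSurprisePct: surprisePct(row.revenueActual, row.revenueEstimate),
  };
}

// Rolling window check: event date falls between today and today + `days` (inclusive).
function withinNextDays(date: string, today: string, days = 90): boolean {
  const ms = Date.parse(date + "T00:00:00Z") - Date.parse(today + "T00:00:00Z");
  return ms >= 0 && ms <= days * 86_400_000;
}
```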
Dividends — /v3/reference/dividends
- Ex-dates, record dates, payment dates, amounts
- Yield changes flag
Stock Splits — /v3/reference/splits
- Forward splits (retail interest driver)
- Reverse splits — critical penny stock red flag (Grok + Sonnet: often precedes pump or dilution)
Analyst Ratings (Gemini)
- Track rating CHANGES (Hold -> Buy is the signal, not the rating itself)
- Cluster of upgrades = strong signal
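To track rating CHANGES, map each label to an ordinal scale and compare the old rating with the new one. The five-label scale, the 30-day window, and the three-firm cluster threshold below are hypothetical placeholders. Real feeds use many more labels, such as "Outperform" or "Overweight", and each would need its own mapping. All names are hypothetical.

```typescript
// Hypothetical five-step scale; unmapped labels yield no signal.
const RATING_RANK: Record<string, number | undefined> = {
  "strong sell": 1, sell: 2, hold: 3, buy: 4, "strong buy": 5,
};

interface RatingChange {
  ticker: string;
  firm: string;
  date: string; // YYYY-MM-DD
  from: string;
  to: string;
}

// +1 upgrade, -1 downgrade, 0 unchanged or unmapped label.
function ratingDirection(c: RatingChange): number {
  const a = RATING_RANK[c.from.toLowerCase()];
  const b = RATING_RANK[c.to.toLowerCase()];
  if (a === undefined || b === undefined) return 0;
  return Math.sign(b - a);
}

// Cluster: at least `minFirms` distinct firms upgrading within `windowDays` up to asOf.
function upgradeCluster(changes: RatingChange[], asOf: string, windowDays = 30, minFirms = 3): boolean {
  const end = Date.parse(asOf + "T00:00:00Z");
  const start = end - windowDays * 86_400_000;
  const firms = new Set<string>();
  for (const c of changes) {
    const t = Date.parse(c.date + "T00:00:00Z");
    if (t >= start && t <= end && ratingDirection(c) > 0) firms.add(c.firm);
  }
  return firms.size >= minFirms;
}
```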
3. Short Interest & Flow Data (ALL 4 AGREED)
Short Interest:
- Source: Polygon (daily for top 1000) + FINRA twice-monthly (free, all stocks)
- Key derived metrics:
- SI% of Float — anything >20% is significant, >30% is squeeze territory
- Days to Cover = Short Interest / Average Daily Volume — >10 days = trapped shorts
- SI% Change week-over-week — rising SI + rising price = squeeze setup
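The three derived metrics and their thresholds translate directly into code. This sketch assumes SI% of float and price change are expressed as fractions (0.25 = 25%). It also assumes the caller supplies positive denominators. The type and function names are hypothetical.

```typescript
interface ShortInterestInput {
  shortInterest: number;     // shares sold short (latest report)
  prevShortInterest: number; // prior report
  floatShares: number;
  avgDailyVolume: number;
  priceChangePct: number;    // price change over the same interval, as a fraction
}

interface ShortInterestMetrics {
  siPctFloat: number;
  daysToCover: number;
  siChangePct: number;
  significant: boolean;   // SI% of float > 20%
  squeezeZone: boolean;   // SI% of float > 30%
  trappedShorts: boolean; // days to cover > 10
  squeezeSetup: boolean;  // rising SI + rising price
}

function shortMetrics(x: ShortInterestInput): ShortInterestMetrics {
  const siPctFloat = x.shortInterest / x.floatShares;
  const daysToCover = x.shortInterest / x.avgDailyVolume;
  const siChangePct = x.shortInterest / x.prevShortInterest - 1;
  return {
    siPctFloat,
    daysToCover,
    siChangePct,
    significant: siPctFloat > 0.20,
    squeezeZone: siPctFloat > 0.30,
    trappedShorts: daysToCover > 10,
    squeezeSetup: siChangePct > 0 && x.priceChangePct > 0,
  };
}
```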
Flow Proxy (Gemini — clever):
- For our ~150 options symbols: scan for contracts where Volume > Open Interest
- This suggests NEW positions opened today: OI is as of the prior close, so volume above it cannot be explained by closing existing positions alone
- Filter for OTM calls with <30 DTE — speculative bullish positioning
- Cross-reference with stock price move for confirmation
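The flow proxy filter comes down to one predicate per contract. The `OptionContract` shape is hypothetical. For a call, OTM is taken to mean strike above spot, and DTE is counted in calendar days. The grouping helper only collects flagged contracts; the price-move cross-check is left to the caller.

```typescript
interface OptionContract {
  underlying: string;
  type: "call" | "put";
  strike: number;
  expiry: string;       // YYYY-MM-DD
  volume: number;       // today's traded contracts
  openInterest: number; // OI as of the prior close
}

function calendarDte(expiry: string, today: string): number {
  const ms = Date.parse(expiry + "T00:00:00Z") - Date.parse(today + "T00:00:00Z");
  return Math.round(ms / 86_400_000);
}

// True for an OTM call, under 30 DTE, whose volume exceeds open interest.
function isSpeculativeCallFlow(c: OptionContract, spot: number, today: string): boolean {
  const dte = calendarDte(c.expiry, today);
  return c.type === "call" && c.strike > spot && dte >= 0 && dte < 30 && c.volume > c.openInterest;
}

// Group flagged contracts by underlying for the stock price-move cross-check.
function flaggedByUnderlying(
  chain: OptionContract[],
  spots: Map<string, number>,
  today: string,
): Map<string, OptionContract[]> {
  const out = new Map<string, OptionContract[]>();
  for (const c of chain) {
    const spot = spots.get(c.underlying);
    if (spot === undefined || !isSpeculativeCallFlow(c, spot, today)) continue;
    const list = out.get(c.underlying) ?? [];
    list.push(c);
    out.set(c.underlying, list);
  }
  return out;
}
```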
4. Kalshi/Prediction Market Stock Series (ALL 4 AGREED)
Must-Track:
- KXSPY — S&P 500 direction (highest Kalshi stock volume)
- KXNASDAQ / KXNDX — Nasdaq direction
- KXINX — Dow/broad market
- Individual stock earnings markets — "Will TSLA trade above $X after earnings?"
- VIX markets — compare Kalshi VIX pricing vs our VIX futures term structure
Cross-Venue Arb (ALL 4):
Compare Kalshi implied probability vs options-implied probability (Black-Scholes N(d2)) for the same event. A gap larger than 10 percentage points after transaction costs is a trade signal.
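Under Black-Scholes, the probability that the stock finishes above strike K is N(d2), where d2 = [ln(S/K) + (r - q - σ²/2)T] / (σ√T). This is the risk-neutral probability, not a real-world forecast. JavaScript has no built-in normal CDF, so this sketch uses the Abramowitz and Stegun 7.1.26 approximation of erf (absolute error below about 1.5e-7). The function names are hypothetical. Skew shifts the result, so the volatility input should ideally be the implied vol at the Kalshi strike rather than ATM vol.

```typescript
// Standard normal CDF via the Abramowitz & Stegun 7.1.26 erf approximation.
function normCdf(x: number): number {
  const z = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * z);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-z * z);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Risk-neutral P(S_T > K) = N(d2).
// spot S, strike K, annualized implied vol, time to settlement in years, rate r, dividend yield q.
function probAboveStrike(spot: number, strike: number, vol: number, years: number, rate = 0, divYield = 0): number {
  const d2 = (Math.log(spot / strike) + (rate - divYield - 0.5 * vol * vol) * years) / (vol * Math.sqrt(years));
  return normCdf(d2);
}

// Kalshi YES price in cents -> implied probability, minus the options-implied probability.
function probabilityGap(kalshiYesCents: number, optionsProb: number): number {
  return kalshiYesCents / 100 - optionsProb;
}
```

For a "Will TSLA trade above $X" market, `strike` is X and `years` is the time remaining until the market settles.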
5. Advanced Metrics — THE EDGE (ALL 4 AGREED on core set)
Metric 1: Insider Cluster Score (ALL 4 — HIGHEST PRIORITY)
Already have EDGAR Form 4 data. Pure SQL computation.
Score per company (rolling 30 days):
+3 points: CEO/CFO buy
+2 points: VP/COO buy
+1 point: Director/Officer buy
+1 bonus: Buy size > 50% of current holdings
x1.5 multiplier: 3+ insiders buy within 7 days
x2.0 multiplier: Buys near 52-week low
Alert threshold: Score > 8
High-conviction threshold: Score > 12
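The scoring rules above can be sketched as follows. Several details are assumptions the ruling leaves open:

- Points are counted once per distinct insider, using that insider's best-scoring buy, so split Form 4 filings are not double-counted.
- "Current holdings" means shares owned after the transaction, as reported on Form 4.
- "Near the 52-week low" means within 10% of it.
- The two multipliers stack.

All names are hypothetical.

```typescript
type Role = "CEO" | "CFO" | "COO" | "VP" | "Director" | "Officer";

interface InsiderBuy {
  insider: string;          // reporting owner (name or CIK)
  role: Role;
  date: string;             // transaction date, YYYY-MM-DD
  shares: number;           // shares bought
  sharesOwnedAfter: number; // holdings after the transaction (Form 4 field)
  price: number;
}

const ROLE_POINTS: Record<Role, number> = { CEO: 3, CFO: 3, COO: 2, VP: 2, Director: 1, Officer: 1 };
const DAY_MS = 86_400_000;
const toMs = (d: string): number => Date.parse(d + "T00:00:00Z");

function insiderClusterScore(
  buys: InsiderBuy[],
  asOf: string,
  low52w: number,
  nearLowPct = 0.10, // assumption: "near" = within 10% of the 52-week low
): number {
  const end = toMs(asOf);
  const recent = buys.filter((b) => toMs(b.date) > end - 30 * DAY_MS && toMs(b.date) <= end);

  // Base points: best-scoring buy per distinct insider (assumption).
  const perInsider = new Map<string, number>();
  for (const b of recent) {
    const bonus = b.shares > 0.5 * b.sharesOwnedAfter ? 1 : 0;
    const pts = ROLE_POINTS[b.role] + bonus;
    perInsider.set(b.insider, Math.max(perInsider.get(b.insider) ?? 0, pts));
  }
  let score = Array.from(perInsider.values()).reduce((a, v) => a + v, 0);

  // x1.5 when 3+ distinct insiders buy inside some 7-day window.
  const clustered = recent.some((b) => {
    const t0 = toMs(b.date);
    const names = new Set(
      recent.filter((o) => toMs(o.date) >= t0 && toMs(o.date) < t0 + 7 * DAY_MS).map((o) => o.insider),
    );
    return names.size >= 3;
  });
  if (clustered) score *= 1.5;

  // x2.0 when any buy was priced near the 52-week low (multipliers stack: assumption).
  if (recent.some((b) => b.price <= low52w * (1 + nearLowPct))) score *= 2;
  return score;
}

function insiderAlertLevel(score: number): "high_conviction" | "alert" | "none" {
  if (score > 12) return "high_conviction";
  if (score > 8) return "alert";
  return "none";
}
```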
Metric 2: Relative Volume (RVOL) Scanner (ALL 4)
RVOL = Today's Volume / 20-Day Average Volume
Alerts:
- RVOL > 3.0 + price up > 5% = "In Play" (claimed to catch ~90% of the day's biggest movers)
- RVOL > 5.0 + penny stock = potential runner
- RVOL > 10.0 = extreme event (news, earnings, squeeze)
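Here is RVOL with its three alert tiers. This sketch assumes the 20-day average excludes today and that one ticker can trigger several tiers at once. The names are hypothetical.

```typescript
// RVOL = today's volume / average of the prior 20 sessions (today excluded: assumption).
function rvol(todayVolume: number, priorVolumes: number[]): number | null {
  if (priorVolumes.length < 20) return null; // not enough history yet
  const last20 = priorVolumes.slice(-20);
  const avg = last20.reduce((a, v) => a + v, 0) / 20;
  return avg > 0 ? todayVolume / avg : null;
}

type RvolAlert = "in_play" | "penny_runner" | "extreme";

// pctChange as a fraction (0.05 = +5%); a ticker may hit several tiers at once.
function rvolAlerts(r: number, pctChange: number, isPenny: boolean): RvolAlert[] {
  const alerts: RvolAlert[] = [];
  if (r > 3 && pctChange > 0.05) alerts.push("in_play");
  if (r > 5 && isPenny) alerts.push("penny_runner");
  if (r > 10) alerts.push("extreme");
  return alerts;
}
```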
Metric 3: Penny Stock Breakout Probability Score (ALL 4)
Composite score combining:
- Volume surge (>3x 20-day avg = +3, >5x = +5, >10x = +8)
- Price vs moving averages (above 50-day MA for first time in 30+ days = +3)
- Float rotation (today volume / public float > 50% = +5)
- Short squeeze setup (SI% > 20% + RVOL > 3 = +4)
- Recent insider buying (cluster score > 5 = +3)
- No dilution risk (no recent S-3/ATM filing = +2)
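The breakout score is a point total (0 to 25 under these rules), not a calibrated probability. This sketch assumes the volume surge tiers are exclusive, so only the highest one counts. The inputs arrive as precomputed fields. `BreakoutInputs` and `breakoutScore` are hypothetical.

```typescript
interface BreakoutInputs {
  rvol: number;                      // today volume / 20-day avg
  firstCloseAboveMa50In30d: boolean; // first close above 50-day MA in 30+ days
  volume: number;                    // today's volume
  publicFloat: number;
  siPctFloat: number;                // fraction, 0.25 = 25%
  insiderClusterScore: number;
  recentDilutionFiling: boolean;     // recent S-3 / ATM filing found
}

function breakoutScore(x: BreakoutInputs): number {
  let s = 0;
  // Volume surge: highest tier only (assumption: tiers are not cumulative).
  if (x.rvol > 10) s += 8;
  else if (x.rvol > 5) s += 5;
  else if (x.rvol > 3) s += 3;
  if (x.firstCloseAboveMa50In30d) s += 3;
  if (x.publicFloat > 0 && x.volume / x.publicFloat > 0.5) s += 5; // float rotation
  if (x.siPctFloat > 0.2 && x.rvol > 3) s += 4;                    // squeeze setup
  if (x.insiderClusterScore > 5) s += 3;
  if (!x.recentDilutionFiling) s += 2;
  return s;
}
```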
Metric 4: Smart Money Velocity (Gemini + Opus)
From existing 13F data. Track top 20-30 high-performing funds.
- Flag: NEW positions initiated
- Flag: Stake increased >25%
- Flag: Multiple top funds buying same stock within same quarter
- Cross-reference with insider buying for double-confirmation
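A quarter-over-quarter 13F comparison for the three flags might look like this. Holdings are assumed to be split-adjusted share counts keyed by ticker. "Multiple" funds is taken as two or more by default, via a hypothetical parameter. 13F filings are due up to 45 days after quarter end, so these flags describe positioning with a lag.

```typescript
type Holdings = Map<string, number>; // ticker -> shares at quarter end

interface FundFlag {
  fund: string;
  ticker: string;
  kind: "new" | "increase";
}

// New position (absent or zero last quarter) or stake increased by more than 25%.
function fundFlags(fund: string, prev: Holdings, curr: Holdings): FundFlag[] {
  const flags: FundFlag[] = [];
  curr.forEach((shares, ticker) => {
    const before = prev.get(ticker) ?? 0;
    if (before === 0 && shares > 0) flags.push({ fund, ticker, kind: "new" });
    else if (before > 0 && shares > before * 1.25) flags.push({ fund, ticker, kind: "increase" });
  });
  return flags;
}

// Tickers that at least `minFunds` distinct tracked funds opened or added to this quarter.
function crowdedBuys(flags: FundFlag[], minFunds = 2): string[] {
  const byTicker = new Map<string, Set<string>>();
  for (const f of flags) {
    if (!byTicker.has(f.ticker)) byTicker.set(f.ticker, new Set());
    byTicker.get(f.ticker)!.add(f.fund);
  }
  return Array.from(byTicker.entries())
    .filter(([, funds]) => funds.size >= minFunds)
    .map(([ticker]) => ticker);
}
```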
Metric 5: Options-Prediction Market Arb Score (ALL 4)
Already building this in options-metrics.ts. Extend to all Kalshi stock markets:
arb_score = |Prob(Kalshi) - Prob(Options, N(d2))| - transaction_costs
If arb_score > 0.10 (10 percentage points) = trade signal
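With both probabilities in hand, the arb score is a single subtraction. This sketch makes three assumptions. Kalshi YES prices are quoted in cents (40 maps to 0.40). Costs are expressed in the same probability units. The trade side goes to whichever venue prices the event lower. The names are hypothetical.

```typescript
// Kalshi YES price in cents -> implied probability.
function kalshiProb(yesCents: number): number {
  return yesCents / 100;
}

interface ArbSignal {
  score: number;
  side: "buy_yes" | "buy_no" | null;
}

function arbSignal(pKalshi: number, pOptions: number, transactionCosts: number, threshold = 0.10): ArbSignal {
  const score = Math.abs(pKalshi - pOptions) - transactionCosts;
  if (score <= threshold) return { score, side: null };
  // Kalshi below the options-implied probability: YES looks cheap; otherwise NO does.
  return { score, side: pKalshi < pOptions ? "buy_yes" : "buy_no" };
}
```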
Metric 6: Dilution Risk Score (Gemini — penny stock defense)
Scan SEC EDGAR 8-K and S-1/S-3 filings for keywords:
- "at-the-market offering"
- "equity distribution agreement"
- "shelf registration"
- "registered direct offering"
High score = AVOID this penny stock (dilution likely soon)
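A keyword scan over filing text could be sketched as below. The scoring (one point per phrase found in each qualifying filing) is a hypothetical choice. Hyphens and whitespace are normalized, so "at-the-market" and "at the market" both match. Fetching and extracting filing text from EDGAR is out of scope, and the caller is assumed to pass only recent filings.

```typescript
const DILUTION_PHRASES = [
  "at-the-market offering",
  "equity distribution agreement",
  "shelf registration",
  "registered direct offering",
];

interface Filing {
  form: string;    // e.g. "8-K", "S-3", "S-3/A"
  filedAt: string; // YYYY-MM-DD
  text: string;    // extracted plain text of the filing
}

// Lowercase and collapse hyphens/whitespace so phrasing variants match.
const normalize = (s: string): string => s.toLowerCase().replace(/[-\s]+/g, " ");

// 8-K, S-1, S-3 and their /A amendments (but not S-11 etc.).
const QUALIFYING_FORM = /^(8-K|S-1|S-3)(\/A)?$/;

// Hypothetical scoring: +1 per distinct phrase found in each qualifying filing.
function dilutionRiskScore(filings: Filing[]): number {
  let score = 0;
  for (const f of filings) {
    if (!QUALIFYING_FORM.test(f.form)) continue;
    const text = normalize(f.text);
    for (const phrase of DILUTION_PHRASES) {
      if (text.includes(normalize(phrase))) score += 1;
    }
  }
  return score;
}
```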
Metric 7: Sector Momentum Lag (Sonnet)
When a large-cap moves >5% in a day, find penny stocks in the same sector that haven't moved yet. The laggards often follow within 1-3 days.
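The laggard search is a join on sector. The $10B large-cap floor and the 2% "hasn't moved yet" ceiling are hypothetical defaults, not values from the ruling. `DayMove`, `LagCandidate`, and `sectorLaggards` are also hypothetical.

```typescript
interface DayMove {
  ticker: string;
  sector: string;
  marketCap: number;
  pctChange: number; // fraction, 0.07 = +7%
  isPenny: boolean;
}

interface LagCandidate {
  leader: string;
  laggard: string;
  sector: string;
  leaderMove: number;
}

// Pair each large-cap mover (> leaderMin in either direction) with same-sector penny stocks
// whose own move is still under stillMax in absolute terms.
function sectorLaggards(moves: DayMove[], largeCapMin = 10e9, leaderMin = 0.05, stillMax = 0.02): LagCandidate[] {
  const out: LagCandidate[] = [];
  const leaders = moves.filter((m) => m.marketCap >= largeCapMin && Math.abs(m.pctChange) > leaderMin);
  for (const lead of leaders) {
    for (const m of moves) {
      if (m.isPenny && m.sector === lead.sector && Math.abs(m.pctChange) < stillMax) {
        out.push({ leader: lead.ticker, laggard: m.ticker, sector: lead.sector, leaderMove: lead.pctChange });
      }
    }
  }
  return out;
}
```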
Metric 8: Earnings Surprise Predictor (Gemini + Sonnet)
Before earnings, combine:
- Insider buying/selling in last 30 days
- Options IV vs historical earnings moves
- Prediction market odds
- Analyst estimate revisions
Output: probability of beat/miss, expected magnitude
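One way to combine the four inputs is a logistic model, with the bias and weights fit on historical earnings outcomes. The feature definitions and model shape below are assumptions. No weights are supplied because the ruling gives none. For expected magnitude, the helper applies a common rule of thumb: the options-implied move is roughly the ATM straddle price divided by spot.

```typescript
interface EarningsFeatures {
  netInsiderBuys30d: number;        // insider buys minus sells, last 30 days
  impliedToHistoricalMove: number;  // options-implied move / avg past earnings move
  predictionMarketBeatProb: number; // prediction market "beat" price, 0..1
  estimateRevisionPct: number;      // change in consensus EPS estimate, as a fraction
}

// Bias and weights are placeholders to be fit on past earnings outcomes.
interface BeatModel {
  bias: number;
  weights: [number, number, number, number];
}

function beatProbability(f: EarningsFeatures, m: BeatModel): number {
  const x = [f.netInsiderBuys30d, f.impliedToHistoricalMove, f.predictionMarketBeatProb, f.estimateRevisionPct];
  const z = x.reduce((acc, v, i) => acc + v * m.weights[i], m.bias);
  return 1 / (1 + Math.exp(-z)); // logistic link
}

// Rule of thumb: ATM straddle price / spot approximates the expected earnings move.
function impliedMove(atmStraddlePrice: number, spot: number): number {
  return atmStraddlePrice / spot;
}
```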
6. Penny Stock Specific Data (Gemini + Grok strongest here)
Must-Collect:
- Public float — from Polygon reference data. Low float (<20M shares) = explosive potential
- Float rotation — today volume / float. >50% = massive retail interest
- OTC tier status — Pink Current vs Caveat Emptor (Grok: avoid Caveat Emptor = scam flag)
- Reverse split history — red flag for chronic diluters
- 8-K filing velocity — sudden increase in filings = something happening
Pre-Runner Signals:
- Ignition Signal (Gemini): Price crosses above 50-day MA for first time in 30+ days AND RVOL > 5
- Consecutive momentum: 3+ days of >5% gains with increasing volume each day (Grok)
- Float rotation >50%: massive turnover = retail piling in
- Insider buy + no dilution: cluster score > 5 AND no recent S-3 filing
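The two price-based pre-runner signals can be checked from daily closes and volumes. This sketch uses a simple moving average. It reads "first time in 30+ days" as: today's close is above its 50-day SMA, and none of the previous 30 closes were above theirs. Both signal function names are hypothetical.

```typescript
// Mean of values[end-n+1 .. end]; null when there is not enough history.
function sma(values: number[], end: number, n: number): number | null {
  if (end - n + 1 < 0) return null;
  let sum = 0;
  for (let i = end - n + 1; i <= end; i++) sum += values[i];
  return sum / n;
}

// Ignition: close above 50-day SMA today, no close above its SMA in the prior 30 sessions, RVOL > 5.
function ignition(closes: number[], rvol: number): boolean {
  const last = closes.length - 1;
  const ma = sma(closes, last, 50);
  if (ma === null || closes[last] <= ma || rvol <= 5) return false;
  for (let i = last - 30; i < last; i++) {
    const m = sma(closes, i, 50);
    if (m === null || closes[i] > m) return false; // needs 30 full sessions at or below the MA
  }
  return true;
}

// Consecutive momentum: last `days` sessions each gained > 5% on higher volume than the day before.
function consecutiveMomentum(closes: number[], volumes: number[], days = 3): boolean {
  const n = closes.length;
  if (n < days + 1 || volumes.length !== n) return false;
  for (let i = n - days; i < n; i++) {
    if (closes[i] / closes[i - 1] - 1 <= 0.05) return false;
    if (volumes[i] <= volumes[i - 1]) return false;
  }
  return true;
}
```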
7. Additional Data Sources
FREE (build now):
| Source | What | Why |
|---|---|---|
| SEC EDGAR RSS feed | Real-time 8-K/S-3 filing alerts | Penny stock catalysts + dilution detection. Free. |
| QuiverQuant | Congress trades, govt contracts | Congress members beat the market. Free tier. |
| Polygon reference data | Float, shares outstanding, market cap | Already paid for. Not collecting yet. |
BOSS DECISION NEEDED:
| Source | Cost | What | Impact |
|---|---|---|---|
| Social sentiment (StockTwits/Reddit API) | $0-100/mo | Penny stock mention velocity | HIGH for penny stocks |
| Benzinga Pro API | ~$99/mo | Breaking news, unusual activity | HIGH for all stocks |
| FINVIZ API | ~$40/mo | Screener data, analyst ratings | MEDIUM |
SKIP for now:
| Source | Why Skip |
|---|---|
| Dark pool data (SqueezeMetrics) | $50-100/mo, can proxy from options flow |
| Bloomberg | $24K/yr, overkill |
| Satellite/alt data | Expensive, narrow use case |
8. Strategies Enabled
Tier 1 (Build First):
- Penny Stock Runner Scanner — RVOL + float rotation + breakout score = catch runners early
- Insider Cluster Following — buy what C-suite is buying, especially near 52-week lows
- Short Squeeze Engine — SI% >20% + Days to Cover >10 + catalyst = explosive
- Kalshi-Options Probability Arb — systematic mispricing exploitation
Tier 2 (Build After):
- Smart Money Mimicry — follow top 20 fund new positions from 13F
- Earnings Surprise Predictor — combine insider + options + prediction market signals
- Sector Momentum Lag — large-cap moves, penny stock follows
- Macro Regime Overlay — use VIX/FRED regime classifier to weight strategies
Build Order
Phase 1 — Data Collection (Week 1):
- stock-collector.ts — Polygon grouped daily for ALL stocks + reference data (float, market cap, shares outstanding)
- earnings-collector.ts — Polygon earnings calendar + actuals + dividends + splits + analyst ratings
- short-interest-collector.ts — Polygon daily (top 1000) + FINRA biweekly (all)
- Cron wiring — grouped daily 2x/day, earnings daily 7AM, short interest daily 8AM + biweekly
Phase 2 — Metrics & Scanners (Week 2):
- stock-metrics.ts — insider cluster score, RVOL scanner, penny stock breakout score, smart money velocity, dilution risk score, sector lag detector
- Cron wiring — metrics after each data collection run
Phase 3 — Cross-Venue (Week 3):
- Extend options-metrics.ts — add Kalshi-options arb score for stock events
- SEC 8-K RSS parser — real-time filing alerts for penny stocks
Architecture Note
"Penny stocks are the most mispriced securities on earth. They have the least analyst coverage, the worst information efficiency, and the highest retail participation. Every data signal matters more here because fewer people are looking." — Synthesis principle
Source: ~/edgeclaw/results/panel-results/stocks-data-final-ruling.md