WNCAAB Desk — Data Collection Spec (Mar 26, 2026)

What This Document Is

This is the complete data collection specification for the WNCAAB (Women's College Basketball) desk. An AI builder should be able to read this and know exactly what data to collect, from where, how often, and why.

The Business Model (Read This First)

We trade on Kalshi and Polymarket — prediction markets where you buy/sell contracts priced 0-100 cents. We find mispriced lines by comparing prediction market prices to Pinnacle (the sharpest traditional sportsbook). When Kalshi/Polymarket prices are wrong relative to Pinnacle's fair value, we buy the cheap side.

WNCAAB has thinner markets than NCAAB: fewer traders, wider bid/ask spreads, and slower price updates. This means mispricings are often LARGER when they exist, but liquidity is lower and Pinnacle may not always offer lines.

Only bet when a Pinnacle anchor exists. No Pinnacle line means no reliable fair value, and no reliable fair value means no trade.
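Below is a minimal sketch of the anchor-gated edge check. It makes three assumptions. First, `fair_yes_prob` is Pinnacle's de-vigged probability for the YES outcome, or `None` when Pinnacle has no line. Second, ask prices are quoted in cents. Third, exchange fees and slippage are ignored. The name `find_edge` and the `min_edge_cents` threshold are hypothetical.

```python
from typing import Optional, Tuple


def find_edge(
    yes_ask_cents: float,
    no_ask_cents: float,
    fair_yes_prob: Optional[float],
    min_edge_cents: float = 0.0,
) -> Optional[Tuple[str, float]]:
    """Return (side, edge_in_cents) for the cheap side, or None for no trade.

    Hypothetical helper: fees and slippage are ignored.
    """
    # No Pinnacle line means no reliable fair value, so never trade.
    if fair_yes_prob is None:
        return None

    # Edge = fair value of the contract minus the price we would pay for it.
    yes_edge = fair_yes_prob * 100.0 - yes_ask_cents
    no_edge = (1.0 - fair_yes_prob) * 100.0 - no_ask_cents

    # Take whichever side is cheaper relative to fair value.
    side, edge = ("YES", yes_edge) if yes_edge >= no_edge else ("NO", no_edge)
    if edge <= min_edge_cents:
        return None
    return side, round(edge, 2)
```

For example, `find_edge(40, 62, 0.55)` returns `("YES", 15.0)`. YES is worth 55 cents at Pinnacle's fair value but can be bought for 40.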

WNCAAB is 2-way only (no draw). Standard multiplicative de-vig applies to all markets.
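The multiplicative (proportional) de-vig for a two-way market works in two steps. Convert each side's odds to an implied probability, then divide each one by their sum (the overround) so the pair sums to 1. This sketch assumes Pinnacle prices arrive as American odds. The helper names are illustrative.

```python
from typing import Tuple


def american_to_decimal(american: int) -> float:
    """Convert American odds (e.g. -110, +130) to decimal odds."""
    if american == 0:
        raise ValueError("American odds cannot be 0")
    if american > 0:
        return 1.0 + american / 100.0
    return 1.0 + 100.0 / abs(american)


def devig_two_way(odds_a: int, odds_b: int) -> Tuple[float, float]:
    """Multiplicative de-vig: scale implied probabilities so they sum to 1."""
    implied_a = 1.0 / american_to_decimal(odds_a)
    implied_b = 1.0 / american_to_decimal(odds_b)
    overround = implied_a + implied_b  # > 1.0 because of the book's margin
    return implied_a / overround, implied_b / overround
```

`devig_two_way(-110, -110)` gives `(0.5, 0.5)`. For -150/+130, the implied probabilities are 0.600 and about 0.435, which de-vig to roughly 0.580 and 0.420.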


SHARED DATA (collected for all sports)

Prediction Market Lines — Kalshi + Polymarket

Pull EVERY line offered for every game — every alternate spread, every alternate total, every player prop.

What to pull per market/line:

Two snapshots: Early (morning ~11 AM ET) and Closing (~10 min before game).
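Both snapshot times can be derived from the scheduled tipoff. This sketch assumes `tipoff` is a timezone-aware `datetime` already expressed in Eastern Time. When the tip is so early that the morning snapshot would not come before the closing one, the early snapshot is skipped. `snapshot_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def snapshot_times(
    tipoff: datetime, early_hour: int = 11, closing_lead_min: int = 10
) -> Dict[str, Optional[datetime]]:
    """Early (~11 AM ET on game day) and closing (~10 min before tip) snapshot times."""
    if tipoff.tzinfo is None:
        raise ValueError("tipoff must be timezone-aware (Eastern Time)")
    # Same calendar day and timezone as the tipoff, at the early-snapshot hour.
    early = tipoff.replace(hour=early_hour, minute=0, second=0, microsecond=0)
    closing = tipoff - timedelta(minutes=closing_lead_min)
    # For a late-morning tip the early snapshot would land after closing; skip it.
    return {"early": early if early < closing else None, "closing": closing}
```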

Sharp Book Fair Values — Pinnacle

What to pull:

How to get Pinnacle data:

Note: Pinnacle coverage for WNCAAB is limited. Not all games will have lines. Only proceed with edge detection when Pinnacle anchor exists.

Other Sportsbook Lines — DraftKings, FanDuel

Results & Grading

After each game: final score, total points, winner, margin, ATS results, O/U results. All derived metrics calculated from score + stored lines.
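Grading from the final score and stored lines is simple arithmetic. This sketch assumes the stored spread is the home team's line (negative when home is favored) and reports results from the home side. `grade_game` is a hypothetical helper.

```python
from typing import Dict, Union


def grade_game(
    home_score: int, away_score: int, home_spread: float, total_line: float
) -> Dict[str, Union[int, str]]:
    """Derive winner, margin, ATS and O/U results from a final score."""
    margin = home_score - away_score  # positive when the home team wins
    if margin == 0:
        raise ValueError("basketball games cannot end tied")
    total = home_score + away_score

    # Home covers when margin + spread > 0 (e.g. -5.5 needs a 6+ point win).
    ats_value = margin + home_spread
    ats = "home" if ats_value > 0 else "away" if ats_value < 0 else "push"
    ou = "over" if total > total_line else "under" if total < total_line else "push"

    return {
        "winner": "home" if margin > 0 else "away",
        "margin": margin,
        "total": total,
        "ats": ats,
        "ou": ou,
    }
```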

Market-Implied Probability Curves (extracted from Kalshi alt lines)

After each Kalshi snapshot, all alt lines for a game are grouped by market type (spread, total) and converted into a probability curve. Less alt line coverage for WNCAAB, but still valuable when available.

What gets stored per curve: Sport, game key, home/away teams, market type (spread or total), snapshot type (early/closing), array of threshold values, array of implied probabilities, number of points, mean probability, curve slope.

DB Table: market_implied_curves (in research-pipeline.db)
Frequency: Runs automatically after every Kalshi snapshot (every 30 minutes).
Minimum: 3+ alt lines required to form a curve.
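Below is a sketch of the curve extraction. It assumes each alt line arrives as a `(threshold, yes_price_cents)` pair and treats the YES price as the implied probability without de-vigging. The spec does not say how slope is computed, so this sketch uses an ordinary least-squares slope of probability against threshold. `ImpliedCurve` and `build_curve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

MIN_ALT_LINES = 3  # spec: 3+ alt lines required to form a curve


@dataclass
class ImpliedCurve:
    market_type: str  # "spread" or "total"
    thresholds: List[float]
    probabilities: List[float]
    num_points: int
    mean_probability: float
    slope: float  # change in implied probability per point of threshold


def build_curve(
    market_type: str, alt_lines: Iterable[Tuple[float, float]]
) -> Optional[ImpliedCurve]:
    points = sorted(alt_lines)  # order by threshold
    if len(points) < MIN_ALT_LINES:
        return None
    xs = [threshold for threshold, _ in points]
    ps = [price / 100.0 for _, price in points]  # cents -> probability
    n = len(xs)
    mean_x = sum(xs) / n
    mean_p = sum(ps) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all thresholds identical: slope undefined
        return None
    sxp = sum((x - mean_x) * (p - mean_p) for x, p in zip(xs, ps))
    return ImpliedCurve(market_type, xs, ps, n, mean_p, sxp / sxx)
```

For an "over" totals curve the slope should come out negative, because higher thresholds are less likely to be cleared.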

Live In-Game Data (every 1 minute per game)

Game state + all Kalshi/Polymarket prices every minute during live games. No Pinnacle live tracking.

Player Props Layer

Thinner than NCAAB. Monitor star players in major games only.

DFS Layer

DraftKings salaries + ownership projections when WNCAAB slates are offered.


WNCAAB-SPECIFIC DATA

Team Statistics

Category | Stats | Source
Efficiency | Adjusted offensive efficiency (AdjOE), adjusted defensive efficiency (AdjDE), adjusted tempo | HerHoopStats + BartTorvik WNCAAB
Ratings | Power ratings, SOS (strength of schedule) | HerHoopStats + BartTorvik
Four Factors | eFG%, ORB%, TOV%, FTR + defensive versions | HerHoopStats
Situational | Overall/conference/home-away win% | ESPN + calculated

Ratings & Win Probability Models

Model | Source
HerHoopStats | herhoopstats.com (scrape)
BartTorvik WNCAAB | barttorvik.com (scrape)
ESPN rankings | ESPN API (when available)

Fewer models available than NCAAB. Average what's available, exclude NULL silently.
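Averaging with silent NULL exclusion can be sketched as follows. The sketch assumes each model's output has already been converted to a home win probability, with `None` standing in for NULL. The source keys are illustrative.

```python
from typing import Mapping, Optional


def consensus_probability(model_probs: Mapping[str, Optional[float]]) -> Optional[float]:
    """Mean of the available model probabilities; NULL (None) entries are skipped."""
    available = [p for p in model_probs.values() if p is not None]
    if not available:
        return None  # no model has a number for this game
    return sum(available) / len(available)
```

For example, `consensus_probability({"herhoopstats": 0.62, "barttorvik": 0.58, "espn": None})` comes out at about 0.60.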

Clutch/Late-Game Execution

How teams perform in the final 5 minutes of close games (within 5 points). College women's basketball has significant variance here — free throw shooting under pressure is especially variable.

Stat | What It Measures
Clutch scoring differential | Points scored minus points allowed in final 5 min of close games
Clutch turnover rate | Turnovers per possession in clutch situations
Clutch foul rate | Fouls committed in clutch situations
Clutch FT% | Free throw shooting in clutch situations
Clutch win rate | Win % when game is within 5 points with 5 min left

Source: ESPN API play-by-play data (free). Filter for a score margin of 5 points or fewer with 5:00 or less remaining in the fourth quarter or in overtime (women's college games are played in four 10-minute quarters).
Frequency: Once daily, calculated from rolling season game logs.
DB Table: wncaab_clutch_stats
Why it creates edge: The same principle applies as in NCAAB. Teams that choke late cover tight spreads but fail on fat alt lines. Thinner WNCAAB markets make this even more exploitable because fewer traders are watching.
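The clutch filter can be sketched as a single pass over a game's scoring plays. The `Play` record is a hypothetical, already-parsed shape and does not reflect ESPN's actual response format. It assumes each play carries the period, the seconds left in that period, and the score after the play. Overtime periods are 5 minutes long, so every overtime play meets the clock condition.

```python
from dataclasses import dataclass
from typing import Iterable

REGULATION_PERIODS = 4  # four 10-minute quarters; periods 5+ are overtime
CLUTCH_SECONDS = 300    # 5:00 or less remaining in the period
CLUTCH_MARGIN = 5       # score within 5 points


@dataclass
class Play:  # hypothetical parsed play-by-play record
    period: int
    clock_seconds: int  # seconds remaining in the period
    home_score: int     # score AFTER this play
    away_score: int


def is_clutch(period: int, clock_seconds: int, margin: int) -> bool:
    return (
        period >= REGULATION_PERIODS
        and clock_seconds <= CLUTCH_SECONDS
        and abs(margin) <= CLUTCH_MARGIN
    )


def clutch_scoring_differential(plays: Iterable[Play]) -> int:
    """Home points minus away points scored during clutch time."""
    diff = 0
    prev_home, prev_away = 0, 0
    for play in plays:
        # Judge closeness by the margin BEFORE the play, so the basket that
        # stretches a 5-point game to 7 still counts as clutch scoring.
        margin_before = prev_home - prev_away
        if is_clutch(play.period, play.clock_seconds, margin_before):
            diff += (play.home_score - prev_home) - (play.away_score - prev_away)
        prev_home, prev_away = play.home_score, play.away_score
    return diff
```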


WNCAAB-SPECIFIC CONSIDERATIONS

Less Data, Thinner Coverage

Tournament Play (March Madness)


SHARED DATA GAPS (apply to all sports including WNCAAB)

Gap #1: Player Prop Data as Leading Indicator

Track prop line movement from FanDuel. Less applicable to WNCAAB due to thin prop markets, but monitor star players in major games.

Gap #2: Real-Time Injury & Lineup Speed

Monitor breaking injury/lineup news faster than prediction markets react.

Gap #3: Social Media & News Sentiment

Real-time monitoring of X (formerly Twitter) for injury leaks and lineup news.

Gap #4: In-Game Contextual Flow Data

Track momentum indicators during live games: scoring runs, foul trouble, pace variance.
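One momentum indicator is the longest unanswered scoring run per team. The sketch below computes it over an ordered list of `(team, points)` scoring events, which is a hypothetical input shape derived from play-by-play.

```python
from typing import Dict, Iterable, Optional, Tuple


def longest_runs(scoring_events: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Longest unanswered scoring run (in points) for each team, in game order."""
    best: Dict[str, int] = {}
    current_team: Optional[str] = None
    current_points = 0
    for team, points in scoring_events:
        if team == current_team:
            current_points += points  # run continues
        else:
            current_team, current_points = team, points  # opponent answered: new run
        best[team] = max(best.get(team, 0), current_points)
    return best
```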

Gap #5: Order Book Depth & Liquidity

Full bid/ask depth from Kalshi/Polymarket. This is especially important for WNCAAB, where liquidity is thinner.
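With full depth available, the useful question is how many contracts can actually be bought at or below a price cap (for example, fair value), and at what average price. This sketch assumes the ask side is a list of `(price_cents, size)` levels. `fill_estimate` is a hypothetical helper, and fees are ignored.

```python
from typing import List, Optional, Tuple


def fill_estimate(
    asks: List[Tuple[float, int]], quantity: int, max_price_cents: Optional[float] = None
) -> Tuple[int, Optional[float]]:
    """Walk the ask ladder from best price; return (contracts filled, average price)."""
    remaining = quantity
    filled = 0
    cost = 0.0
    for price, size in sorted(asks):  # cheapest asks first
        if remaining == 0 or (max_price_cents is not None and price > max_price_cents):
            break
        take = min(size, remaining)
        filled += take
        cost += take * price
        remaining -= take
    return filled, (cost / filled if filled else None)
```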

Gap #6: Team Variance & Skewness Metrics

Calculate scoring volatility. WNCAAB has larger talent gaps between teams, which produce more extreme outcomes and therefore more fat-tail opportunities.
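Scoring volatility and asymmetry can be summarized from a team's game margins. This sketch uses the sample standard deviation and the adjusted Fisher-Pearson skewness coefficient. The spec does not name an estimator, so treat that choice as an assumption.

```python
import statistics
from typing import Dict, Sequence


def margin_volatility(margins: Sequence[float]) -> Dict[str, float]:
    """Mean, sample standard deviation and sample skewness of scoring margins."""
    n = len(margins)
    if n < 3:
        raise ValueError("need at least 3 games")
    mean = statistics.fmean(margins)
    m2 = sum((x - mean) ** 2 for x in margins) / n
    m3 = sum((x - mean) ** 3 for x in margins) / n
    if m2 == 0:
        raise ValueError("margins have zero variance")
    g1 = m3 / m2 ** 1.5
    # Adjusted Fisher-Pearson coefficient (the usual bias-corrected sample skewness).
    skewness = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    return {"mean": mean, "stdev": statistics.stdev(margins), "skewness": skewness}
```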

Gap #7: Advanced Fatigue & Travel Metrics

Track travel, timezone crossings, schedule density.
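Below is a sketch of rest, schedule-density, and timezone-crossing features for an upcoming game. The `Game` record and its `venue_utc_offset` field are hypothetical. Real timezone crossings would need a venue-to-timezone lookup; here each venue simply gets a fixed UTC offset.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Game:
    day: date
    venue_utc_offset: int  # hypothetical field: venue's UTC offset in hours


def fatigue_features(
    history: List[Game], upcoming: Game, window_days: int = 7
) -> Dict[str, Optional[int]]:
    """Rest days, games in the trailing window, and timezones crossed since the last game."""
    past = sorted((g for g in history if g.day < upcoming.day), key=lambda g: g.day)
    if not past:
        return {"rest_days": None, "games_in_window": 0, "timezones_crossed": 0}
    last = past[-1]
    return {
        # Days with no game between the last game and this one.
        "rest_days": (upcoming.day - last.day).days - 1,
        "games_in_window": sum(1 for g in past if (upcoming.day - g.day).days <= window_days),
        "timezones_crossed": abs(upcoming.venue_utc_offset - last.venue_utc_offset),
    }
```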

Gap #10: Game Script Volatility / Fat Tails

Model full distribution of scoring margins. WNCAAB talent gaps make fat tails more common.
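Modeling the full distribution means reading tail probabilities off the empirical margins rather than assuming a bell curve. This sketch compares the empirical P(margin > line) with a normal approximation fitted to the same games for each alt line. The comparison method is an illustrative choice, not one the spec prescribes.

```python
import statistics
from typing import Dict, Sequence


def cover_probabilities(
    margins: Sequence[float], lines: Sequence[float]
) -> Dict[float, Dict[str, float]]:
    """For each alt line, empirical vs normal-approximation P(margin > line)."""
    if len(margins) < 2:
        raise ValueError("need at least 2 margins")
    normal = statistics.NormalDist(statistics.fmean(margins), statistics.stdev(margins))
    out = {}
    for line in lines:
        empirical = sum(1 for m in margins if m > line) / len(margins)
        out[line] = {"empirical": empirical, "normal": 1.0 - normal.cdf(line)}
    return out
```

Where the empirical probability on a fat alt line sits well above the normal estimate, a price built on a normal model would likely understate that outcome.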

Gap #11: Period-Specific Scoring Distribution

Break down scoring by half. Some teams are strong first-half starters, others are strong closers.
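Women's college games are played in four quarters, so the halves are Q1+Q2 and Q3+Q4. This sketch assumes per-quarter scores are available as `(team, opponent)` pairs and leaves overtime out of both halves. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

QuarterScores = Sequence[Tuple[int, int]]  # (team points, opponent points) per quarter


def half_differentials(quarters: QuarterScores) -> Tuple[int, int]:
    """First-half and second-half scoring differential from regulation quarters."""
    if len(quarters) < 4:
        raise ValueError("need all four regulation quarters")
    first = sum(team - opp for team, opp in quarters[:2])
    second = sum(team - opp for team, opp in quarters[2:4])  # overtime excluded
    return first, second


def half_profile(games: List[QuarterScores]) -> Dict[str, float]:
    """Average first-half and second-half differential across a team's games."""
    if not games:
        raise ValueError("need at least one game")
    splits = [half_differentials(g) for g in games]
    n = len(splits)
    return {
        "first_half_diff": sum(f for f, _ in splits) / n,
        "second_half_diff": sum(s for _, s in splits) / n,
    }
```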


DATA SOURCES SUMMARY

FREE APIs

Source | What | Access
Kalshi API | ML, spread, O/U, all alt lines, player props, volume | API key (have it)
Polymarket | Event markets, prices, volume | Free API
ESPN API | Scores, game state, play-by-play, BPI | Free, no key
Odds API | Pinnacle odds (fallback only) | API key, rate-limited

NEED SCRAPING

Source | What
Pinnacle | Sharp lines (when available for WNCAAB)
HerHoopStats | Efficiency ratings, team stats
BartTorvik | WNCAAB team rankings
DraftKings | Opening/current lines

CUSTOM CALCULATIONS


COLLECTION SCHEDULE

Data Type | Frequency | Source
Team statistics | Once daily | HerHoopStats, BartTorvik, ESPN
Model predictions | Once daily | HerHoopStats, BartTorvik, ESPN
Injury/lineup | Continuous | ESPN API, team feeds
Results | After games | ESPN API + settlement

Pre-Game Odds Schedule

Pinnacle: When available; check the Odds API.
Kalshi + Polymarket (FREE): Every 30 minutes from lines open until game time.
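The pre-game Kalshi/Polymarket cadence reduces to a list of timestamps. This sketch assumes both inputs are datetimes in the same timezone. `poll_times` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List


def poll_times(lines_open: datetime, tipoff: datetime, interval_min: int = 30) -> List[datetime]:
    """Snapshot times every `interval_min` minutes from lines open until tipoff."""
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    step = timedelta(minutes=interval_min)
    times = []
    t = lines_open
    while t < tipoff:  # nothing is scheduled at or after tipoff
        times.append(t)
        t += step
    return times
```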

Source: ~/.claude/projects/-home-ubuntu-edgeclaw/memory/wncaab-desk-data-inventory.md