Tennis Data Audit — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Gemini (3 of 5 peer review votes)
Status: PENDING BOSS RULING on open questions

COUNCIL SUMMARY

Where Advisors Agreed

Service Points Won (SPW) is the atomic metric — not game hold %, but point-level serve/return win probability
Surface-Adjusted Elo (SA-Elo) with separate ratings per surface (hard/clay/grass/indoor hard)
Sackmann GitHub repos as primary free data source — tennis_atp and tennis_wta CSVs with decades of match data
Markov chains for match/set/game pricing — exact probabilities in milliseconds vs Monte Carlo variance
Monte Carlo reserved for tournament outrights — bracket path simulation with fatigue accumulation
Fatigue tracking critical — court time in last 7/14 days, matches played, travel zones crossed
Retirement probability model needed — logistic regression on age, recent workload, MTOs, heat index
H2H with time decay — recent matches weighted heavily, old matches discounted, minimum 3 meetings to apply
Pinnacle as sharp anchor — de-vig for true probabilities
5 edge scanners — Match Winner, Game Spread, Total Games, Exact Set Score, Tournament Outright
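The time-decayed H2H item could be implemented as a Beta-prior update. This is a minimal sketch under these assumptions: the prior is centred on the Elo win probability and is worth a hypothetical `prior_strength` of 10 pseudo-meetings, and decay is exponential, calibrated so a meeting 48 months old weighs 0.1× a meeting from 1 month ago (the ratio given for the H2H Adjustment metric). `Meeting`, `meeting_weight` and `h2h_adjusted_prob` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Meeting:
    months_ago: float  # time since the meeting, in months
    a_won: bool        # True if player A won the meeting


# Assumed exponential decay: weight(48 months) / weight(1 month) == 0.1
DECAY_PER_MONTH = math.log(10) / (48 - 1)


def meeting_weight(months_ago: float) -> float:
    """Relative weight of a past meeting (1.0 for a meeting played today)."""
    return math.exp(-DECAY_PER_MONTH * months_ago)


def h2h_adjusted_prob(elo_prob: float, meetings: list[Meeting],
                      prior_strength: float = 10.0, min_meetings: int = 3) -> float:
    """Posterior mean P(A beats B): Beta prior centred on elo_prob, updated with
    time-decayed H2H wins and losses. prior_strength is a hypothetical pseudo-count."""
    if len(meetings) < min_meetings:
        return elo_prob  # below the minimum sample: no H2H adjustment
    alpha = elo_prob * prior_strength
    beta = (1.0 - elo_prob) * prior_strength
    for m in meetings:
        w = meeting_weight(m.months_ago)
        if m.a_won:
            alpha += w
        else:
            beta += w
    return alpha / (alpha + beta)


# Usage: Elo says 60%; A won the two most recent meetings, B won one 30 months ago.
recent = [Meeting(2, True), Meeting(5, True), Meeting(30, False)]
adjusted = h2h_adjusted_prob(0.60, recent)
```

A larger `prior_strength` makes the H2H record move the Elo number less, which is the lever for keeping style corrections modest.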

Where Advisors Disagreed

Monte Carlo vs Markov chain: Some recommended MC for all markets. Gemini correctly identified that tennis is a structured scoring system solvable by Markov chains. Council verdict: Markov chains for match pricing, MC for outrights only.
Elo weighting formula: Different blends of overall/surface/form Elo proposed. Council verdict: 0.5 × OverallElo + 0.3 × SurfaceElo + 0.2 × RecentFormElo.
Database complexity: Some proposed 20+ tables, others kept it lean. Council verdict: Lean schema that maps directly to Sackmann CSV headers for easy ingestion.
Paid data sources: Some recommended Sportradar/TDI enterprise feeds. Council verdict: Start with free Sackmann data, evaluate paid feeds only if edge requires it.
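The Elo weighting verdict can be sketched as follows, assuming the usual 400-point logistic Elo expectation and a decaying K-factor as one reading of the "dynamic K-factor" in this ruling. The K constants (250, 5, 0.4) follow the form popularised by FiveThirtyEight's tennis Elo and are placeholders to tune. A layoff-based K bump for returning players is not shown. All function names are hypothetical.

```python
def sa_elo(overall: float, surface: float, recent_form: float) -> float:
    """Council blend: 0.5 x OverallElo + 0.3 x SurfaceElo + 0.2 x RecentFormElo."""
    return 0.5 * overall + 0.3 * surface + 0.2 * recent_form


def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the usual 400-point logistic Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def dynamic_k(matches_played: int, c: float = 250.0, offset: float = 5.0,
              shape: float = 0.4) -> float:
    """K shrinks as a player's match count grows, so new players' ratings move faster.
    Constants are placeholders, not fitted values."""
    return c / (matches_played + offset) ** shape


def elo_update(rating: float, opp_rating: float, won: bool, k: float) -> float:
    """One Elo update after a match; run it per surface for the surface ratings."""
    expected = elo_win_prob(rating, opp_rating)
    return rating + k * ((1.0 if won else 0.0) - expected)


# Usage: blended SA-Elo for two players on clay, then a match win probability.
a = sa_elo(overall=2050, surface=2120, recent_form=2000)
b = sa_elo(overall=2000, surface=1950, recent_form=2030)
p_a = elo_win_prob(a, b)
```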

Strongest Arguments (from peer review)

Gemini wins with the most operationally actionable response:

Correctly identified Markov chains > Monte Carlo for structured tennis scoring (O'Malley equations)
Log5 method for serve/return matchup rather than raw hold %
Exact hold probability formula: P(Hold) = p^4 + 4p^4(1-p) + 10p^4(1-p)^2 + 20p^5(1-p)^3/(1-2p(1-p))
Concrete cost sequencing (free sources first, paid later)
SA-Elo with dynamic K-factor (higher for young/returning players)
Ready-to-use JSON matchup card schema
Every metric tied back to "beat Pinnacle's closing line" — the actual KPI
Fatigue formula with travel zone crossing penalty
Best-of-5 conversion formula from set-win probability
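The exact hold formula can be checked against the point-level Markov chain it summarises. This is a minimal sketch assuming points on serve are i.i.d. with win probability p. `hold_prob` and `hold_prob_markov` are hypothetical names.

```python
from functools import lru_cache


def hold_prob(p: float) -> float:
    """Closed-form P(Hold) for server point-win probability p."""
    q = 1.0 - p
    return (p**4 + 4 * p**4 * q + 10 * p**4 * q**2
            + 20 * p**5 * q**3 / (1 - 2 * p * q))


def hold_prob_markov(p: float) -> float:
    """Same quantity by recursing over point scores (server s, receiver r)."""
    @lru_cache(maxsize=None)
    def win_from(s: int, r: int) -> float:
        if s >= 4 and s - r >= 2:
            return 1.0
        if r >= 4 and r - s >= 2:
            return 0.0
        if s == r >= 3:
            # Deuce: the server needs two points in a row before the receiver gets two.
            return p * p / (p * p + (1 - p) * (1 - p))
        return p * win_from(s + 1, r) + (1 - p) * win_from(s, r + 1)

    return win_from(0, 0)


for p in (0.55, 0.60, 0.65, 0.70):
    print(f"p={p:.2f}  hold={hold_prob(p):.4f}  markov={hold_prob_markov(p):.4f}")
```

The last term of the closed form is the deuce contribution: 20p^3(1-p)^3 ways to reach 3-3, times p^2/(1-2p(1-p)) to win from deuce.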

Biggest Blind Spot

No model validation/backtesting framework — All advisors proposed formulas and schemas, but none addressed calibration against closing Pinnacle lines, out-of-sample testing, objective functions, small-sample handling for young/surface-limited players, or overfitting risk on H2H and clutch metrics.

What Everyone Missed (from peer reviews)

Model validation discipline — No backtesting methodology against closing Pinnacle/Betfair lines. Beautiful math without edge proof.
Small-sample problems — A 22-year-old with only 180 career points played on grass is not the same sample as Nadal on clay. Need confidence intervals that widen with sample scarcity.
Overfitting risk on clutch metrics — "Clutch" is mostly variance. Tiebreak performance and deciding-set records should be mean-reverted heavily.
Live intelligence sourcing lag — Practice session reports and pre-match injury intel are the hardest part. No advisor addressed how to actually source this faster than the market.
Court pace index — Courts within the same surface category vary dramatically (e.g., Indian Wells hard vs US Open hard). Need court pace measurement per venue.
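For the small-sample and clutch points, here is a minimal sketch: a Wilson score interval whose width grows as the point count falls, plus simple shrinkage toward a prior. The prior rate, prior weight and the clutch reversion constant `k` are hypothetical tuning values, not council numbers, and the function names are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a rate such as SPW or RPW (95% by default)."""
    if trials == 0:
        return (0.0, 1.0)  # no data: the whole range is plausible
    phat = successes / trials
    denom = 1 + z * z / trials
    centre = (phat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(phat * (1 - phat) / trials + z * z / (4 * trials**2)) / denom
    return (centre - half, centre + half)


def shrink_rate(successes: int, trials: int, prior_rate: float, prior_weight: float) -> float:
    """Pull a raw rate toward prior_rate (e.g. tour-average SPW on the surface);
    prior_weight is how many points the prior is worth."""
    return (successes + prior_rate * prior_weight) / (trials + prior_weight)


def shrunk_clutch_index(actual_bp_rate: float, expected_bp_rate: float,
                        n_break_points: int, k: float = 200.0) -> float:
    """Clutch Index (actual minus expected BP rate), mean-reverted toward 0."""
    return n_break_points / (n_break_points + k) * (actual_bp_rate - expected_bp_rate)


# Usage with illustrative counts: 117 of 180 grass points vs 3250 of 5000.
wide = wilson_interval(117, 180)
narrow = wilson_interval(3250, 5000)
```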

BUILD PLAN

Phase 1: Core Tennis Data Tables

tennis_players: player_id (Sackmann ID), full_name, preferred_name, nationality, hand (L/R), height_cm, current_rank, peak_rank, tour (ATP/WTA/Challenger), status, elo_overall, elo_hard, elo_clay, elo_grass, elo_indoorhard, injury_flag, last_match_date

tennis_matches: match_id, tournament_id, match_date, round, surface, indoor_outdoor, player1_id, player2_id, winner_id, score, sets_played, games_p1, games_p2, retirement, walkover, duration_minutes, format (BO3/BO5), session (day/night)

tennis_serve_stats: stat_id, match_id, player_id, aces, double_faults, first_serve_pct (w_1stIn / w_svpt), first_serve_won_pct (w_1stWon / w_1stIn), second_serve_won_pct (w_2ndWon / (w_svpt - w_1stIn)), service_points_won (SPW), return_points_won (RPW), bp_faced, bp_saved, service_hold_pct, break_conversion_pct
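A sketch of how the tennis_serve_stats rates could be derived from one row of a Sackmann match CSV, read with `csv.DictReader` so that values arrive as strings. It assumes the repos' `w_`/`l_` column prefixes (`svpt`, `1stIn`, `1stWon`, `2ndWon`, `SvGms`, `bpSaved`, `bpFaced`, `ace`, `df`) and that rows with blank stat fields have already been filtered out. Second-serve points are taken as svpt - 1stIn, so double faults count as second-serve points lost. `serve_stats_from_row` is a hypothetical name, and the example row uses illustrative numbers, not a real match.

```python
def serve_stats_from_row(row: dict, side: str) -> dict:
    """Derive tennis_serve_stats fields for side 'w' (winner) or 'l' (loser)."""
    opp = "l" if side == "w" else "w"

    def val(prefix: str, col: str) -> float:
        return float(row[f"{prefix}_{col}"])

    def ratio(num: float, den: float):
        return num / den if den else None  # None when the denominator is zero

    svpt, first_in = val(side, "svpt"), val(side, "1stIn")
    first_won, second_won = val(side, "1stWon"), val(side, "2ndWon")
    bp_faced, bp_saved = val(side, "bpFaced"), val(side, "bpSaved")
    sv_gms = val(side, "SvGms")
    opp_svpt = val(opp, "svpt")
    opp_sv_won = val(opp, "1stWon") + val(opp, "2ndWon")
    opp_bp_faced, opp_bp_saved = val(opp, "bpFaced"), val(opp, "bpSaved")
    return {
        "aces": val(side, "ace"),
        "double_faults": val(side, "df"),
        "first_serve_pct": ratio(first_in, svpt),
        "first_serve_won_pct": ratio(first_won, first_in),
        "second_serve_won_pct": ratio(second_won, svpt - first_in),
        "service_points_won": ratio(first_won + second_won, svpt),    # SPW
        "return_points_won": ratio(opp_svpt - opp_sv_won, opp_svpt),  # RPW
        "bp_faced": bp_faced,
        "bp_saved": bp_saved,
        "service_hold_pct": ratio(sv_gms - (bp_faced - bp_saved), sv_gms),
        "break_conversion_pct": ratio(opp_bp_faced - opp_bp_saved, opp_bp_faced),
    }


example_row = {  # illustrative numbers only
    "w_ace": "10", "w_df": "2", "w_svpt": "80", "w_1stIn": "50", "w_1stWon": "40",
    "w_2ndWon": "18", "w_SvGms": "12", "w_bpSaved": "3", "w_bpFaced": "4",
    "l_ace": "4", "l_df": "5", "l_svpt": "90", "l_1stIn": "55", "l_1stWon": "35",
    "l_2ndWon": "15", "l_SvGms": "12", "l_bpSaved": "4", "l_bpFaced": "8",
}
winner_stats = serve_stats_from_row(example_row, "w")
```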

tennis_player_surface_baselines: player_id, surface, window_days (90/180/365), matches_played, win_pct, avg_spw, avg_rpw, avg_hold_pct, avg_break_pct, avg_games_per_set, sa_elo, updated_at

tennis_h2h: player1_id, player2_id, surface, matches_played, p1_wins, p2_wins, p1_avg_spw, p2_avg_spw, last_meeting_date

tennis_tournaments: tournament_id, name, surface, indoor_outdoor, level (GS/Masters/500/250/Challenger), draw_size, format (BO3/BO5), country, city, court_pace_index, start_date, end_date, points_winner

tennis_draws: tournament_id, round, position, player_id, seed, bye (boolean)

tennis_player_status: player_id, date, status_type (injury/fatigue/motivation), signal_source, signal_tier (hard/soft/contextual), severity (1-10), notes

tennis_fatigue: player_id, date, minutes_last_7d, minutes_last_14d, matches_last_7d, matches_last_14d, travel_zones_crossed, fatigue_score, sets_played_last_7d
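The Fatigue Score metric (Minutes_7d + 0.5 × Minutes_8to14d + TravelZones × 100) could be filled from this table's inputs as below. This is a sketch assuming day windows that include the as-of date (ages 0 to 6 days count as the last 7 days, ages 0 to 13 as the last 14) and a single pre-computed travel-zone count. `fatigue_score` is a hypothetical name.

```python
from datetime import date


def fatigue_score(match_log: list[tuple[date, int]], as_of: date, travel_zones: int) -> float:
    """Fatigue Score from (match_date, duration_minutes) pairs.

    Minutes_8to14d is derived as minutes_last_14d - minutes_last_7d.
    """
    minutes_7d = minutes_14d = 0
    for played_on, minutes in match_log:
        age_days = (as_of - played_on).days
        if 0 <= age_days < 14:
            minutes_14d += minutes
            if age_days < 7:
                minutes_7d += minutes
    minutes_8to14d = minutes_14d - minutes_7d
    return minutes_7d + 0.5 * minutes_8to14d + 100 * travel_zones


# Usage: two matches inside the 14-day window, one older match ignored, 2 zones crossed.
log = [(date(2026, 4, 8), 120), (date(2026, 4, 1), 150), (date(2026, 3, 20), 200)]
score = fatigue_score(log, as_of=date(2026, 4, 10), travel_zones=2)  # 120 + 75 + 200 = 395.0
```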

tennis_weather: tournament_id, match_date, session, temperature_f, humidity_pct, wind_speed_mph, wind_direction, heat_index

Phase 2: Custom Metrics

Metric	Formula	Purpose
Surface-Adjusted Elo (SA-Elo)	0.5 × OverallElo + 0.3 × SurfaceElo + 0.2 × RecentFormElo; dynamic K-factor (higher for young/returning)	Single player strength number per surface
Serve/Return Matchup (Log5)	P(A_serve) = (SPW_A × RPW_B_allowed) / TourAvgSPW, where RPW_B_allowed = 1 - RPW_B (the share of serve points B's opponents win)	Point-level serve win probability in a specific matchup
Hold Probability	p^4 + 4p^4(1-p) + 10p^4(1-p)^2 + 20p^5(1-p)^3/(1-2p(1-p)) where p = P(A_serve)	Service game hold probability from point-win probability
Fatigue Score	Minutes_7d + 0.5 × Minutes_8to14d + TravelZones × 100, where Minutes_8to14d = minutes_last_14d - minutes_last_7d	Cumulative physical load
Retirement Probability	Logistic: features = age, matches_14d, MTOs_last3, heat_index, surface	Pre-match retirement risk
BO5 Conversion	P(Bo5) = p^3 + 3p^3(1-p) + 6p^3(1-p)^2 where p = set-win probability	Convert a set-win probability (e.g., one backed out of BO3 form) to a BO5 match probability
H2H Adjustment	Bayesian update on base Elo probability; time decay (a meeting 4 years ago weighs 0.1× one from last month); min N=3	Style matchup correction
Clutch Index	(Actual BP Won%) - (Expected BP Won% from RPW); mean-revert heavily	Mental edge indicator (high noise)
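Two of the table's formulas as code. This sketch follows the matchup formula as written, a multiplicative form (strict Bill James Log5 combines odds ratios instead), with the result clipped into (0, 1). The best-of conversions assume independent sets. `set_prob_from_bo3` backs a set-win probability out of a BO3 match probability by bisection, which is one way to turn BO3 form into a BO5 price. All names are hypothetical.

```python
def matchup_serve_prob(spw_a: float, rpw_b: float, tour_avg_spw: float) -> float:
    """P(A wins a point on serve vs B) = SPW_A x (1 - RPW_B) / TourAvgSPW, clipped."""
    p = spw_a * (1.0 - rpw_b) / tour_avg_spw
    return min(max(p, 1e-6), 1.0 - 1e-6)


def bo3_from_set(p: float) -> float:
    """P(win best-of-3) from set-win probability p: p^2 + 2p^2(1-p)."""
    return p**2 + 2 * p**2 * (1 - p)


def bo5_from_set(p: float) -> float:
    """P(win best-of-5): p^3 + 3p^3(1-p) + 6p^3(1-p)^2, as in the table."""
    q = 1.0 - p
    return p**3 + 3 * p**3 * q + 6 * p**3 * q**2


def set_prob_from_bo3(match_prob: float, tol: float = 1e-12) -> float:
    """Invert bo3_from_set by bisection (it increases monotonically on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bo3_from_set(mid) < match_prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Usage: a 65% BO3 favourite implies a set-win probability, then a BO5 price.
p_set = set_prob_from_bo3(0.65)
p_bo5 = bo5_from_set(p_set)
```

The longer format amplifies the favourite's edge, so the BO5 price comes out above the BO3 one whenever p > 0.5.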

Phase 3: Distribution Models

Market	Method	Parameters	Notes
Match Winner	Markov chain (O'Malley)	P(A_serve), P(B_serve) from Log5	Exact probability, milliseconds
Game Spread	Markov chain	Game distribution from set-level Markov chains	Directly from point-level model
Total Games	Markov chain	Combined game counts from all possible set/match outcomes	No MC variance
Exact Set Score	Markov chain	Full set-by-set probability matrix; tournament-specific tiebreak rules	Every possible scoreline priced exactly
Tournament Outright	Monte Carlo (100K sims)	Bracket path with fatigue accumulation; SA-Elo updates per round	100K for 128-player GS draws
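A minimal sketch of the set and match layers of the Markov chain. It assumes i.i.d. points, a 7-point tiebreak at 6-6 in every set (no deciding-set or 10-point variants), player A serving first in every set (a simplification), and independent sets. From P(A_serve) and P(B_serve) it yields the exact set-score matrix, a match-win probability and the total-games distribution. All function names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache


def hold_prob(p: float) -> float:
    """P(server holds) for point-win probability p on serve."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**5 * q**3 / (1 - 2 * p * q)


def tiebreak_win(pa: float, pb: float) -> float:
    """P(A wins a 7-point tiebreak, A serving first). pa / pb: P(server wins point)."""
    @lru_cache(maxsize=None)
    def f(a: int, b: int) -> float:
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == b >= 6:
            # From level at 6-6 each pair of points has one serve per player:
            # a player closes it by winning both; a split resets to level.
            x, y = pa * (1 - pb), (1 - pa) * pb
            return x / (x + y)
        a_serving = ((a + b + 1) // 2) % 2 == 0  # serve order: A, B, B, A, A, B, B, ...
        p_pt = pa if a_serving else 1 - pb
        return p_pt * f(a + 1, b) + (1 - p_pt) * f(a, b + 1)

    return f(0, 0)


def set_score_dist(pa: float, pb: float) -> dict:
    """Exact P(final set score), keyed by (games_A, games_B)."""
    ha, hb = hold_prob(pa), hold_prob(pb)
    mass = {(0, 0): 1.0}
    final = defaultdict(float)
    for n in range(12):  # n = games already played; A serves the even-numbered games
        p_game_a = ha if n % 2 == 0 else 1 - hb
        nxt = defaultdict(float)
        for (ga, gb), pr in mass.items():
            for (a, b), w in (((ga + 1, gb), p_game_a), ((ga, gb + 1), 1 - p_game_a)):
                if (a >= 6 and a - b >= 2) or (b >= 6 and b - a >= 2):
                    final[(a, b)] += pr * w  # set over
                else:
                    nxt[(a, b)] += pr * w
        mass = nxt
    tb = tiebreak_win(pa, pb)  # only 6-6 survives 12 games; A serves game 13 / tiebreak
    final[(7, 6)] += mass.get((6, 6), 0.0) * tb
    final[(6, 7)] += mass.get((6, 6), 0.0) * (1 - tb)
    return dict(final)


def expected_games_per_set(pa: float, pb: float) -> float:
    return sum((a + b) * p for (a, b), p in set_score_dist(pa, pb).items())


def match_distribution(pa: float, pb: float, best_of: int = 3):
    """(P(A wins match), {total games: probability}), treating sets as independent."""
    set_dist = set_score_dist(pa, pb)
    need = best_of // 2 + 1
    frontier = {(0, 0, 0): 1.0}  # (sets_A, sets_B, games so far) -> probability
    p_a_match = 0.0
    totals = defaultdict(float)
    while frontier:
        nxt = defaultdict(float)
        for (sa, sb, g), pr in frontier.items():
            for (ga, gb), ps in set_dist.items():
                na, nb = sa + (ga > gb), sb + (gb > ga)
                prob = pr * ps
                if na == need or nb == need:
                    totals[g + ga + gb] += prob
                    if na == need:
                        p_a_match += prob
                else:
                    nxt[(na, nb, g + ga + gb)] += prob
        frontier = nxt
    return p_a_match, dict(totals)


# Usage: P(A_serve) = 0.66, P(B_serve) = 0.62 from the Log5 step.
p_match, total_games = match_distribution(0.66, 0.62)
```

Game Spread and Total Games then read straight off `set_score_dist` and `total_games` with no simulation noise.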

Phase 4: 5 Edge Scanners

Common engine:

Ingest Pinnacle odds (all markets)
De-vig using the Shin method (2-way for match winner, multi-way for set scores)
Build Markov chain probability matrices from Log5 serve/return estimates
Compare to Kalshi contract prices
Min edge: 4 cents after Kalshi 7% fee
Min sample: 10+ matches on current surface for both players
Output: {match_id, market, selection, model_prob, kalshi_price, edge, confidence, surface, tier}
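Here is a sketch of the de-vig and edge steps. Shin's method solves for the insider-trading share z so that the adjusted probabilities sum to 1, here by bisection, and it applies to 2-way and multi-way markets alike. The fee model is an assumption: it takes Kalshi's fee per contract as 0.07 × P × (1 - P), unrounded (check the exchange's current fee schedule, which also rounds fees up to the cent). It treats edge as model probability minus price minus fee for a YES contract. Function names are hypothetical.

```python
import math

MIN_EDGE = 0.04  # council threshold: 4 cents after fees


def shin_devig(decimal_odds: list[float], tol: float = 1e-12) -> list[float]:
    """Fair probabilities from bookmaker decimal odds via Shin's method."""
    implied = [1.0 / o for o in decimal_odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        return [p / booksum for p in implied]  # no overround to remove

    def probs(z: float) -> list[float]:
        return [(math.sqrt(z * z + 4 * (1 - z) * pi * pi / booksum) - z) / (2 * (1 - z))
                for pi in implied]

    # sum(probs(z)) is sqrt(booksum) > 1 at z = 0 and drops below 1 near z = 1.
    lo, hi = 0.0, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)


def kalshi_fee(price: float, rate: float = 0.07) -> float:
    """Assumed per-contract fee: rate x P x (1 - P), unrounded."""
    return rate * price * (1 - price)


def edge_after_fee(model_prob: float, kalshi_price: float) -> float:
    """Expected profit per $1 YES contract bought at kalshi_price, net of the fee."""
    return model_prob - kalshi_price - kalshi_fee(kalshi_price)


# Usage: Pinnacle 1.55 / 2.60 on a match; model says 0.66 for the favourite; Kalshi asks 0.58.
fair = shin_devig([1.55, 2.60])
edge = edge_after_fee(0.66, 0.58)
is_signal = edge >= MIN_EDGE
```

Unlike proportional normalisation, Shin assigns more of the overround to the longshot, so the favourite's fair probability comes out slightly higher.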

Per-market unique logic:

Scanner	Unique Logic
Match Winner	SA-Elo primary, Log5 serve/return, fatigue differential, H2H adjustment, retirement risk discount
Game Spread	Game distribution from Markov chain; BO3 vs BO5 spread ranges differ dramatically
Total Games	Court pace index adjustment; fast surfaces → more holds and tiebreaks → more games; fatigue → more breaks → fewer games
Exact Set Score	Full score matrix from Markov chain; tiebreak rules per tournament (e.g., since 2022 all four Grand Slams play a 10-point tiebreak at 6-6 in the deciding set, replacing Wimbledon's former 12-12 rule)
Tournament Outright	100K bracket MC; draw path dependency; fatigue accumulation; surface transition within season
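The outright scanner's bracket simulation can be reduced to its core: play the draw round by round `n_sims` times and count titles. This sketch assumes a full power-of-two draw listed in bracket order and a caller-supplied `win_prob(a, b)`. Byes, fatigue accumulation and per-round SA-Elo updates are left out. They would enter by making `win_prob` depend on each player's path so far. Names and ratings are hypothetical.

```python
import random
from collections import Counter
from typing import Callable


def simulate_outright(draw: list[str], win_prob: Callable[[str, str], float],
                      n_sims: int = 100_000, seed: int = 7) -> dict[str, float]:
    """Title probability per player from n_sims simulated brackets."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    titles = Counter()
    for _ in range(n_sims):
        alive = list(draw)
        while len(alive) > 1:
            # Adjacent entries meet; the winner of each pair advances.
            alive = [a if rng.random() < win_prob(a, b) else b
                     for a, b in zip(alive[::2], alive[1::2])]
        titles[alive[0]] += 1
    return {player: titles[player] / n_sims for player in draw}


# Usage: a 4-player draw with illustrative ratings and a logistic Elo match model.
ratings = {"A": 2100, "B": 1900, "C": 2000, "D": 1800}


def elo_match_prob(a: str, b: str) -> float:
    return 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))


outright = simulate_outright(["A", "B", "C", "D"], elo_match_prob, n_sims=20_000)
```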

Phase 5: Matchup Card Format

MATCH: [Player A] vs [Player B] | [Tournament] [Round] | [Date]
SURFACE: [Type] | [Indoor/Outdoor] | Court Pace: [index] | FORMAT: [BO3/BO5]
WEATHER: [Temp]°F | Humidity: [%] | Wind: [mph] | Heat Index: [val]

PLAYER A: [Name] ([L/R]) | Rank: [#] | SA-Elo: [val]
  Surface Record (12mo): [W-L] ([win%])
  SPW: [%] | RPW: [%] | Hold Rate: [%] | Break Conv: [%]
  Aces/Match: [avg] | DF/Match: [avg] | 1st Serve: [%]
  vs [L/R]-handed: SPW [%] | RPW [%]
  Fatigue Score: [val] (Court Time 7d: [min], 14d: [min])
  Retirement Risk: [%]
  Clutch Index: [+/- from expected BP conversion]

PLAYER B: [Name] ([L/R]) | Rank: [#] | SA-Elo: [val]
  [Same fields]

MATCHUP DELTAS:
  SA-Elo Differential: [+/- val] → Win Prob: [%]
  Log5 Serve Matchup: A serve vs B return: [P(A_serve)] | B serve vs A return: [P(B_serve)]
  Fatigue Differential: [val]

H2H: [A wins]-[B wins] (on [surface]: [A]-[B])
  Last 3 Meetings: [Date: Score] × 3
  Time-Decayed H2H Adjustment: [+/- %]

STATUS:
  Player A: [Injury/fatigue/motivation flags with signal tier]
  Player B: [Same]

SCHEDULING:
  Days Since Last Match: A [n] | B [n]
  Previous Surface: A [val] | B [val]
  Travel Zones Crossed: A [n] | B [n]

MOTIVATION:
  Player A: Defending [X pts] | Ranking Impact: [val]
  Player B: [Same]

INTELLIGENCE:
  [CRITICAL/MODERATE/CONTEXT findings]

Phase 6: Dashboard

Daily match board: All matches across ATP/WTA/Challenger with edge counts, surface, tier
Match drill-down: Full matchup card + 5 market edges + Markov probability matrices
Surface form tables: Player rankings by surface with SA-Elo, SPW, RPW, hold rates
Injury/status board: All flagged players with signal tier and severity
Tournament tracker: Draw brackets with model win probabilities per player
Fatigue monitor: Players with elevated fatigue scores, recent court time
Edge alerts: Sorted by magnitude, filterable by market type, surface, tier
P&L tracker: By market type, by surface, by tier, Brier scores
Calibration dashboard: Model predictions vs Pinnacle closing lines, rolling accuracy
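One way to put numbers on the calibration dashboard and the Brier column of the P&L tracker is to score the model and the de-vigged Pinnacle closing probabilities on the same settled matches, both overall and over a rolling window. This is a sketch; the window length and function names are hypothetical, and the example probabilities are illustrative.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error of probabilities against 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_report(model: list[float], closing: list[float],
                       outcomes: list[int]) -> dict[str, float]:
    """Model vs de-vigged Pinnacle closing probabilities on the same matches."""
    if len(model) != len(closing):
        raise ValueError("model and closing lists must align")
    b_model = brier_score(model, outcomes)
    b_close = brier_score(closing, outcomes)
    return {
        "brier_model": b_model,
        "brier_closing": b_close,
        # Brier skill score vs the close: > 0 means the model beat the closing line.
        "skill_vs_close": 1 - b_model / b_close if b_close > 0 else float("nan"),
    }


def rolling_brier(probs: list[float], outcomes: list[int], window: int = 200) -> list[float]:
    """Brier score over each trailing window of `window` matches."""
    return [brier_score(probs[i - window:i], outcomes[i - window:i])
            for i in range(window, len(probs) + 1)]


# Usage with illustrative numbers for three settled matches.
report = calibration_report([0.7, 0.4, 0.8], [0.65, 0.45, 0.75], [1, 0, 1])
```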

OPEN QUESTIONS FOR BOSS RULING

Sackmann data pipeline: Build automated daily cron job to pull and parse Sackmann CSVs? This covers ~90% of needed data for free.
Markov chain implementation: Council strongly recommends Markov over Monte Carlo for match pricing. Confirm?
Court pace index: Track per-venue court pace (varies within same surface type). Add to tournament table?
WTA separate model: Different hold rates, higher upset rate (~38% vs ~32%). Separate calibration confirmed?
Challenger/ITF coverage: Include lower tiers? Higher edge potential but match-fixing risk and thin data.
Backtesting harness: Build rolling-window validation against Pinnacle closing lines before going live?
Historical depth: How many seasons of Sackmann data to backfill? 3 seasons? 5 seasons?

COUNCIL METADATA

Detail	Value
Council date	2026-04-01
Advisory responses	5 (all completed)
Peer reviews	5 (all completed)
Strongest advisor	Gemini (3/5 votes)
Runner-up	Opus, Sonnet (1/5 votes each, both self-votes)
Biggest blind spot	No model validation/backtesting across all advisors
Full council data	`/home/ubuntu/edgeclaw/data/councils/2026-04-01/tennis-data-audit/`

Source: ~/edgeclaw/results/panel-results/tennis-data-audit-ruling.md