Tennis Data Audit — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Gemini (3 of 5 peer review votes)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. Service Points Won (SPW) is the atomic metric — not game hold %, but point-level serve/return win probability
  2. Surface-Adjusted Elo (SA-Elo) with separate ratings per surface (hard/clay/grass/indoor hard)
  3. Sackmann GitHub repos as primary free data source — tennis_atp and tennis_wta CSVs with decades of match data
  4. Markov chains for match/set/game pricing — exact probabilities in milliseconds vs Monte Carlo variance
  5. Monte Carlo reserved for tournament outrights — bracket path simulation with fatigue accumulation
  6. Fatigue tracking critical — court time in last 7/14 days, matches played, travel zones crossed
  7. Retirement probability model needed — logistic regression on age, recent workload, MTOs, heat index
  8. H2H with time decay — recent matches weighted heavily, old matches discounted, minimum 3 meetings to apply
  9. Pinnacle as sharp anchor — de-vig for true probabilities
  10. 5 edge scanners — Match Winner, Game Spread, Total Games, Exact Set Score, Tournament Outright
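
A minimal sketch of the time-decay weighting in item 8. It assumes exponential decay, calibrated so that a meeting four years old carries 0.1× the weight of one played a month ago (the ratio given for the H2H Adjustment metric in Phase 2). The exponential form and all function names are illustrative assumptions. The Bayesian update onto the Elo probability is not shown.

```python
import math
from datetime import date

# Assumed calibration: weight(4 years) / weight(1 month) = 0.1.
DECAY_RATE = math.log(10) / (4.0 - 1.0 / 12.0)  # per year
MIN_MEETINGS = 3  # below this, no H2H adjustment is applied


def decay_weight_years(age_years: float) -> float:
    """Exponential weight for a meeting that is `age_years` old."""
    return math.exp(-DECAY_RATE * max(age_years, 0.0))


def decayed_h2h_share(meetings, as_of: date):
    """Weighted share of meetings won by player 1.

    meetings: list of (match_date, p1_won) tuples.
    Returns None when there are fewer than MIN_MEETINGS meetings.
    """
    if len(meetings) < MIN_MEETINGS:
        return None
    weights = [
        decay_weight_years((as_of - played).days / 365.25)
        for played, _ in meetings
    ]
    won = sum(w for w, (_, p1_won) in zip(weights, meetings) if p1_won)
    return won / sum(weights)


# Hypothetical meetings: one recent win, two old losses.
example = [
    (date(2026, 2, 15), True),
    (date(2022, 6, 1), False),
    (date(2021, 8, 20), False),
]
share = decayed_h2h_share(example, as_of=date(2026, 4, 1))
```

With these weights the single recent win outweighs the two old losses, so `share` lands well above the raw 1-of-3 record.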

Where Advisors Disagreed

  1. Monte Carlo vs Markov chain: Some recommended MC for all markets. Gemini correctly identified that tennis is a structured scoring system solvable by Markov chains. Council verdict: Markov chains for match pricing, MC for outrights only.
  2. Elo weighting formula: Different blends of overall/surface/form Elo proposed. Council verdict: 0.5 × OverallElo + 0.3 × SurfaceElo + 0.2 × RecentFormElo.
  3. Database complexity: Some proposed 20+ tables, others kept it lean. Council verdict: Lean schema that maps directly to Sackmann CSV headers for easy ingestion.
  4. Paid data sources: Some recommended Sportradar/TDI enterprise feeds. Council verdict: Start with free Sackmann data, evaluate paid feeds only if edge requires it.
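
The blend from item 2 can be sketched as below. The conversion to a win probability uses the standard Elo logistic with a 400-point scale, which is an assumption here: the ruling does not state which scale the ratings use. Function names are hypothetical.

```python
def blended_elo(overall: float, surface: float, recent_form: float,
                weights=(0.5, 0.3, 0.2)) -> float:
    """Council blend: 0.5 x overall + 0.3 x surface + 0.2 x recent form."""
    w_overall, w_surface, w_form = weights
    return w_overall * overall + w_surface * surface + w_form * recent_form


def elo_win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score for A against B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Hypothetical ratings for two players on clay.
player_a = blended_elo(overall=2000, surface=2100, recent_form=1900)
player_b = blended_elo(overall=1950, surface=1900, recent_form=2000)
p_a_wins = elo_win_prob(player_a, player_b)
```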

Strongest Arguments (from peer review)

Gemini wins with the most operationally actionable response:

Biggest Blind Spot

No model validation/backtesting framework — All advisors proposed formulas and schemas but none addressed: calibration against closing Pinnacle lines, out-of-sample testing, objective functions, small-sample handling for young/surface-limited players, or overfitting risk on H2H and clutch metrics.
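
As one possible starting point for the missing objective functions, the sketch below scores a set of probabilities with the Brier score and log loss. Running both the model's probabilities and the de-vigged closing-line probabilities through the same functions on held-out matches gives a direct comparison. The sample numbers are hypothetical, and this is not a full backtesting framework.

```python
import math


def brier_score(probs, outcomes) -> float:
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps: float = 1e-12) -> float:
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Hypothetical held-out matches: 1 = player A won.
outcomes = [1, 0, 1, 1, 0]
model_probs = [0.62, 0.41, 0.55, 0.70, 0.35]
closing_probs = [0.60, 0.45, 0.58, 0.66, 0.40]

model_ll = log_loss(model_probs, outcomes)
closing_ll = log_loss(closing_probs, outcomes)
```

Lower is better for both metrics. A model that cannot match the closing line's log loss out of sample has not yet shown an edge.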

What Everyone Missed (from peer reviews)

  1. Model validation discipline — No backtesting methodology against closing Pinnacle/Betfair lines. Beautiful math without edge proof.
  2. Small-sample problems — A 22-year-old on grass with 180 career points is not the same as Nadal on clay. Need confidence intervals that widen with sample scarcity.
  3. Overfitting risk on clutch metrics — "Clutch" is mostly variance. Tiebreak performance and deciding-set records should be mean-reverted heavily.
  4. Live intelligence sourcing lag: Practice session reports and pre-match injury intel are the hardest inputs to obtain. No advisor addressed how to source them faster than the market.
  5. Court pace index — Courts within the same surface category vary dramatically (e.g., Indian Wells hard vs US Open hard). Need court pace measurement per venue.
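
Items 2 and 3 could be approached as sketched below: a Wilson score interval that widens as the sample shrinks, and a simple shrinkage estimator that pulls small-sample rates toward a prior mean. Both are standard techniques chosen here for illustration, not methods specified by the council. The prior strength and the sample counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (approx. 95% at z=1.96)."""
    if trials == 0:
        return (0.0, 1.0)
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)
    ) / denom
    return (center - half, center + half)


def shrink_to_mean(successes: int, trials: int,
                   prior_mean: float, prior_strength: float) -> float:
    """Pull an observed rate toward prior_mean; prior_strength acts as pseudo-trials."""
    return (successes + prior_mean * prior_strength) / (trials + prior_strength)


# Hypothetical: the same 65% observed SPW on 180 points vs 18,000 points.
small = wilson_interval(117, 180)
large = wilson_interval(11_700, 18_000)
shrunk = shrink_to_mean(117, 180, prior_mean=0.63, prior_strength=400)
```

The small-sample interval is roughly ten times wider than the large-sample one, and the shrunk estimate sits between the observed rate and the prior mean.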

BUILD PLAN

Phase 1: Core Tennis Data Tables

tennis_players: player_id (Sackmann ID), full_name, preferred_name, nationality, hand (L/R), height_cm, current_rank, peak_rank, tour (ATP/WTA/Challenger), status, elo_overall, elo_hard, elo_clay, elo_grass, elo_indoorhard, injury_flag, last_match_date

tennis_matches: match_id, tournament_id, match_date, round, surface, indoor_outdoor, player1_id, player2_id, winner_id, score, sets_played, games_p1, games_p2, retirement, walkover, duration_minutes, format (BO3/BO5), session (day/night)

tennis_serve_stats: stat_id, match_id, player_id, aces, double_faults, first_serve_pct, first_serve_won_pct (w_1stWon), second_serve_won_pct (w_2ndWon), service_points_won (SPW), return_points_won (RPW), bp_faced, bp_saved, service_hold_pct, break_conversion_pct
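
A sketch of deriving SPW and RPW from one row of a Sackmann match CSV, using its `w_svpt`, `w_1stWon`, `w_2ndWon` columns and their `l_` counterparts. The function names and the inline sample row are hypothetical. Rows with blank stat cells return `None` rather than a rate.

```python
import csv
import io


def _to_int(value):
    """Stat cells can be blank; return None instead of raising."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return None


def serve_return_rates(row):
    """Point-level serve/return rates for the winner (w_) and loser (l_)."""
    fields = ["w_svpt", "w_1stWon", "w_2ndWon", "l_svpt", "l_1stWon", "l_2ndWon"]
    vals = {name: _to_int(row.get(name)) for name in fields}
    if any(v is None for v in vals.values()):
        return None
    if vals["w_svpt"] == 0 or vals["l_svpt"] == 0:
        return None
    w_won = vals["w_1stWon"] + vals["w_2ndWon"]  # service points won by winner
    l_won = vals["l_1stWon"] + vals["l_2ndWon"]  # service points won by loser
    return {
        "w_spw": w_won / vals["w_svpt"],
        "w_rpw": (vals["l_svpt"] - l_won) / vals["l_svpt"],
        "l_spw": l_won / vals["l_svpt"],
        "l_rpw": (vals["w_svpt"] - w_won) / vals["w_svpt"],
    }


# Hypothetical row with only the columns needed here.
SAMPLE = "w_svpt,w_1stWon,w_2ndWon,l_svpt,l_1stWon,l_2ndWon\n80,40,15,90,35,15\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
rates = serve_return_rates(rows[0])
```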

tennis_player_surface_baselines: player_id, surface, window_days (90/180/365), matches_played, win_pct, avg_spw, avg_rpw, avg_hold_pct, avg_break_pct, avg_games_per_set, sa_elo, updated_at

tennis_h2h: player1_id, player2_id, surface, matches_played, p1_wins, p2_wins, p1_avg_spw, p2_avg_spw, last_meeting_date

tennis_tournaments: tournament_id, name, surface, indoor_outdoor, level (GS/Masters/500/250/Challenger), draw_size, format (BO3/BO5), country, city, court_pace_index, start_date, end_date, points_winner

tennis_draws: tournament_id, round, position, player_id, seed, bye (boolean)

tennis_player_status: player_id, date, status_type (injury/fatigue/motivation), signal_source, signal_tier (hard/soft/contextual), severity (1-10), notes

tennis_fatigue: player_id, date, minutes_last_7d, minutes_last_14d, matches_last_7d, matches_last_14d, travel_zones_crossed, fatigue_score, sets_played_last_7d
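
The `fatigue_score` column could be populated from the other columns using the Phase 2 formula (Minutes_7d + 0.5 × Minutes_8to14d + TravelZones × 100). The sketch assumes `minutes_last_14d` is cumulative and includes the most recent 7 days, so the 8-to-14-day load is the difference between the two columns. That reading is an assumption, not something the schema states.

```python
def fatigue_score(minutes_last_7d: float, minutes_last_14d: float,
                  travel_zones_crossed: int) -> float:
    """Phase 2 fatigue formula, assuming minutes_last_14d includes the last 7 days."""
    minutes_8_to_14d = max(minutes_last_14d - minutes_last_7d, 0.0)
    return minutes_last_7d + 0.5 * minutes_8_to_14d + 100.0 * travel_zones_crossed


# Hypothetical player: 300 minutes this week, 500 over two weeks, 2 zones crossed.
score = fatigue_score(300, 500, 2)
```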

tennis_weather: tournament_id, match_date, session, temperature_f, humidity_pct, wind_speed_mph, wind_direction, heat_index

Phase 2: Custom Metrics

| Metric | Formula | Purpose |
| --- | --- | --- |
| Surface-Adjusted Elo (SA-Elo) | 0.5 × OverallElo + 0.3 × SurfaceElo + 0.2 × RecentFormElo; dynamic K-factor (higher for young/returning) | Single player strength number per surface |
| Serve/Return Matchup (Log5) | P(A_serve) = (SPW_A × RPW_B_allowed) / TourAvgSPW, where RPW_B_allowed = 1 - RPW_B (share of service points B concedes as returner) | Point-level serve win probability in specific matchup |
| Hold Probability | p^4 + 4p^4(1-p) + 10p^4(1-p)^2 + 20p^5(1-p)^3/(1-2p(1-p)) where p = P(A_serve) | Service game hold probability from point-win probability |
| Fatigue Score | Minutes_7d + 0.5 × Minutes_8to14d + TravelZones × 100 | Cumulative physical load |
| Retirement Probability | Logistic: features = age, matches_14d, MTOs_last3, heat_index, surface | Pre-match retirement risk |
| BO5 Conversion | P(Bo5) = p^3 + 3p^3(1-p) + 6p^3(1-p)^2 where p = set-win probability | Convert BO3 form to BO5 probability |
| H2H Adjustment | Bayesian update on base Elo probability; time-decay (4yr ago = 0.1× last month); min N=3 | Style matchup correction |
| Clutch Index | (Actual BP Won%) - (Expected BP Won% from RPW); mean-revert heavily | Mental edge indicator (high noise) |
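
The Hold Probability and BO5 Conversion formulas above translate directly into code. The sketch below treats points (and sets) as independent with a fixed win probability, which is the assumption built into both closed forms. Function names are hypothetical.

```python
def hold_probability(p: float) -> float:
    """P(server holds) given point-win probability p on serve.

    Terms: win to love, to 15, to 30, then reach deuce (20 p^3 q^3)
    and win from deuce (p^2 / (1 - 2pq)).
    """
    q = 1.0 - p
    before_deuce = p ** 4 * (1.0 + 4.0 * q + 10.0 * q * q)
    from_deuce = 20.0 * p ** 5 * q ** 3 / (1.0 - 2.0 * p * q)
    return before_deuce + from_deuce


def best_of_five(set_prob: float) -> float:
    """P(win a BO5 match) from a per-set win probability: 3-0, 3-1, 3-2."""
    q = 1.0 - set_prob
    return set_prob ** 3 * (1.0 + 3.0 * q + 6.0 * q * q)


hold_at_65 = hold_probability(0.65)
bo5_at_60 = best_of_five(0.60)
```

A server winning 65% of points holds roughly 83% of service games, which shows how small point-level edges compound at the game level.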

Phase 3: Distribution Models

| Market | Method | Parameters | Notes |
| --- | --- | --- | --- |
| Match Winner | Markov chain (O'Malley) | P(A_serve), P(B_serve) from Log5 | Exact probability, milliseconds |
| Game Spread | Markov chain | Game distribution from set simulations | Directly from point-level model |
| Total Games | Markov chain | Combined game counts from all possible set/match outcomes | No MC variance |
| Exact Set Score | Markov chain | Full set-by-set probability matrix; tournament-specific tiebreak rules | Every possible scoreline priced exactly |
| Tournament Outright | Monte Carlo (100K sims) | Bracket path with fatigue accumulation; SA-Elo updates per round | 100K for 128-player GS draws |
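
A compact sketch of the Markov-chain approach for the Match Winner market, computing exact game, tiebreak, set, and match probabilities by recursion over score states. It rests on simplifying assumptions that are not council decisions: points are independent with fixed serve-win probabilities, every set (including the decider) ends in a first-to-7 tiebreak at 6-6, and sets are independent. Function names are hypothetical.

```python
from functools import lru_cache


def hold(p: float) -> float:
    """P(server holds) from point-win probability p on serve."""
    q = 1.0 - p
    return p ** 4 * (1 + 4 * q + 10 * q * q) + 20 * p ** 5 * q ** 3 / (1 - 2 * p * q)


def tiebreak(pa: float, pb: float) -> float:
    """P(A wins a first-to-7 tiebreak); pa, pb are serve point-win probabilities."""
    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a == 7:
            return 1.0
        if b == 7:
            return 0.0
        if a == 6 and b == 6:
            # From 6-6 points come in pairs, one served by each player.
            up, down = pa * (1 - pb), (1 - pa) * pb
            return up / (up + down) if up + down > 0 else 0.5
        a_serves = ((a + b + 1) // 2) % 2 == 0  # serve order A, B, B, A, A, ...
        p = pa if a_serves else 1 - pb
        return p * win(a + 1, b) + (1 - p) * win(a, b + 1)
    return win(0, 0)


def set_win(pa: float, pb: float) -> float:
    """P(A wins a tiebreak set), with A serving the first game."""
    hold_a, hold_b, tb = hold(pa), hold(pb), tiebreak(pa, pb)

    @lru_cache(maxsize=None)
    def win(ga: int, gb: int) -> float:
        if ga == 6 and gb == 6:
            return tb
        if ga >= 6 and ga - gb >= 2:
            return 1.0
        if gb >= 6 and gb - ga >= 2:
            return 0.0
        a_serves = (ga + gb) % 2 == 0
        p = hold_a if a_serves else 1 - hold_b
        return p * win(ga + 1, gb) + (1 - p) * win(ga, gb + 1)
    return win(0, 0)


def match_win(pa: float, pb: float, best_of: int = 3) -> float:
    """P(A wins the match), treating sets as independent."""
    s = set_win(pa, pb)
    need = best_of // 2 + 1

    @lru_cache(maxsize=None)
    def win(sa: int, sb: int) -> float:
        if sa == need:
            return 1.0
        if sb == need:
            return 0.0
        return s * win(sa + 1, sb) + (1 - s) * win(sa, sb + 1)
    return win(0, 0)


p_bo3 = match_win(0.68, 0.63, best_of=3)
p_bo5 = match_win(0.68, 0.63, best_of=5)
```

The same recursion can be extended to return the full distribution over set scores and game counts, which is what the Game Spread, Total Games, and Exact Set Score rows would need.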

Phase 4: 5 Edge Scanners

Common engine:

  1. Ingest Pinnacle odds (all markets)
  2. De-vig using Shin method (2-way for match winner, multi-way for set scores)
  3. Build Markov chain probability matrices from Log5 serve/return estimates
  4. Compare to Kalshi contract prices
  5. Min edge: 4 cents after Kalshi 7% fee
  6. Min sample: 10+ matches on current surface for both players
  7. Output: {match_id, market, selection, model_prob, kalshi_price, edge, confidence, surface, tier}
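
Steps 2 and 5 of the common engine might look like the sketch below. The de-vig solves Shin's equation for the insider-trading parameter z by bisection, which handles both 2-way and multi-way markets. The edge filter takes the fee in cents as an input, because the ruling does not specify how the 7% fee maps to a per-contract cost. All function names and sample odds are hypothetical.

```python
import math


def shin_devig(decimal_odds, tol: float = 1e-12, max_iter: int = 200):
    """Remove bookmaker margin from decimal odds using Shin's method."""
    implied = [1.0 / o for o in decimal_odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        return [pi / booksum for pi in implied]  # no margin to remove

    def probs(z: float):
        return [
            (math.sqrt(z * z + 4 * (1 - z) * pi * pi / booksum) - z) / (2 * (1 - z))
            for pi in implied
        ]

    lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(max_iter):
        z = (lo + hi) / 2
        if sum(probs(z)) > 1.0:
            lo = z  # probabilities still sum above 1: need a larger z
        else:
            hi = z
        if hi - lo < tol:
            break
    result = probs((lo + hi) / 2)
    total = sum(result)
    return [p / total for p in result]  # tidy residual rounding error


def edge_cents(model_prob: float, kalshi_price_cents: float, fee_cents: float) -> float:
    """Expected edge per contract in cents, net of the supplied fee."""
    return 100.0 * model_prob - kalshi_price_cents - fee_cents


def passes_min_edge(model_prob, kalshi_price_cents, fee_cents, min_edge_cents=4.0):
    return edge_cents(model_prob, kalshi_price_cents, fee_cents) >= min_edge_cents


fair = shin_devig([1.50, 2.60])  # hypothetical Pinnacle match-winner odds
```

Compared with simply dividing each implied probability by the book sum, Shin's method takes more margin off the longshot, so the favorite's fair probability comes out slightly higher.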

Per-market unique logic:

| Scanner | Unique Logic |
| --- | --- |
| Match Winner | SA-Elo primary, Log5 serve/return, fatigue differential, H2H adjustment, retirement risk discount |
| Game Spread | Game distribution from Markov chain; BO3 vs BO5 spread ranges differ dramatically |
| Total Games | Court pace index adjustment; fast surfaces → fewer games; fatigue → more breaks → fewer games |
| Exact Set Score | Full score matrix from Markov chain; final-set tiebreak rules per tournament and season (all four Grand Slams: 10-point tiebreak at 6-6 since 2022; earlier seasons differ, e.g., Wimbledon at 12-12 in 2019-2021) |
| Tournament Outright | 100K bracket MC; draw path dependency; fatigue accumulation; surface transition within season |
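
A bare-bones sketch of the bracket Monte Carlo behind the Tournament Outright scanner. It captures draw path dependency only. Fatigue accumulation, per-round SA-Elo updates, and surface transitions are deliberately left out. Match probabilities use the standard Elo logistic with an assumed 400-point scale, and the players and ratings are hypothetical.

```python
import random
from collections import Counter


def elo_win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score for A against B (assumed scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def simulate_bracket(draw, ratings, rng: random.Random):
    """Play one single-elimination bracket; draw length must be a power of 2."""
    alive = list(draw)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            p_a = elo_win_prob(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p_a else b)
        alive = next_round
    return alive[0]


def outright_probs(draw, ratings, n_sims: int = 100_000, seed: int = 0):
    """Estimated title probability per player from n_sims simulated brackets."""
    rng = random.Random(seed)
    titles = Counter(simulate_bracket(draw, ratings, rng) for _ in range(n_sims))
    return {player: titles[player] / n_sims for player in draw}


# Hypothetical 4-player draw in bracket order: (A vs B), (C vs D).
ratings = {"A": 2150, "B": 1950, "C": 2100, "D": 2000}
probs = outright_probs(["A", "B", "C", "D"], ratings, n_sims=20_000, seed=42)
```

Fixing the seed makes runs reproducible, which helps when comparing scanner output across code changes.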

Phase 5: Matchup Card Format

MATCH: [Player A] vs [Player B] | [Tournament] [Round] | [Date]
SURFACE: [Type] | [Indoor/Outdoor] | Court Pace: [index] | FORMAT: [BO3/BO5]
WEATHER: [Temp]°F | Humidity: [%] | Wind: [mph] | Heat Index: [val]

PLAYER A: [Name] ([L/R]) | Rank: [#] | SA-Elo: [val]
  Surface Record (12mo): [W-L] ([win%])
  SPW: [%] | RPW: [%] | Hold Rate: [%] | Break Conv: [%]
  Aces/Match: [avg] | DF/Match: [avg] | 1st Serve: [%]
  vs [L/R]-handed: SPW [%] | RPW [%]
  Fatigue Score: [val] (Court Time 7d: [min], 14d: [min])
  Retirement Risk: [%]
  Clutch Index: [+/- from expected BP conversion]

PLAYER B: [Name] ([L/R]) | Rank: [#] | SA-Elo: [val]
  [Same fields]

MATCHUP DELTAS:
  SA-Elo Differential: [+/- val] → Win Prob: [%]
  Log5 Serve Matchup: A serve vs B return: [P(A_serve)] | B serve vs A return: [P(B_serve)]
  Fatigue Differential: [val]

H2H: [A wins]-[B wins] (on [surface]: [A]-[B])
  Last 3 Meetings: [Date: Score] × 3
  Time-Decayed H2H Adjustment: [+/- %]

STATUS:
  Player A: [Injury/fatigue/motivation flags with signal tier]
  Player B: [Same]

SCHEDULING:
  Days Since Last Match: A [n] | B [n]
  Previous Surface: A [val] | B [val]
  Travel Zones Crossed: A [n] | B [n]

MOTIVATION:
  Player A: Defending [X pts] | Ranking Impact: [val]
  Player B: [Same]

INTELLIGENCE:
  [CRITICAL/MODERATE/CONTEXT findings]

Phase 6: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. Sackmann data pipeline: Build automated daily cron job to pull and parse Sackmann CSVs? This covers ~90% of needed data for free.
  2. Markov chain implementation: Council strongly recommends Markov over Monte Carlo for match pricing. Confirm?
  3. Court pace index: Track per-venue court pace (varies within same surface type). Add to tournament table?
  4. WTA separate model: Different hold rates, higher upset rate (~38% vs ~32%). Separate calibration confirmed?
  5. Challenger/ITF coverage: Include lower tiers? Higher edge potential but match-fixing risk and thin data.
  6. Backtesting harness: Build rolling-window validation against Pinnacle closing lines before going live?
  7. Historical depth: How many seasons of Sackmann data to backfill? 3 seasons? 5 seasons?

COUNCIL METADATA

| Detail | Value |
| --- | --- |
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Gemini (3/5 votes) |
| Runner-up | Opus, Sonnet (1/5 self-votes each) |
| Biggest blind spot | No model validation/backtesting across all advisors |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/tennis-data-audit/ |
Source: ~/edgeclaw/results/panel-results/tennis-data-audit-ruling.md