Tennis Research Pipeline — Council Ruling
Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Sonnet (4 of 5 peer review votes — near-unanimous)
Status: PENDING BOSS RULING on open questions
COUNCIL SUMMARY
Where Advisors Agreed
- Serve-hold probability is the atomic unit — every tennis market (match winner, spread, totals, set scores) derives from P(A holds serve) and P(B holds serve)
- Surface-specific Elo ratings — hard/clay/grass/indoor hard treated as separate dimensions, not adjustments (200+ Elo point swings between surfaces)
- Best-of-3 vs best-of-5 format changes win probability distributions fundamentally (a player who wins 60% of sets wins about 65% of BO3 matches and 68% of BO5 matches)
- 5 edge scanners required — Match Winner, Game Spread, Total Games, Exact Set Score, Tournament Outright
- Sackmann tennis data as primary free source (Jeff Sackmann's GitHub — match results, serve stats, Elo ratings)
- Injury/withdrawal intelligence is critical — tennis has routine mid-match retirements unlike team sports
- Pinnacle as sharp anchor — de-vig for true probabilities, compare to Kalshi
- Draw path simulation for outrights — bracket-based Monte Carlo, not simple Elo head-to-head
- Fatigue and scheduling context — back-to-back matches, timezone travel, surface transitions
- Motivation quantification — ranking points to defend, Grand Slam seeding cutoffs, year-end finals qualification
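The BO3/BO5 figures in the list above can be reproduced by treating sets as independent trials with a fixed per-set win probability. That is a simplifying assumption: real set-win probabilities drift with serve order, fatigue and momentum. The minimal sketch below enumerates every final set score, so the same function also yields the outcome grid an Exact Set Score scanner needs. The function names are illustrative, not part of the ruling.

```python
from math import comb


def set_score_distribution(p_set: float, best_of: int) -> dict[tuple[int, int], float]:
    """Probability of each final set score (A sets, B sets).

    Assumes sets are independent with a constant per-set win probability
    p_set for player A (simplification, not a fitted model).
    """
    q = 1.0 - p_set
    need = best_of // 2 + 1                  # sets needed: 2 in BO3, 3 in BO5
    dist = {}
    for lost in range(need):                 # sets won by the eventual loser
        # The winner takes the last set; the loser's sets can fall anywhere
        # among the first (need - 1 + lost) sets.
        ways = comb(need - 1 + lost, lost)
        dist[(need, lost)] = ways * p_set**need * q**lost    # A wins
        dist[(lost, need)] = ways * q**need * p_set**lost    # B wins
    return dist


def match_win_prob(p_set: float, best_of: int) -> float:
    """P(A wins the match): sum of all set scores where A reaches the target."""
    need = best_of // 2 + 1
    return sum(pr for (a, _), pr in set_score_distribution(p_set, best_of).items()
               if a == need)


if __name__ == "__main__":
    for fmt in (3, 5):
        print(f"BO{fmt}: {match_win_prob(0.6, fmt):.3f}")
```

For a 60% per-set favorite this gives 0.648 (BO3) and 0.683 (BO5), matching the roughly 65% and 68% quoted above. The longer format amplifies any per-set edge because the weaker player must win more independent sets.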
Where Advisors Disagreed
- Simulation approach: Most advisors recommended Monte Carlo (10K-100K sims). Reviewers noted that tennis is a solvable Markov process, with closed-form solutions for game, set and match probabilities. Council verdict: Markov chain for pre-match pricing (exact, fast); Monte Carlo reserved for tournament outrights (bracket pathing).
- WTA vs ATP modeling: Most advisors focused on ATP. Sonnet explicitly noted WTA has fundamentally different hold rates and variance. Council verdict: Separate model calibration for ATP and WTA — different hold rate distributions, break frequencies, and retirement patterns.
- Data sources: Gemini was generic; gpt-oss recommended an enterprise stack (Kafka, Snowflake, Kubernetes). Council verdict: Sackmann data (free), Pinnacle odds, targeted web scraping for injury signals, stored in SQLite with WAL mode.
- Press conference sentiment: Some advisors proposed NLP on press conferences. Council verdict: Low-signal, high-noise — defer. Focus on quantitative signals first.
Strongest Arguments (from peer review)
Sonnet wins near-unanimously — 4 of 5 reviewers selected Sonnet as strongest:
- Opened with "why tennis is structurally different" — demonstrated genuine conceptual understanding before designing
- Serve-hold probability as atomic unit explicitly stated as design principle (not afterthought)
- Complete SQLite schema with exact column types (players, matches, match_serve_stats, player_surface_stats, h2h_records, tournaments, draws, player_status)
- Surface-specific Elo as separate dimension (hard/clay/grass/indoor hard)
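- Grand Slam final-set rule differences (pre-2022 variants: Wimbledon tiebreak at 12-12, US Open 7-point tiebreak at 6-6, French Open no final-set tiebreak; since 2022 all four Slams play a 10-point tiebreak at 6-6, so the variants matter mainly for historical data)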
- Sackmann data gap awareness for Challenger serve stats (widen confidence intervals)
- Calibration as core design requirement, not afterthought
- 1,000,000 simulation count (highest specified)
- Phased build plan with realistic scope
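Treating each surface as its own rating dimension means a match updates only the rating for the surface it was played on. The sketch below is a minimal illustration, not the council's specified algorithm. It assumes a standard logistic Elo on a 400-point scale, a 1500 starting rating, and a constant K-factor of 32. All three are hypothetical choices; production tennis Elo systems typically shrink K as a player's match count grows. On this scale a 200-point surface gap corresponds to roughly a 76% expected score.

```python
from dataclasses import dataclass, field

SURFACES = ("hard", "clay", "grass", "indoorhard")   # mirrors elo_* columns


def expected_score(r_a: float, r_b: float) -> float:
    """Standard logistic Elo expectation for A against B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


@dataclass
class PlayerElo:
    # Hypothetical container; 1500 start is an assumption, not from the ruling.
    ratings: dict[str, float] = field(
        default_factory=lambda: {s: 1500.0 for s in SURFACES})


def update_surface_elo(winner: PlayerElo, loser: PlayerElo,
                       surface: str, k: float = 32.0) -> None:
    """Update only the ratings for the surface the match was played on."""
    if surface not in SURFACES:
        raise ValueError(f"unknown surface: {surface}")
    e_w = expected_score(winner.ratings[surface], loser.ratings[surface])
    delta = k * (1.0 - e_w)          # zero-sum transfer between the two players
    winner.ratings[surface] += delta
    loser.ratings[surface] -= delta


if __name__ == "__main__":
    a, b = PlayerElo(), PlayerElo()
    update_surface_elo(a, b, "clay")
    print(a.ratings["clay"], a.ratings["hard"])   # clay moves, hard does not
```

A clay result leaves both players' hard, grass and indoor-hard ratings untouched, which is what lets the same player carry 200+ point gaps between surfaces.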
Biggest Blind Spot
Gemini: No data pipeline, no database schema, no data sources, no ingestion schedule. Pure strategy with zero implementation. Also naive fatigue modeling (linear weighted sum) and overconfidence in low-signal data (press conference NLP).
What Everyone Missed (from peer reviews)
- Markov chains over Monte Carlo for pre-match: Tennis has discrete states that reset at the start of every game, set and tiebreak (love-all, 15-all, deuce). Closed-form Markov chain solutions give exact probabilities in milliseconds; Monte Carlo adds unnecessary sampling variance to pre-match pricing.
- Bookmaker retirement rules differ: "1-Ball" (bets stand once the first ball is served), "1-Set" (bets stand once one set is completed), "Full Match" (bets are void unless the match is completed). Injury intelligence is useless unless it is routed to the correct bookmaker rule.
- Match-fixing/integrity flags for lower tiers — Challenger and ITF levels have rampant fixing. Need automatic "Integrity Kill-Switch" when Pinnacle moves 40+ cents against model prediction on lower-tier matches.
- Kalshi line availability lag — Kalshi may not list tennis matches until hours before. Pipeline must handle late market appearance.
- In-play Bayesian updating — After each game, update serve-hold estimates from observed first-serve % and break point conversion. No advisor designed a live data integration layer.
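A minimal sketch of in-play updating, under a swapped-in simplification: instead of modeling first-serve % and break point conversion separately, it tracks a single Beta belief over each server's probability of winning a point on serve. It applies the conjugate Beta-binomial update after every completed service game. The prior strength (how many points of evidence the pre-match estimate is worth) is a hypothetical tuning knob. The posterior mean would then feed the game-level Markov formula to re-price hold probabilities.

```python
import math
from dataclasses import dataclass


@dataclass
class ServeBelief:
    """Beta(alpha, beta) belief over P(server wins a point on serve)."""
    alpha: float
    beta: float

    @classmethod
    def from_prior(cls, p_prior: float, strength: float) -> "ServeBelief":
        # strength: pseudo-points the pre-match estimate is worth (hypothetical).
        return cls(alpha=p_prior * strength, beta=(1 - p_prior) * strength)

    def observe_service_game(self, points_won: int, points_lost: int) -> None:
        # Conjugate Beta-binomial update: add observed serve points to the counts.
        self.alpha += points_won
        self.beta += points_lost

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        n = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (n * n * (n + 1)))


if __name__ == "__main__":
    belief = ServeBelief.from_prior(0.65, strength=200)
    belief.observe_service_game(4, 0)   # love hold
    belief.observe_service_game(2, 4)   # broken
    print(f"{belief.mean:.4f} +/- {belief.sd:.4f}")
```

A strong prior keeps one bad service game from swinging the estimate much. Lowering the strength makes the model react faster to, for example, a mid-match injury that shows up in serve points.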
BUILD PLAN
Phase 1: Core Tennis Data Tables
tennis_players: player_id, full_name, preferred_name, nationality, hand (L/R), height_cm, current_rank, peak_rank, tour (ATP/WTA/Challenger), status, coach, elo_overall, elo_hard, elo_clay, elo_grass, elo_indoorhard, injury_flag, injury_notes
tennis_matches: match_id, tournament_id, match_date, round, surface, indoor_outdoor, player1_id, player2_id, winner_id, score, sets_played, games_p1, games_p2, retirement, walkover, duration_minutes, session (day/night), format (BO3/BO5), source
tennis_serve_stats: stat_id, match_id, player_id, aces, double_faults, first_serve_pct, first_serve_won, second_serve_won, bp_faced, bp_saved, service_hold_pct, return_bp_won_pct, total_points_won
tennis_player_surface_stats: player_id, surface, window_days (90/180/365/career), matches_played, win_pct, avg_hold_pct, avg_break_pct, avg_games_per_set, elo_on_surface
tennis_h2h_records: player1_id, player2_id, surface, matches_played, p1_wins, p2_wins, p1_avg_hold, p2_avg_hold
tennis_tournaments: tournament_id, name, surface, indoor_outdoor, level (Grand Slam/Masters/500/250/Challenger), draw_size, sets_format (BO3/BO5), country, city, start_date, end_date, points_winner
tennis_player_status: player_id, date, status_type (injury/fatigue/motivation/form), signal_source, signal_tier (hard/soft/contextual), severity (1-10), notes, confidence
tennis_draws: tournament_id, round, position, player_id, seed, projected_opponent_ids (JSON)
tennis_referee_data: umpire_id, name, avg_time_violations, overrule_rate, code_violations_per_match
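Because the council settled on SQLite in WAL mode, a few of the Phase 1 tables can be sketched as DDL. Column names follow the lists above. The SQL types, CHECK constraints and the composite primary key are assumptions, and `window_days` is stored as TEXT so that 'career' fits alongside 90/180/365. `open_db` is a hypothetical helper. WAL needs a file-backed database (in-memory databases do not support it), and it lets dashboard readers query while ingestion jobs write.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS tennis_players (
    player_id      INTEGER PRIMARY KEY,
    full_name      TEXT NOT NULL,
    preferred_name TEXT,
    nationality    TEXT,
    hand           TEXT CHECK (hand IN ('L', 'R')),
    height_cm      INTEGER,
    current_rank   INTEGER,
    peak_rank      INTEGER,
    tour           TEXT CHECK (tour IN ('ATP', 'WTA', 'Challenger')),
    status         TEXT,
    coach          TEXT,
    elo_overall    REAL,
    elo_hard       REAL,
    elo_clay       REAL,
    elo_grass      REAL,
    elo_indoorhard REAL,
    injury_flag    INTEGER NOT NULL DEFAULT 0,   -- 0/1 boolean
    injury_notes   TEXT
);

CREATE TABLE IF NOT EXISTS tennis_serve_stats (
    stat_id           INTEGER PRIMARY KEY,
    match_id          INTEGER NOT NULL,          -- tennis_matches.match_id
    player_id         INTEGER NOT NULL REFERENCES tennis_players(player_id),
    aces              INTEGER,
    double_faults     INTEGER,
    first_serve_pct   REAL,
    first_serve_won   REAL,
    second_serve_won  REAL,
    bp_faced          INTEGER,
    bp_saved          INTEGER,
    service_hold_pct  REAL,
    return_bp_won_pct REAL,
    total_points_won  INTEGER
);

CREATE TABLE IF NOT EXISTS tennis_player_surface_stats (
    player_id         INTEGER NOT NULL REFERENCES tennis_players(player_id),
    surface           TEXT NOT NULL,
    window_days       TEXT NOT NULL,             -- '90', '180', '365', 'career'
    matches_played    INTEGER,
    win_pct           REAL,
    avg_hold_pct      REAL,
    avg_break_pct     REAL,
    avg_games_per_set REAL,
    elo_on_surface    REAL,
    PRIMARY KEY (player_id, surface, window_days)
);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file in WAL mode and apply the schema."""
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL (got {mode!r})")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript(SCHEMA)
    return conn
```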
Phase 2: Distribution Models
Core engine: Point-win probability → Game-win probability → Set-win probability → Match-win probability
| Level | Method | Parameters |
|---|---|---|
| Point | Logistic regression | Surface Elo, serve stats, fatigue, weather, altitude |
| Game | Markov chain (exact) | P(server wins point on serve); nonlinear: 65% point-win ≈ 83% game-win |
| Set | Markov chain (exact) | P(hold) per player, tiebreak rules per tournament |
| Match | Markov chain (exact for BO3/BO5) | Set-win probabilities, fifth-set rule variants |
| Outright | Monte Carlo (1M bracket sims) | Draw path, surface Elo, fatigue accumulation through rounds |
All 5 markets are derived from the same underlying serve-hold probabilities.
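A minimal sketch of the game → set → match chain from the table, assuming points are i.i.d. on each player's serve (the standard simplification behind the closed-form game formula). Inputs are each player's probability of winning a point on serve. `hold_prob` reproduces the table's nonlinearity: 0.65 on points gives about 0.83 on games. Further simplifications: every set, including the deciding set, uses a 7-point tiebreak at 6-6 with A serving first, and the match level assumes a constant set-win probability. Tournament-specific final-set rules and serve order across sets are not modeled. Function names are illustrative.

```python
from functools import lru_cache


def hold_prob(p: float) -> float:
    """P(server holds) given P(server wins a point on serve) = p, 0 < p < 1."""
    q = 1.0 - p
    to_deuce = 20 * p**3 * q**3              # all paths that reach 3-3 (deuce)
    from_deuce = p**2 / (1 - 2 * p * q)      # win two straight before losing two
    return p**4 * (1 + 4 * q + 10 * q**2) + to_deuce * from_deuce


def tiebreak_win_prob(p_a: float, p_b: float) -> float:
    """P(A wins a 7-point tiebreak). A serves point 1, then serve alternates
    every two points. p_a / p_b = each player's point-win prob on own serve."""
    a_pair = p_a * (1 - p_b)    # from 6-6, A wins both points of a serve pair
    b_pair = (1 - p_a) * p_b    # B wins both
    from_6_6 = a_pair / (a_pair + b_pair)

    @lru_cache(maxsize=None)
    def win(x: int, y: int) -> float:
        if x == 7:
            return 1.0
        if y == 7:
            return 0.0
        if x == y == 6:
            return from_6_6
        a_serving = ((x + y + 1) // 2) % 2 == 0
        p_point = p_a if a_serving else 1 - p_b
        return p_point * win(x + 1, y) + (1 - p_point) * win(x, y + 1)

    return win(0, 0)


def set_win_prob(p_a: float, p_b: float) -> float:
    """P(A wins a set with a 7-point tiebreak at 6-6); A serves game 1."""
    h_a, h_b = hold_prob(p_a), hold_prob(p_b)
    tb = tiebreak_win_prob(p_a, p_b)

    @lru_cache(maxsize=None)
    def win(x: int, y: int) -> float:
        if x == 7 or (x == 6 and y <= 4):
            return 1.0
        if y == 7 or (y == 6 and x <= 4):
            return 0.0
        if x == y == 6:
            return tb
        p_game = h_a if (x + y) % 2 == 0 else 1 - h_b   # A serves even games
        return p_game * win(x + 1, y) + (1 - p_game) * win(x, y + 1)

    return win(0, 0)


def match_win_prob(p_set: float, best_of: int = 3) -> float:
    """P(A wins the match) from a constant per-set win probability."""
    need = best_of // 2 + 1

    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a == need:
            return 1.0
        if b == need:
            return 0.0
        return p_set * win(a + 1, b) + (1 - p_set) * win(a, b + 1)

    return win(0, 0)


if __name__ == "__main__":
    p_set = set_win_prob(0.66, 0.62)
    print(f"hold A={hold_prob(0.66):.3f}  set={p_set:.3f}  "
          f"BO3={match_win_prob(p_set, 3):.3f}  BO5={match_win_prob(p_set, 5):.3f}")
```

Every level is an exact recursion over a handful of states, so pricing a match takes microseconds. That is the practical argument for reserving Monte Carlo for bracket pathing.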
Phase 3: 5 Edge Scanners
Common engine:
- Ingest Pinnacle odds (all markets)
- De-vig using Shin method (2-way for match winner, multi-way for set scores)
- Build Markov chain probability matrices from serve-hold estimates
- Compare to Kalshi contract prices
- Min edge: 4 cents after Kalshi 7% fee
- Min sample: 10+ matches on current surface for both players
- Integrity check: flag if Pinnacle line moves 40+ cents against model on Challenger/ITF
- Output: {match_id, market, selection, model_prob, kalshi_price, edge, confidence, surface, tier}
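A minimal sketch of the common engine's filters, under several labeled assumptions. `shin_devig` solves for Shin's insider-share parameter z by bisection, producing the Pinnacle anchor. How that anchor is blended with the Markov `model_prob` is not specified in the ruling, so `scan` takes `model_prob` as given. The Kalshi fee is modeled on the published taker-fee shape, 0.07 × P × (1 − P) per contract rounded up to the cent. Treat that as an assumption and verify it against the current schedule. "40 cents" is read as American-odds cents on the selection, with lengthening counted as movement against the model. `Candidate`, `scan` and the constants are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Optional

MIN_EDGE = 0.04            # 4 cents per $1 contract, after fees
MIN_SURFACE_MATCHES = 10   # per player, on the current surface
INTEGRITY_CENTS = 40       # Pinnacle drift against the model (Challenger/ITF)
LOW_TIERS = {"Challenger", "ITF"}


def shin_devig(decimal_odds: list[float]) -> list[float]:
    """Shin de-vig: find insider share z so implied probabilities sum to 1."""
    pi = [1.0 / o for o in decimal_odds]
    booksum = sum(pi)
    if booksum <= 1.0:                        # no overround to remove
        return [x / booksum for x in pi]

    def probs(z: float) -> list[float]:
        return [(math.sqrt(z * z + 4 * (1 - z) * x * x / booksum) - z) / (2 * (1 - z))
                for x in pi]

    lo, hi = 0.0, 0.999                       # sum(probs(z)) falls as z rises
    for _ in range(100):                      # bisection on z
        mid = (lo + hi) / 2
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)


def kalshi_fee_per_contract(price: float) -> float:
    # ASSUMPTION: 0.07 * P * (1 - P) per contract, rounded up to the next cent.
    return math.ceil(round(0.07 * price * (1 - price) * 100, 9)) / 100


def american_to_cents(odds: int) -> int:
    # -150 -> -50, +130 -> +30, so -110 to +110 reads as a 20-cent move.
    return odds + 100 if odds < 0 else odds - 100


@dataclass
class Candidate:
    match_id: str
    market: str
    selection: str
    model_prob: float                 # from the Markov chain
    kalshi_price: float               # contract price in dollars, 0-1
    confidence: float
    surface: str
    tier: str
    surface_matches: tuple[int, int]  # matches on this surface, each player
    pinnacle_open: int                # American odds on the selection at open
    pinnacle_now: int                 # American odds on the selection now


def scan(c: Candidate) -> Optional[dict]:
    if min(c.surface_matches) < MIN_SURFACE_MATCHES:
        return None
    # Positive drift = the selection's odds lengthened, i.e. against the model.
    drift = american_to_cents(c.pinnacle_now) - american_to_cents(c.pinnacle_open)
    if c.tier in LOW_TIERS and drift >= INTEGRITY_CENTS:
        return None   # auto-block; "flag for manual review" is an open question
    edge = c.model_prob - c.kalshi_price - kalshi_fee_per_contract(c.kalshi_price)
    if edge < MIN_EDGE:
        return None
    return {"match_id": c.match_id, "market": c.market, "selection": c.selection,
            "model_prob": c.model_prob, "kalshi_price": c.kalshi_price,
            "edge": round(edge, 4), "confidence": c.confidence,
            "surface": c.surface, "tier": c.tier}
```

With a model probability of 0.62 against a 55-cent contract, the assumed fee is 2 cents and the net edge is 5 cents, which clears the 4-cent floor.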
Per-market unique logic:
| Scanner | Unique Logic |
|---|---|
| Match Winner | Markov chain from serve-hold; surface Elo primary driver; retirement risk discount |
| Game Spread | Game distribution from match simulation; weather/altitude adjust hold rates |
| Total Games | Combined games distribution; shorter matches on fast surfaces; BO3 vs BO5 total games range |
| Exact Set Score | Full set-by-set probability matrix; tiebreak rules per tournament; BO5 has 6 possible set scores (3-0, 3-1, 3-2 for either player) vs 4 in BO3 |
| Tournament Outright | 1M bracket Monte Carlo; draw path dependency; fatigue accumulation; surface transitions within tournament |
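The Tournament Outright scanner's bracket Monte Carlo can be sketched as repeated replays of the draw. This minimal version assumes a power-of-two draw in bracket order and a caller-supplied `win_prob(a, b)`. The Elo-based function in the usage block is hypothetical, and fatigue accumulation and surface transitions are not modeled. Byes could be represented by a placeholder entry that always loses. The default is 100,000 sims rather than the 1M in the ruling, because a pure-Python loop is slow; a production version would likely vectorize.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def simulate_outright(draw: Sequence[str],
                      win_prob: Callable[[str, str], float],
                      n_sims: int = 100_000,
                      seed: int = 7) -> dict[str, float]:
    """Bracket Monte Carlo: replay the draw n_sims times and count titles.

    draw is in bracket order (slot 0 plays slot 1, slot 2 plays slot 3, ...);
    win_prob(a, b) is P(a beats b) on this surface and format.
    """
    n = len(draw)
    if n < 2 or n & (n - 1):
        raise ValueError("draw size must be a power of two")
    rng = random.Random(seed)                 # seeded for reproducible runs
    titles: Counter = Counter()
    for _ in range(n_sims):
        alive = list(draw)
        while len(alive) > 1:                 # one pass = one round
            alive = [a if rng.random() < win_prob(a, b) else b
                     for a, b in zip(alive[::2], alive[1::2])]
        titles[alive[0]] += 1
    return {p: titles[p] / n_sims for p in draw}


if __name__ == "__main__":
    elo = {"A": 2100, "B": 1900, "C": 1950, "D": 1800}   # hypothetical ratings

    def elo_win(a: str, b: str) -> float:
        return 1 / (1 + 10 ** ((elo[b] - elo[a]) / 400))

    print(simulate_outright(["A", "B", "C", "D"], elo_win, n_sims=20_000))
```

Draw path dependency falls out naturally: a player's title probability depends on who is likely to survive in their section, not only on their own rating.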
Phase 4: Matchup Card Format
MATCH: [Player A] vs [Player B] | [Tournament] [Round] | [Date]
SURFACE: [hard/clay/grass] | [indoor/outdoor] | FORMAT: [BO3/BO5]
COURT: [Name] | SESSION: [Day/Night] | WEATHER: [Temp/Wind/Humidity]
PLAYER A: [Name] ([L/R]) | Rank: [#] | Elo Overall: [val] | Elo [Surface]: [val]
Surface Record (12mo): [W-L] ([win%]) | Hold Rate: [%] | Break Rate: [%]
Form (Last 5): [W/L sequence] | Last Match: [Date, Opponent, Score]
Serve Stats: 1st Serve [%] | 1st Serve Won [%] | 2nd Serve Won [%]
Aces/G: [avg] | DF/G: [avg] | BP Save Rate: [%]
vs [L/R]-handed: Win Rate [%] | Hold Rate [%]
PLAYER B: [Name] ([L/R]) | Rank: [#] | Elo Overall: [val] | Elo [Surface]: [val]
[Same fields as Player A]
H2H (All surfaces): [A wins]-[B wins]
H2H ([Surface]): [A wins]-[B wins] | A Hold: [%] | B Hold: [%]
STATUS:
Player A: [Injury/fatigue/motivation flags with severity and source tier]
Player B: [Same]
SCHEDULING:
Player A: Days Since Last Match: [n] | Previous Surface: [val] | Timezone Travel: [Y/N]
Player B: [Same]
MOTIVATION:
Player A: Defending [X pts] | Ranking Impact: [rise/fall to #Y]
Player B: [Same]
DRAW CONTEXT:
Next Round Likely Opponent: [Name (Rank/Seed)]
Quarter Contains: [Top seeds in section]
INTELLIGENCE:
[CRITICAL/MODERATE/CONTEXT findings with signal tier]
Phase 5: Dashboard
- Daily match board: All matches across ATP/WTA/Challenger with edge counts, surface, tier
- Match drill-down: Full matchup card + 5 market edges + serve stat comparison + H2H
- Surface form tables: Player rankings by surface with Elo, hold rates, win rates
- Injury/status board: All flagged players with signal tier and severity
- Tournament tracker: Draw brackets with model win probabilities per player
- Edge alerts: Sorted by magnitude, filterable by market type, surface, tier
- Integrity monitor: Challenger/ITF matches with suspicious line movements
- P&L tracker: By market type, by surface, by tier, Brier scores
- Calibration dashboard: Rolling accuracy by market, surface, and confidence band
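The Brier scores in the P&L tracker and the confidence bands in the calibration dashboard reduce to two small computations. The sketch below assumes binary outcomes (1 = selection won) and equal-width probability bands. A rolling window or a per-market/surface breakdown would simply filter the inputs before calling these hypothetical helpers.

```python
from collections import defaultdict


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def calibration_table(probs: list[float], outcomes: list[int],
                      n_bands: int = 10) -> list[dict]:
    """Bucket forecasts into equal-width bands; compare mean forecast to hit rate."""
    bands = defaultdict(list)
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bands), n_bands - 1)   # p == 1.0 goes in the top band
        bands[idx].append((p, o))
    rows = []
    for idx in sorted(bands):
        pairs = bands[idx]
        rows.append({
            "band": (idx / n_bands, (idx + 1) / n_bands),
            "n": len(pairs),
            "mean_forecast": sum(p for p, _ in pairs) / len(pairs),
            "hit_rate": sum(o for _, o in pairs) / len(pairs),
        })
    return rows
```

A well-calibrated model shows `mean_forecast` close to `hit_rate` in every band with enough samples. A persistent gap in one band is a more actionable signal than the overall Brier score alone.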
OPEN QUESTIONS FOR BOSS RULING
- Tour scope: ATP + WTA + Grand Slams + Challengers? Or ATP + WTA main tour only to start?
- Markov chain vs Monte Carlo: Council recommends Markov chain for pre-match (exact, fast) and Monte Carlo only for outrights. Confirm?
- Sackmann data scraping: Free GitHub data but needs regular pulling and parsing. Build automated pipeline?
- WTA separate calibration: Different hold rates, variance, retirement patterns. Build separate WTA model parameters?
- Tournament outright scanner: High variance, complex bracket simulation. Build now or defer?
- Integrity kill-switch: Auto-block Challenger/ITF bets when Pinnacle moves 40+ cents against model? Or flag for manual review?
- Indoor hard court Elo: Separate from outdoor hard? Adds complexity but captures real differences (faster courts, no weather).
COUNCIL METADATA
| Detail | Value |
|---|---|
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Sonnet (4/5 votes, near-unanimous) |
| Runner-up | Opus (1/5 vote from Grok) |
| Biggest blind spot | Gemini (no implementation, generic) |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/tennis-research/ |
Source: ~/edgeclaw/results/panel-results/tennis-research-ruling.md