Tennis Research Pipeline — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Sonnet (4 of 5 peer review votes, near-unanimous)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. Serve-hold probability is the atomic unit — every tennis market (match winner, spread, totals, set scores) derives from P(A holds serve) and P(B holds serve)
  2. Surface-specific Elo ratings — hard/clay/grass/indoor hard treated as separate dimensions, not adjustments (200+ Elo point swings between surfaces)
  3. Best-of-3 vs best-of-5 format changes win probability distributions fundamentally (60% per-set favorite → 65% BO3, 68% BO5)
  4. 5 edge scanners required — Match Winner, Game Spread, Total Games, Exact Set Score, Tournament Outright
  5. Sackmann tennis data as primary free source (Jeff Sackmann's GitHub — match results, serve stats, Elo ratings)
  6. Injury/withdrawal intelligence is critical — tennis has routine mid-match retirements unlike team sports
  7. Pinnacle as sharp anchor — de-vig for true probabilities, compare to Kalshi
  8. Draw path simulation for outrights — bracket-based Monte Carlo, not simple Elo head-to-head
  9. Fatigue and scheduling context — back-to-back matches, timezone travel, surface transitions
  10. Motivation quantification — ranking points to defend, Grand Slam seeding cutoffs, year-end finals qualification
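The format effect in item 3 can be checked with a short calculation. This is a minimal sketch, assuming sets are independent and the favorite wins each set with a constant probability; the function names are illustrative and not part of the council's plan.

```python
from math import comb

def set_score_probs(p_set: float, best_of: int) -> dict[tuple[int, int], float]:
    """Probability of each exact set score (sets_a, sets_b).

    Assumes independent sets and a constant per-set win probability
    p_set for player A (a simplification for illustration).
    """
    need = best_of // 2 + 1  # sets needed to win: 2 for BO3, 3 for BO5
    probs = {}
    for lost in range(need):
        # The match winner takes the last set; the `lost` sets they drop
        # can fall anywhere among the preceding need - 1 + lost sets.
        ways = comb(need - 1 + lost, lost)
        probs[(need, lost)] = ways * p_set**need * (1 - p_set) ** lost
        probs[(lost, need)] = ways * (1 - p_set) ** need * p_set**lost
    return probs

def match_win_from_set_prob(p_set: float, best_of: int) -> float:
    """P(A wins the match) = sum over set scores where A reaches `need` sets."""
    need = best_of // 2 + 1
    scores = set_score_probs(p_set, best_of)
    return sum(p for (a, _b), p in scores.items() if a == need)

bo3 = match_win_from_set_prob(0.60, 3)  # about 0.648
bo5 = match_win_from_set_prob(0.60, 5)  # about 0.683
```

The same enumeration yields the exact set score distribution, so a match winner price and a set score price come from one calculation.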

Where Advisors Disagreed

  1. Simulation approach: Most advisors recommended Monte Carlo (10K-100K sims). Reviewers noted that tennis scoring is a solvable Markov process: closed-form solutions exist for match/set/game probabilities. Council verdict: Markov chain for pre-match pricing (exact, fast), Monte Carlo reserved for tournament outrights (bracket pathing).
  2. WTA vs ATP modeling: Most advisors focused on ATP. Sonnet explicitly noted WTA has fundamentally different hold rates and variance. Council verdict: Separate model calibration for ATP and WTA — different hold rate distributions, break frequencies, and retirement patterns.
  3. Data sources: Gemini was generic; gpt-oss recommended an enterprise stack (Kafka, Snowflake, Kubernetes). Council verdict: Sackmann data (free), Pinnacle odds, targeted web scraping for injury signals. Storage: SQLite in WAL mode.
  4. Press conference sentiment: Some advisors proposed NLP on press conferences. Council verdict: Low-signal, high-noise — defer. Focus on quantitative signals first.

Strongest Arguments (from peer review)

Sonnet wins near-unanimously: 4 of 5 reviewers selected Sonnet as strongest.

Biggest Blind Spot

Gemini: No data pipeline, no database schema, no data sources, no ingestion schedule. Pure strategy with zero implementation. Also naive fatigue modeling (linear weighted sum) and overconfidence in low-signal data (press conference NLP).

What Everyone Missed (from peer reviews)

  1. Markov chains over Monte Carlo for pre-match — Tennis has discrete, reset states (love-all, 15-all, tiebreaker). Closed-form Markov chain matrices give exact probabilities in milliseconds. Monte Carlo adds unnecessary variance for pre-match pricing.
  2. Bookmaker retirement rules differ — "1-Ball" (match stands after first serve), "1-Set" (stands after one set), "Full Match" (voided). Injury intelligence is useless without routing to correct bookmaker rules.
  3. Match-fixing/integrity flags for lower tiers — Challenger and ITF levels have rampant fixing. Need automatic "Integrity Kill-Switch" when Pinnacle moves 40+ cents against model prediction on lower-tier matches.
  4. Kalshi line availability lag — Kalshi may not list tennis matches until hours before. Pipeline must handle late market appearance.
  5. In-play Bayesian updating — After each game, update serve-hold estimates from observed first-serve % and break point conversion. No advisor designed a live data integration layer.
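For item 5, one simple way to update a serve estimate during a match is a Beta-Binomial update on service points won. This is a swapped-in simplification for illustration, not a design from any advisor: it tracks service points won rather than first-serve % and break point conversion, and the class name, prior strength, and example numbers are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServePointEstimate:
    """Beta(alpha, beta) belief about P(server wins a point on serve)."""
    alpha: float
    beta: float

    @classmethod
    def from_prior(cls, prior_mean: float, prior_strength: float) -> "ServePointEstimate":
        # prior_strength acts as a pseudo-count: how many service points
        # the pre-match estimate is treated as being worth (an assumption).
        return cls(prior_mean * prior_strength, (1 - prior_mean) * prior_strength)

    def update(self, points_won: int, points_played: int) -> "ServePointEstimate":
        # Conjugate update: add observed wins and losses to the counts.
        return ServePointEstimate(
            self.alpha + points_won,
            self.beta + (points_played - points_won),
        )

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# Pre-match estimate of 65% on serve, weighted like 100 observed points.
prior = ServePointEstimate.from_prior(0.65, 100)
# The player has won only 20 of 40 service points so far in the match.
live = prior.update(points_won=20, points_played=40)
# live.mean is 85 / 140, roughly 0.607
```

The updated mean can be fed back into the same point-to-match chain used for pre-match pricing.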

BUILD PLAN

Phase 1: Core Tennis Data Tables

tennis_players: player_id, full_name, preferred_name, nationality, hand (L/R), height_cm, current_rank, peak_rank, tour (ATP/WTA/Challenger), status, coach, elo_overall, elo_hard, elo_clay, elo_grass, elo_indoorhard, injury_flag, injury_notes

tennis_matches: match_id, tournament_id, match_date, round, surface, indoor_outdoor, player1_id, player2_id, winner_id, score, sets_played, games_p1, games_p2, retirement, walkover, duration_minutes, session (day/night), format (BO3/BO5), source

tennis_serve_stats: stat_id, match_id, player_id, aces, double_faults, first_serve_pct, first_serve_won, second_serve_won, bp_faced, bp_saved, service_hold_pct, return_bp_won_pct, total_points_won

tennis_player_surface_stats: player_id, surface, window_days (90/180/365/career), matches_played, win_pct, avg_hold_pct, avg_break_pct, avg_games_per_set, elo_on_surface

tennis_h2h_records: player1_id, player2_id, surface, matches_played, p1_wins, p2_wins, p1_avg_hold, p2_avg_hold

tennis_tournaments: tournament_id, name, surface, indoor_outdoor, level (Grand Slam/Masters/500/250/Challenger), draw_size, sets_format (BO3/BO5), country, city, start_date, end_date, points_winner

tennis_player_status: player_id, date, status_type (injury/fatigue/motivation/form), signal_source, signal_tier (hard/soft/contextual), severity (1-10), notes, confidence

tennis_draws: tournament_id, round, position, player_id, seed, projected_opponent_ids (JSON)

tennis_referee_data: umpire_id, name, avg_time_violations, overrule_rate, code_violations_per_match
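A minimal sketch of how two of these tables might be created in SQLite with WAL mode enabled. Column names follow the lists above; the column types, constraints, and the init_db helper are assumptions for illustration, and the remaining tables would follow the same pattern.

```python
import sqlite3

# Column names come from the Phase 1 lists; types and CHECK constraints
# are assumptions, not part of the council ruling.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tennis_players (
    player_id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    preferred_name TEXT,
    nationality TEXT,
    hand TEXT CHECK (hand IN ('L', 'R')),
    height_cm INTEGER,
    current_rank INTEGER,
    peak_rank INTEGER,
    tour TEXT CHECK (tour IN ('ATP', 'WTA', 'Challenger')),
    status TEXT,
    coach TEXT,
    elo_overall REAL, elo_hard REAL, elo_clay REAL,
    elo_grass REAL, elo_indoorhard REAL,
    injury_flag INTEGER DEFAULT 0,
    injury_notes TEXT
);
CREATE TABLE IF NOT EXISTS tennis_matches (
    match_id INTEGER PRIMARY KEY,
    tournament_id INTEGER,  -- would reference tennis_tournaments (not created here)
    match_date TEXT,        -- ISO 8601 date string
    "round" TEXT,
    surface TEXT,
    indoor_outdoor TEXT,
    player1_id INTEGER REFERENCES tennis_players(player_id),
    player2_id INTEGER REFERENCES tennis_players(player_id),
    winner_id INTEGER REFERENCES tennis_players(player_id),
    score TEXT,
    sets_played INTEGER,
    games_p1 INTEGER,
    games_p2 INTEGER,
    retirement INTEGER DEFAULT 0,
    walkover INTEGER DEFAULT 0,
    duration_minutes INTEGER,
    "session" TEXT,
    "format" TEXT CHECK ("format" IN ('BO3', 'BO5')),
    source TEXT
);
"""

def init_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database, enable WAL, and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # has no effect on in-memory databases
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript(SCHEMA)
    return conn
```

WAL mode lets scanners read while an ingestion job writes, which is presumably why the council chose it over a heavier stack.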

Phase 2: Distribution Models

Core engine: Point-win probability → Game-win probability → Set-win probability → Match-win probability

Level | Method | Parameters
Point | Logistic regression | Surface Elo, serve stats, fatigue, weather, altitude
Game | Markov chain (exact) | P(server wins point on serve); nonlinear: 65% point-win ≈ 83% game-win
Set | Markov chain (exact) | P(hold) per player, tiebreak rules per tournament
Match | Markov chain (exact for BO3/BO5) | Set-win probabilities, fifth-set rule variants
Outright | Monte Carlo (1M bracket sims) | Draw path, surface Elo, fatigue accumulation through rounds

All 5 markets derived from same underlying serve-hold probabilities.
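The point → game → set → match chain can be sketched as follows. This is a simplified illustration, not the council's engine: it assumes points are independent with fixed serve-point probabilities strictly between 0 and 1, a first-to-7 tiebreak at 6-6 in every set (fifth-set rule variants are ignored), player A serving first, and identical set-win probability in every set. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb

def game_win_prob(p: float) -> float:
    """P(server holds) given P(server wins a point) = p."""
    q = 1 - p
    before_deuce = p**4 * (1 + 4 * q + 10 * q * q)  # win to love, 15, or 30
    reach_deuce = 20 * p**3 * q**3                   # paths to 3-3 in points
    win_from_deuce = p * p / (p * p + q * q)
    return before_deuce + reach_deuce * win_from_deuce

def tiebreak_win_prob(p_a: float, p_b: float) -> float:
    """P(A wins a first-to-7 tiebreak), A serving the first point."""
    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a >= 7 and a - b >= 2:
            return 1.0
        if b >= 7 and b - a >= 2:
            return 0.0
        if a == b and a >= 6:
            # From 6-6 each pair of points has one serve per player.
            x, y = p_a * (1 - p_b), (1 - p_a) * p_b
            return x / (x + y)
        # Serve order: A, B, B, A, A, B, B, ... (next point is a + b + 1)
        a_serves = ((a + b + 1) // 2) % 2 == 0
        a_point = p_a if a_serves else 1 - p_b
        return a_point * win(a + 1, b) + (1 - a_point) * win(a, b + 1)
    return win(0, 0)

def set_win_prob(p_a: float, p_b: float) -> float:
    """P(A wins a tiebreak set), A serving the first game."""
    hold_a, hold_b = game_win_prob(p_a), game_win_prob(p_b)

    @lru_cache(maxsize=None)
    def win(a: int, b: int) -> float:
        if a >= 6 and a - b >= 2:
            return 1.0
        if b >= 6 and b - a >= 2:
            return 0.0
        if a == 6 and b == 6:
            return tiebreak_win_prob(p_a, p_b)
        a_game = hold_a if (a + b) % 2 == 0 else 1 - hold_b  # servers alternate
        return a_game * win(a + 1, b) + (1 - a_game) * win(a, b + 1)
    return win(0, 0)

def match_win_prob(p_a: float, p_b: float, best_of: int = 3) -> float:
    """P(A wins the match), treating sets as independent and identical."""
    s = set_win_prob(p_a, p_b)
    need = best_of // 2 + 1
    return sum(comb(need - 1 + k, k) * s**need * (1 - s) ** k for k in range(need))
```

Under these assumptions game_win_prob(0.65) comes out at about 0.83, matching the nonlinearity noted in the table, and every value is exact with no sampling noise.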

Phase 3: 5 Edge Scanners

Common engine:

  1. Ingest Pinnacle odds (all markets)
  2. De-vig using Shin method (2-way for match winner, multi-way for set scores)
  3. Build Markov chain probability matrices from serve-hold estimates
  4. Compare to Kalshi contract prices
  5. Min edge: 4 cents after Kalshi 7% fee
  6. Min sample: 10+ matches on current surface for both players
  7. Integrity check: flag if Pinnacle line moves 40+ cents against model on Challenger/ITF
  8. Output: {match_id, market, selection, model_prob, kalshi_price, edge, confidence, surface, tier}
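Steps 2, 4, and 5 can be sketched as below. The Shin de-vig solves for the insider-share parameter z by bisection, which works for two-way and multi-way markets alike. The fee model is an assumption: it treats the 7% fee as fee_rate × price × (1 − price) per contract with no rounding, which should be checked against Kalshi's current fee schedule. Prices are in dollars (0 to 1), so the 4-cent threshold is 0.04. Function names are hypothetical.

```python
from math import sqrt

def shin_probabilities(decimal_odds: list[float], iterations: int = 100) -> list[float]:
    """De-vig bookmaker odds with Shin's method; returns probabilities summing to 1."""
    implied = [1.0 / o for o in decimal_odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        # No overround to remove; fall back to simple normalization.
        return [pi / booksum for pi in implied]

    def probs(z: float) -> list[float]:
        return [
            (sqrt(z * z + 4 * (1 - z) * pi * pi / booksum) - z) / (2 * (1 - z))
            for pi in implied
        ]

    # sum(probs(0)) = sqrt(booksum) > 1, and the sum falls below 1 as z
    # approaches 1, so a root is bracketed for any realistic margin.
    lo, hi = 0.0, 0.999
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)

def kalshi_edge(model_prob: float, kalshi_price: float, fee_rate: float = 0.07) -> float:
    """Expected edge per $1 contract after the ASSUMED fee model."""
    fee = fee_rate * kalshi_price * (1 - kalshi_price)
    return model_prob - kalshi_price - fee

MIN_EDGE = 0.04  # 4 cents, from the common engine list

pinnacle = shin_probabilities([1.80, 2.10])   # two-way match winner market
edge = kalshi_edge(model_prob=0.60, kalshi_price=0.52)
signal = edge >= MIN_EDGE
```

Shin's method removes more margin from longshots than from favorites, which is the usual reason to prefer it over proportional normalization.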

Per-market unique logic:

Scanner | Unique Logic
Match Winner | Markov chain from serve-hold; surface Elo primary driver; retirement risk discount
Game Spread | Game distribution from match simulation; weather/altitude adjust hold rates
Total Games | Combined games distribution; shorter matches on fast surfaces; BO3 vs BO5 total games range
Exact Set Score | Full set-by-set probability matrix; tiebreak rules per tournament; BO3 has 4 possible set scores, BO5 has 6
Tournament Outright | 1M bracket Monte Carlo; draw path dependency; fatigue accumulation; surface transitions within tournament
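A minimal sketch of the bracket Monte Carlo behind the outright scanner. It uses only the standard Elo win expectation per match and leaves out fatigue accumulation and surface effects; the function names, the four-player draw, and the Elo values are made up for illustration, and the default sim count is kept small for speed.

```python
import random
from collections import Counter

def elo_win_prob(elo_a: float, elo_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def simulate_outright(
    draw: list[str], elo: dict[str, float], n_sims: int = 10_000, seed: int = 0
) -> dict[str, float]:
    """Estimate title probabilities for a single-elimination draw.

    `draw` lists players in bracket order (length must be a power of two):
    positions 0-1 meet in round one, 2-3 meet, and the winners meet next.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    titles: Counter[str] = Counter()
    for _ in range(n_sims):
        alive = list(draw)
        while len(alive) > 1:
            next_round = []
            for i in range(0, len(alive), 2):
                a, b = alive[i], alive[i + 1]
                a_wins = rng.random() < elo_win_prob(elo[a], elo[b])
                next_round.append(a if a_wins else b)
            alive = next_round
        titles[alive[0]] += 1
    return {player: titles[player] / n_sims for player in draw}

# Hypothetical four-player draw with illustrative surface Elo ratings.
draw = ["Seed 1", "Qualifier", "Seed 3", "Seed 2"]
elo = {"Seed 1": 2100, "Qualifier": 1800, "Seed 3": 1950, "Seed 2": 2000}
title_probs = simulate_outright(draw, elo)
```

Because the bracket order is simulated directly, a player's title probability reflects who they are likely to meet in each round rather than an average opponent.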

Phase 4: Matchup Card Format

MATCH: [Player A] vs [Player B] | [Tournament] [Round] | [Date]
SURFACE: [hard/clay/grass] | [indoor/outdoor] | FORMAT: [BO3/BO5]
COURT: [Name] | SESSION: [Day/Night] | WEATHER: [Temp/Wind/Humidity]

PLAYER A: [Name] ([L/R]) | Rank: [#] | Elo Overall: [val] | Elo [Surface]: [val]
  Surface Record (12mo): [W-L] ([win%]) | Hold Rate: [%] | Break Rate: [%]
  Form (Last 5): [W/L sequence] | Last Match: [Date, Opponent, Score]
  Serve Stats: 1st Serve [%] | 1st Serve Won [%] | 2nd Serve Won [%]
  Aces/G: [avg] | DF/G: [avg] | BP Save Rate: [%]
  vs [L/R]-handed: Win Rate [%] | Hold Rate [%]

PLAYER B: [Name] ([L/R]) | Rank: [#] | Elo Overall: [val] | Elo [Surface]: [val]
  [Same fields as Player A]

H2H (All surfaces): [A wins]-[B wins]
H2H ([Surface]): [A wins]-[B wins] | A Hold: [%] | B Hold: [%]

STATUS:
  Player A: [Injury/fatigue/motivation flags with severity and source tier]
  Player B: [Same]

SCHEDULING:
  Player A: Days Since Last Match: [n] | Previous Surface: [val] | Timezone Travel: [Y/N]
  Player B: [Same]

MOTIVATION:
  Player A: Defending [X pts] | Ranking Impact: [rise/fall to #Y]
  Player B: [Same]

DRAW CONTEXT:
  Next Round Likely Opponent: [Name (Rank/Seed)]
  Quarter Contains: [Top seeds in section]

INTELLIGENCE:
  [CRITICAL/MODERATE/CONTEXT findings with signal tier]

Phase 5: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. Tour scope: ATP + WTA + Grand Slams + Challengers? Or ATP + WTA main tour only to start?
  2. Markov chain vs Monte Carlo: Council recommends Markov chain for pre-match (exact, fast) and Monte Carlo only for outrights. Confirm?
  3. Sackmann data scraping: Free GitHub data but needs regular pulling and parsing. Build automated pipeline?
  4. WTA separate calibration: Different hold rates, variance, retirement patterns. Build separate WTA model parameters?
  5. Tournament outright scanner: High variance, complex bracket simulation. Build now or defer?
  6. Integrity kill-switch: Auto-block Challenger/ITF bets when Pinnacle moves 40+ cents against model? Or flag for manual review?
  7. Indoor hard court Elo: Separate from outdoor hard? Adds complexity but captures real differences (faster courts, no weather).

COUNCIL METADATA

Detail | Value
Council date | 2026-04-01
Advisory responses | 5 (all completed)
Peer reviews | 5 (all completed)
Strongest advisor | Sonnet (4/5 votes, near-unanimous)
Runner-up | Opus (1/5 vote from Grok)
Biggest blind spot | Gemini (no implementation, generic)
Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/tennis-research/
Source: ~/edgeclaw/results/panel-results/tennis-research-ruling.md