Tennis Research Pipeline — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Sonnet (4 of 5 peer review votes, near-unanimous)
Status: PENDING BOSS RULING on open questions

COUNCIL SUMMARY

Where Advisors Agreed

Serve-hold probability is the atomic unit — every tennis market (match winner, spread, totals, set scores) derives from P(A holds serve) and P(B holds serve)
Surface-specific Elo ratings — hard/clay/grass/indoor hard treated as separate dimensions, not adjustments (200+ Elo point swings between surfaces)
Best-of-3 vs best-of-5 format changes win probability distributions fundamentally (60% per-set favorite → 65% BO3, 68% BO5)
5 edge scanners required — Match Winner, Game Spread, Total Games, Exact Set Score, Tournament Outright
Sackmann tennis data as primary free source (Jeff Sackmann's GitHub — match results, serve stats, Elo ratings)
Injury/withdrawal intelligence is critical — tennis has routine mid-match retirements unlike team sports
Pinnacle as sharp anchor — de-vig for true probabilities, compare to Kalshi
Draw path simulation for outrights — bracket-based Monte Carlo, not simple Elo head-to-head
Fatigue and scheduling context — back-to-back matches, timezone travel, surface transitions
Motivation quantification — ranking points to defend, Grand Slam seeding cutoffs, year-end finals qualification
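The format effect quoted above (60% per set becoming roughly 65% in BO3 and 68% in BO5) can be reproduced with a short calculation. This is a minimal sketch that assumes sets are independent and won with a constant probability; the function names are illustrative and do not come from any advisor's submission.

```python
from math import comb


def set_score_distribution(p_set: float, best_of: int) -> dict[tuple[int, int], float]:
    """Return P(final set score) keyed by (sets_a, sets_b).

    Assumes every set is won by player A independently with probability p_set.
    """
    need = best_of // 2 + 1  # sets needed to win: 2 for BO3, 3 for BO5
    q_set = 1.0 - p_set
    dist = {}
    for lost in range(need):
        # The match winner takes the final set; the `lost` sets they dropped
        # can fall anywhere among the earlier need - 1 + lost sets.
        ways = comb(need - 1 + lost, lost)
        dist[(need, lost)] = ways * p_set**need * q_set**lost
        dist[(lost, need)] = ways * q_set**need * p_set**lost
    return dist


def match_win_probability(p_set: float, best_of: int) -> float:
    """Sum the set-score probabilities in which player A reaches the winning total."""
    need = best_of // 2 + 1
    dist = set_score_distribution(p_set, best_of)
    return sum(prob for (sets_a, _), prob in dist.items() if sets_a == need)


if __name__ == "__main__":
    print(round(match_win_probability(0.6, 3), 4))  # 0.648
    print(round(match_win_probability(0.6, 5), 4))  # 0.6826
```

The same distribution doubles as the input for an exact set score market: it has 4 entries for BO3 and 6 for BO5.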

Where Advisors Disagreed

Simulation approach: Most advisors recommended Monte Carlo (10K-100K sims). Reviewers noted that tennis scoring is a Markov process with closed-form solutions for game, set, and match probabilities. Council verdict: Markov chain for pre-match pricing (exact, fast), with Monte Carlo reserved for tournament outrights (bracket pathing).
WTA vs ATP modeling: Most advisors focused on ATP. Sonnet explicitly noted WTA has fundamentally different hold rates and variance. Council verdict: Separate model calibration for ATP and WTA — different hold rate distributions, break frequencies, and retirement patterns.
Data sources: Gemini was generic; gpt-oss recommended an enterprise stack (Kafka, Snowflake, Kubernetes). Council verdict: Sackmann data (free), Pinnacle odds, and targeted web scraping for injury signals, all stored in SQLite in WAL mode.
Press conference sentiment: Some advisors proposed NLP on press conferences. Council verdict: Low-signal, high-noise — defer. Focus on quantitative signals first.

Strongest Arguments (from peer review)

Sonnet wins near-unanimously — 4 of 5 reviewers selected Sonnet as strongest:

Opened with "why tennis is structurally different" — demonstrated genuine conceptual understanding before designing
Serve-hold probability as atomic unit explicitly stated as design principle (not afterthought)
Complete SQLite schema with exact column types (players, matches, match_serve_stats, player_surface_stats, h2h_records, tournaments, draws, player_status)
Surface-specific Elo as separate dimension (hard/clay/grass/indoor hard)
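Grand Slam fifth-set rule differences in historical data (Wimbledon tiebreak at 12-12, Australian Open 10-point tiebreak at 6-6, US Open standard tiebreak at 6-6, French Open no tiebreak); since 2022 all four Grand Slams use a 10-point tiebreak at 6-6 in the deciding set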
Sackmann data gap awareness for Challenger serve stats (widen confidence intervals)
Calibration as core design requirement, not afterthought
1,000,000 simulation count (highest specified)
Phased build plan with realistic scope

Biggest Blind Spot

Gemini: No data pipeline, no database schema, no data sources, no ingestion schedule. Pure strategy with zero implementation. Also naive fatigue modeling (linear weighted sum) and overconfidence in low-signal data (press conference NLP).

What Everyone Missed (from peer reviews)

Markov chains over Monte Carlo for pre-match: Tennis scoring moves through discrete states that reset every game and set (love-all, 15-all, tiebreak). Closed-form Markov chain matrices give exact probabilities in milliseconds, whereas Monte Carlo adds unnecessary sampling variance to pre-match pricing.
Bookmaker retirement rules differ: "1-Ball" (bets stand after the first serve), "1-Set" (bets stand once one set is completed), "Full Match" (bets are voided unless the match is completed). Injury intelligence is useless unless it is routed to the correct bookmaker's rules.
Match-fixing/integrity flags for lower tiers: Fixing is rampant at Challenger and ITF level. The pipeline needs an automatic "Integrity Kill-Switch" that triggers when Pinnacle moves 40+ cents against the model prediction on a lower-tier match.
Kalshi line availability lag: Kalshi may not list tennis matches until a few hours before they start. The pipeline must handle late market appearance.
In-play Bayesian updating: After each game, update serve-hold estimates from the observed first-serve percentage and break point conversion. No advisor designed a live data integration layer.
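One way the in-play update could work is a conjugate Beta-binomial model on the server's point-win probability. The sketch below is an assumption-laden simplification: it updates on service points won rather than on first-serve percentage or break point conversion, and `ServeEstimate` and the `strength` parameter are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ServeEstimate:
    """Beta(alpha, beta) belief about a player's chance of winning a point on serve."""

    alpha: float
    beta: float

    @classmethod
    def from_prior(cls, mean: float, strength: float) -> "ServeEstimate":
        # `strength` is the weight of the pre-match estimate in pseudo-points
        # (a hypothetical tuning knob, not something the council specified).
        return cls(alpha=mean * strength, beta=(1.0 - mean) * strength)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, points_won: int, points_played: int) -> "ServeEstimate":
        # Conjugate update: add observed wins and losses to the pseudo-counts.
        return ServeEstimate(
            alpha=self.alpha + points_won,
            beta=self.beta + (points_played - points_won),
        )


if __name__ == "__main__":
    prior = ServeEstimate.from_prior(mean=0.65, strength=100)
    # After one service game in which the server won 4 of 6 points.
    posterior = prior.update(points_won=4, points_played=6)
    print(round(posterior.mean, 4))  # 0.6509
```

A large `strength` keeps the estimate close to the pre-match number for longer; a small one lets a single bad service game move the price sharply.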

BUILD PLAN

Phase 1: Core Tennis Data Tables

tennis_players: player_id, full_name, preferred_name, nationality, hand (L/R), height_cm, current_rank, peak_rank, tour (ATP/WTA/Challenger), status, coach, elo_overall, elo_hard, elo_clay, elo_grass, elo_indoorhard, injury_flag, injury_notes

tennis_matches: match_id, tournament_id, match_date, round, surface, indoor_outdoor, player1_id, player2_id, winner_id, score, sets_played, games_p1, games_p2, retirement, walkover, duration_minutes, session (day/night), format (BO3/BO5), source

tennis_serve_stats: stat_id, match_id, player_id, aces, double_faults, first_serve_pct, first_serve_won, second_serve_won, bp_faced, bp_saved, service_hold_pct, return_bp_won_pct, total_points_won

tennis_player_surface_stats: player_id, surface, window_days (90/180/365/career), matches_played, win_pct, avg_hold_pct, avg_break_pct, avg_games_per_set, elo_on_surface

tennis_h2h_records: player1_id, player2_id, surface, matches_played, p1_wins, p2_wins, p1_avg_hold, p2_avg_hold

tennis_tournaments: tournament_id, name, surface, indoor_outdoor, level (Grand Slam/Masters/500/250/Challenger), draw_size, sets_format (BO3/BO5), country, city, start_date, end_date, points_winner

tennis_player_status: player_id, date, status_type (injury/fatigue/motivation/form), signal_source, signal_tier (hard/soft/contextual), severity (1-10), notes, confidence

tennis_draws: tournament_id, round, position, player_id, seed, projected_opponent_ids (JSON)

tennis_referee_data: umpire_id, name, avg_time_violations, overrule_rate, code_violations_per_match
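As a rough illustration of how these tables might be created in SQLite with WAL enabled, the sketch below defines trimmed versions of two of them. The column types and CHECK constraints are assumptions (the ruling lists column names only), and `open_db` is a hypothetical helper.

```python
import sqlite3

# Trimmed versions of two Phase 1 tables. Column types are assumed, not specified.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tennis_players (
    player_id      INTEGER PRIMARY KEY,
    full_name      TEXT NOT NULL,
    hand           TEXT CHECK (hand IN ('L', 'R')),
    tour           TEXT CHECK (tour IN ('ATP', 'WTA', 'Challenger')),
    elo_overall    REAL,
    elo_hard       REAL,
    elo_clay       REAL,
    elo_grass      REAL,
    elo_indoorhard REAL
);
CREATE TABLE IF NOT EXISTS tennis_matches (
    match_id   INTEGER PRIMARY KEY,
    match_date TEXT NOT NULL,  -- ISO 8601 date string
    surface    TEXT NOT NULL,
    player1_id INTEGER NOT NULL REFERENCES tennis_players(player_id),
    player2_id INTEGER NOT NULL REFERENCES tennis_players(player_id),
    winner_id  INTEGER REFERENCES tennis_players(player_id),
    format     TEXT CHECK (format IN ('BO3', 'BO5')),
    retirement INTEGER NOT NULL DEFAULT 0
);
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file, enable WAL, and ensure the tables exist."""
    conn = sqlite3.connect(path)
    # WAL lets readers run while a single writer appends; it needs a file-backed database.
    conn.execute("PRAGMA journal_mode=WAL")
    # SQLite does not enforce foreign keys unless asked to, per connection.
    conn.execute("PRAGMA foreign_keys=ON")
    conn.executescript(SCHEMA)
    return conn
```

The remaining tables would follow the same pattern, with `tennis_serve_stats` and `tennis_draws` referencing `tennis_matches` and `tennis_tournaments` by id.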

Phase 2: Distribution Models

Core engine: Point-win probability → Game-win probability → Set-win probability → Match-win probability

| Level | Method | Parameters |
| --- | --- | --- |
| Point | Logistic regression | Surface Elo, serve stats, fatigue, weather, altitude |
| Game | Markov chain (exact) | P(server wins point on serve); nonlinear: 65% point-win ≈ 83% game-win |
| Set | Markov chain (exact) | P(hold) per player, tiebreak rules per tournament |
| Match | Markov chain (exact for BO3/BO5) | Set-win probabilities, fifth-set rule variants |
| Outright | Monte Carlo (1M bracket sims) | Draw path, surface Elo, fatigue accumulation through rounds |

All 5 markets derived from same underlying serve-hold probabilities.
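The game and set levels of the chain can be sketched in a few lines. The hold formula is the standard closed form for an advantage game with a constant point-win probability; the set recursion walks the game-score states directly. Taking the tiebreak win probability as a plain input (`tiebreak_a`) is a simplifying assumption, and both function names are illustrative.

```python
from functools import lru_cache


def hold_probability(p: float) -> float:
    """Exact P(server wins an advantage game) given point-win probability p on serve."""
    q = 1.0 - p
    # Win to love, 15 or 30: server takes 4 points while the returner takes 0, 1 or 2.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce (3-3) in C(6,3) = 20 ways, then win two points in a row first.
    from_deuce = 20 * p**3 * q**3 * (p**2 / (p**2 + q**2))
    return before_deuce + from_deuce


def set_win_probability(hold_a: float, hold_b: float, tiebreak_a: float = 0.5) -> float:
    """P(A wins a tiebreak set) with A serving first.

    tiebreak_a is P(A wins the tiebreak at 6-6), passed in rather than derived here.
    """

    @lru_cache(maxsize=None)
    def win_from(games_a: int, games_b: int) -> float:
        if games_a == 6 and games_b == 6:
            return tiebreak_a
        if games_a >= 6 and games_a - games_b >= 2:
            return 1.0
        if games_b >= 6 and games_b - games_a >= 2:
            return 0.0
        # Serve alternates every game; A serves when an even number of games is done.
        a_serves = (games_a + games_b) % 2 == 0
        p_game = hold_a if a_serves else 1.0 - hold_b
        return p_game * win_from(games_a + 1, games_b) + (1.0 - p_game) * win_from(games_a, games_b + 1)

    return win_from(0, 0)


if __name__ == "__main__":
    print(round(hold_probability(0.65), 3))  # 0.83
    print(set_win_probability(hold_probability(0.65), hold_probability(0.62)))
```

Because every level is an exact sum over a finite state space, the result carries no sampling noise, which is the reason the council prefers this route for pre-match pricing.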

Phase 3: 5 Edge Scanners

Common engine:

Ingest Pinnacle odds (all markets)
De-vig using Shin method (2-way for match winner, multi-way for set scores)
Build Markov chain probability matrices from serve-hold estimates
Compare to Kalshi contract prices
Min edge: 4 cents after Kalshi 7% fee
Min sample: 10+ matches on current surface for both players
Integrity check: flag if Pinnacle line moves 40+ cents against model on Challenger/ITF
Output: {match_id, market, selection, model_prob, kalshi_price, edge, confidence, surface, tier}
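The de-vig and edge steps of the common engine might look like the sketch below. It solves Shin's model for the insider share `z` by bisection, which covers both two-way and multi-way markets. The fee model in `kalshi_edge` (fee rate times price times one minus price) is an assumption about how the 7% fee applies and should be checked against the current Kalshi fee schedule; all function names are hypothetical.

```python
from math import sqrt


def shin_probabilities(decimal_odds: list[float], tol: float = 1e-12) -> list[float]:
    """De-vig decimal odds with Shin's method, solving for the insider share z."""
    implied = [1.0 / odds for odds in decimal_odds]
    booksum = sum(implied)

    def probs(z: float) -> list[float]:
        return [
            (sqrt(z * z + 4.0 * (1.0 - z) * pi * pi / booksum) - z) / (2.0 * (1.0 - z))
            for pi in implied
        ]

    # For an overround book the probabilities sum to more than 1 at z = 0 and the
    # sum falls as z grows, so bisect on the z where the sum equals 1.
    lo, hi = 0.0, 1.0 - 1e-9
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2.0)


def kalshi_edge(model_prob: float, kalshi_price: float, fee_rate: float = 0.07) -> float:
    """Edge in dollars per $1 contract after fees.

    ASSUMPTION: fee per contract = fee_rate * price * (1 - price).
    """
    fee = fee_rate * kalshi_price * (1.0 - kalshi_price)
    return model_prob - kalshi_price - fee


if __name__ == "__main__":
    fair = shin_probabilities([1 / 0.55, 2.0])  # implied 55% and 50%, a 5% overround
    print([round(p, 4) for p in fair])  # [0.525, 0.475]
    # The de-vigged number stands in for the model probability in this example.
    edge = kalshi_edge(model_prob=fair[0], kalshi_price=0.46)
    print(round(edge, 4), edge >= 0.04)  # 0.0476 True
```

In a two-way market Shin's method reduces to subtracting half the overround from each implied probability, so its value over simpler methods shows up mainly in the multi-way set score markets.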

Per-market unique logic:

| Scanner | Unique Logic |
| --- | --- |
| Match Winner | Markov chain from serve-hold; surface Elo primary driver; retirement risk discount |
| Game Spread | Game distribution from match simulation; weather/altitude adjust hold rates |
| Total Games | Combined games distribution; shorter matches on fast surfaces; BO3 vs BO5 total games range |
| Exact Set Score | Full set-by-set probability matrix; tiebreak rules per tournament; BO3 has 4 possible set scores, BO5 has 6 |
| Tournament Outright | 1M bracket Monte Carlo; draw path dependency; fatigue accumulation; surface transitions within tournament |
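For the outright scanner, the bracket Monte Carlo could start from something like the sketch below. It uses the standard Elo expected-score formula with static ratings, so the fatigue accumulation and surface effects named above are deliberately left out. The draw is assumed to be listed in bracket order with a power-of-two size, and the function names are illustrative.

```python
import random
from collections import Counter


def elo_win_probability(elo_a: float, elo_b: float) -> float:
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


def simulate_outright(
    draw: list[str], elo: dict[str, float], n_sims: int, seed: int = 0
) -> dict[str, float]:
    """Estimate title probabilities for a single-elimination draw.

    ASSUMPTIONS: `draw` is in bracket order, its length is a power of two,
    and ratings stay fixed for the whole tournament.
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        alive = list(draw)
        while len(alive) > 1:
            # Adjacent players meet; winners keep bracket order for the next round.
            alive = [
                a if rng.random() < elo_win_probability(elo[a], elo[b]) else b
                for a, b in zip(alive[0::2], alive[1::2])
            ]
        titles[alive[0]] += 1
    return {player: titles[player] / n_sims for player in draw}


if __name__ == "__main__":
    ratings = {"A": 2100, "B": 1900, "C": 1900, "D": 1900}
    print(simulate_outright(["A", "B", "C", "D"], ratings, n_sims=20_000, seed=1))
```

Replacing the Elo call with a per-match Markov probability, and passing round-by-round fatigue state into it, would be the natural next step toward the design described in the table.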

Phase 4: Matchup Card Format

MATCH: [Player A] vs [Player B] | [Tournament] [Round] | [Date]
SURFACE: [hard/clay/grass] | [indoor/outdoor] | FORMAT: [BO3/BO5]
COURT: [Name] | SESSION: [Day/Night] | WEATHER: [Temp/Wind/Humidity]

PLAYER A: [Name] ([L/R]) | Rank: [#] | Elo Overall: [val] | Elo [Surface]: [val]
  Surface Record (12mo): [W-L] ([win%]) | Hold Rate: [%] | Break Rate: [%]
  Form (Last 5): [W/L sequence] | Last Match: [Date, Opponent, Score]
  Serve Stats: 1st Serve [%] | 1st Serve Won [%] | 2nd Serve Won [%]
  Aces/G: [avg] | DF/G: [avg] | BP Save Rate: [%]
  vs [L/R]-handed: Win Rate [%] | Hold Rate [%]

PLAYER B: [Name] ([L/R]) | Rank: [#] | Elo Overall: [val] | Elo [Surface]: [val]
  [Same fields as Player A]

H2H (All surfaces): [A wins]-[B wins]
H2H ([Surface]): [A wins]-[B wins] | A Hold: [%] | B Hold: [%]

STATUS:
  Player A: [Injury/fatigue/motivation flags with severity and source tier]
  Player B: [Same]

SCHEDULING:
  Player A: Days Since Last Match: [n] | Previous Surface: [val] | Timezone Travel: [Y/N]
  Player B: [Same]

MOTIVATION:
  Player A: Defending [X pts] | Ranking Impact: [rise/fall to #Y]
  Player B: [Same]

DRAW CONTEXT:
  Next Round Likely Opponent: [Name (Rank/Seed)]
  Quarter Contains: [Top seeds in section]

INTELLIGENCE:
  [CRITICAL/MODERATE/CONTEXT findings with signal tier]

Phase 5: Dashboard

Daily match board: All matches across ATP/WTA/Challenger with edge counts, surface, tier
Match drill-down: Full matchup card + 5 market edges + serve stat comparison + H2H
Surface form tables: Player rankings by surface with Elo, hold rates, win rates
Injury/status board: All flagged players with signal tier and severity
Tournament tracker: Draw brackets with model win probabilities per player
Edge alerts: Sorted by magnitude, filterable by market type, surface, tier
Integrity monitor: Challenger/ITF matches with suspicious line movements
P&L tracker: By market type, by surface, by tier, Brier scores
Calibration dashboard: Rolling accuracy by market, surface, and confidence band
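The Brier score and confidence-band views could be computed along these lines. This is a minimal sketch using equal-width probability bands; the band count and the row layout are assumptions rather than part of the ruling.

```python
def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(
    probs: list[float], outcomes: list[int], n_bands: int = 5
) -> list[tuple[float, float, int, float, float]]:
    """Group predictions into equal-width bands.

    Each row is (band_low, band_high, count, mean_predicted, observed_rate);
    empty bands are skipped.
    """
    bands: list[list[tuple[float, int]]] = [[] for _ in range(n_bands)]
    for p, y in zip(probs, outcomes):
        # A prediction of exactly 1.0 belongs in the top band.
        index = min(int(p * n_bands), n_bands - 1)
        bands[index].append((p, y))
    rows = []
    for index, items in enumerate(bands):
        if not items:
            continue
        mean_predicted = sum(p for p, _ in items) / len(items)
        observed_rate = sum(y for _, y in items) / len(items)
        rows.append((index / n_bands, (index + 1) / n_bands, len(items), mean_predicted, observed_rate))
    return rows


if __name__ == "__main__":
    predictions = [0.82, 0.61, 0.33, 0.87, 0.64]
    results = [1, 1, 0, 1, 0]
    print(round(brier_score(predictions, results), 4))
    for row in calibration_table(predictions, results):
        print(row)
```

A well-calibrated scanner shows `mean_predicted` close to `observed_rate` in every band; running the same table per market and per surface gives the rolling views listed above.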

OPEN QUESTIONS FOR BOSS RULING

Tour scope: ATP + WTA + Grand Slams + Challengers? Or ATP + WTA main tour only to start?
Markov chain vs Monte Carlo: Council recommends Markov chain for pre-match (exact, fast) and Monte Carlo only for outrights. Confirm?
Sackmann data scraping: Free GitHub data but needs regular pulling and parsing. Build automated pipeline?
WTA separate calibration: Different hold rates, variance, retirement patterns. Build separate WTA model parameters?
Tournament outright scanner: High variance, complex bracket simulation. Build now or defer?
Integrity kill-switch: Auto-block Challenger/ITF bets when Pinnacle moves 40+ cents against model? Or flag for manual review?
Indoor hard court Elo: Separate from outdoor hard? Adds complexity but captures real differences (faster courts, no weather).
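The two options in the integrity kill-switch question (auto-block versus flag for manual review) differ by a single switch, as the sketch below suggests. It reads "40+ cents" as the Pinnacle decimal odds on the model's selection lengthening by 0.40 or more, which is an interpretation rather than something the ruling defines; the enum and function names are hypothetical.

```python
from enum import Enum


class IntegrityAction(Enum):
    ALLOW = "allow"
    FLAG = "flag"    # route to manual review
    BLOCK = "block"  # kill-switch: no bet placed


LOWER_TIERS = {"Challenger", "ITF"}


def integrity_check(
    tier: str,
    pinnacle_open_odds: float,
    pinnacle_current_odds: float,
    threshold: float = 0.40,
    auto_block: bool = False,
) -> IntegrityAction:
    """Decide what to do when the line drifts against the model's selection.

    ASSUMPTION: the odds are decimal odds on the side the model favours, so a
    rise in odds means the market is moving against the model.
    """
    if tier not in LOWER_TIERS:
        return IntegrityAction.ALLOW
    # Round to avoid float noise deciding a borderline case such as 1.90 - 1.50.
    drift = round(pinnacle_current_odds - pinnacle_open_odds, 6)
    if drift >= threshold:
        return IntegrityAction.BLOCK if auto_block else IntegrityAction.FLAG
    return IntegrityAction.ALLOW


if __name__ == "__main__":
    print(integrity_check("Challenger", 1.80, 2.25))                   # IntegrityAction.FLAG
    print(integrity_check("Challenger", 1.80, 2.25, auto_block=True))  # IntegrityAction.BLOCK
```

Keeping `auto_block` as a configuration value would let the ruling on this question be applied without code changes.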

COUNCIL METADATA

| Detail | Value |
| --- | --- |
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Sonnet (4/5 votes, near-unanimous) |
| Runner-up | Opus (1/5 vote from Grok) |
| Biggest blind spot | Gemini (no implementation, generic) |
| Full council data | `/home/ubuntu/edgeclaw/data/councils/2026-04-01/tennis-research/` |

Source: ~/edgeclaw/results/panel-results/tennis-research-ruling.md