Golf Data Audit — Council Ruling
Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Grok (2 of 5 peer review votes — most split council, 4 of 5 self-voted)
Status: PENDING BOSS RULING on open questions
COUNCIL SUMMARY
Where Advisors Agreed
- Strokes Gained decomposition is mandatory — SG:OTT, SG:APP, SG:ARG, SG:PUTT, SG:T2G, SG:Total
- DataGolf as primary SG data source (PGA Tour, DP World Tour, Korn Ferry)
- Course fit via regression — player SG vector × course demand vector
- EWMA for SG baselines — 24-round half-life for T2G, 40-round for putting (per Broadie)
- Weather wave advantage is #1 short-term edge — NWS API for US events
- Monte Carlo tournament simulation — 50K-100K iterations for outright/top-N probabilities
- 7 edge scanners — Outright, H2H, Make Cut, Top 5/10/20, Round Leader, 3-Ball, Hole in One
- Grass type affects putting dramatically — Bermuda/Bentgrass/Poa annua require separate SG:PUTT tracking
- Top 20 market is laziest-priced — known inefficiency worth exploiting
- LIV requires different modeling — 54 holes, no cut, shotgun starts, 48-player field
Where Advisors Disagreed
- API specificity: Opus provided actual API endpoints (statdata.pgatour.com, datagolf.com/api), others cited sources generically. Council verdict: Need exact endpoints for implementation.
- LPGA data: gpt-oss claimed "LPGA ShotLink" exists (it doesn't in public form). Opus honestly noted DataGolf doesn't cover LPGA and provided proxy SG formulas. Council verdict: LPGA needs proxy formulas, not phantom data sources.
- Course clustering approach: Some used k-means, others regression weights, one used manual categories. Council verdict: Regression-derived course DNA vectors (not manual clustering) — data-driven.
- Edge thresholds: Range from 2% to 5% per market. Council verdict: Market-specific — 2% outrights (highest variance), 3% H2H, 4% 3-ball (most concentrated).
Strongest Arguments (from peer review)
Grok wins with the most focused, implementable design:
- Clean scanner design with specific inputs per market type
- Proper course clustering methodology
- Calibration table structure included
- Automated refresh triggers specified
- Stays focused on data audit without overbuilding
Opus runner-up with deepest operational specifics:
- Actual API endpoints and URLs (not just source names)
- LPGA data honesty (admitted gaps, provided proxy formulas)
- Market-specific edge thresholds with justification (vig structure, sharpness)
- Kelly fractions per market type
- Multi-analyst consensus card format with edge flags
Biggest Blind Spot
No backtesting or calibration framework — All advisors build models but none address how to validate them. No historical backtesting, calibration curves, Brier scores, sample size requirements, or closing line value (CLV) tracking.
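A minimal sketch of what the missing validation layer might compute, assuming forecasts are stored as probabilities alongside 0/1 outcomes. The function names and the CLV convention (bet price relative to closing price, in decimal odds) are illustrative assumptions, not something any advisor specified.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins.

    Returns rows of (bin_low, bin_high, mean_forecast, hit_rate, count)
    for non-empty bins only. A calibrated model has mean_forecast close
    to hit_rate in every bin with a reasonable count.
    """
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        n = len(pairs)
        rows.append((idx / n_bins, (idx + 1) / n_bins,
                     sum(p for p, _ in pairs) / n,
                     sum(o for _, o in pairs) / n,
                     n))
    return rows


def closing_line_value(bet_odds, closing_odds):
    """One common CLV convention: relative price improvement over the close.

    Positive means the bet was placed at better decimal odds than the close.
    """
    return bet_odds / closing_odds - 1.0
```

Lower Brier scores are better. Sample-size requirements per bin are left open here, as in the council output.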
What Everyone Missed (from peer reviews)
- Withdrawal risk pricing — Golf fields have 10-25% WD rates between initial odds and Thursday tee-off. H2H/3-ball void on WD. Outright markets shift when stars withdraw. "Dead money" from withdrawn players is exploitable mispricing. Need WD probability model.
- Information-timing framework — Monday/Tuesday practice round reports, fitness tests, late alternate additions create a window where model has genuine info edge over static book lines. No advisor built timing-specific intelligence cadence.
- Field composition uncertainty — MC simulations assume fixed field, but field changes daily Sunday through Thursday. Need to simulate field uncertainty, not just player performance.
- Pin position daily impact — Pin placements change hole difficulty by 0.5-1.0 strokes per hole. Not captured in any lookback.
- Sponsor exemptions and Monday qualifiers — Late field additions have different SG profiles. Pipeline must handle.
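The field-uncertainty point can be illustrated with a toy simulation that redraws the field on every iteration before picking a winner. This is a sketch under strong assumptions: each player has a single hypothetical `win_weight` and an independent `wd_prob`, and the winner is drawn in proportion to weight among those who actually start. None of these inputs come from the council output.

```python
import random


def simulate_with_field_uncertainty(entries, n_iter=20_000, seed=0):
    """Estimate win probabilities when the field itself is uncertain.

    entries: dict of player -> (win_weight, wd_prob)
    Each iteration first samples who withdraws, then draws a winner
    among the remaining starters in proportion to win_weight.
    """
    rng = random.Random(seed)
    wins = {player: 0 for player in entries}
    for _ in range(n_iter):
        # Sample the field: a player starts unless the WD draw hits.
        starters = [p for p, (_, wd) in entries.items() if rng.random() >= wd]
        if not starters:
            continue  # degenerate draw with an empty field
        weights = [entries[p][0] for p in starters]
        winner = rng.choices(starters, weights=weights, k=1)[0]
        wins[winner] += 1
    return {player: count / n_iter for player, count in wins.items()}
```

Comparing the output against a fixed-field run shows how probability mass from likely withdrawals is redistributed to the rest of the field.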
BUILD PLAN
Phase 1: Core Golf Data Tables
golf_players: player_id, full_name, tour (PGA/LIV/DPWT/KF/LPGA), nationality, age, owgr_rank, datagolf_rank, sg_total, sg_ott, sg_app, sg_arg, sg_putt, sg_t2g, player_sigma, active, updated_at
golf_player_sg_baselines: player_id, date, sg_component (OTT/APP/ARG/PUTT/T2G/Total), window (8rd/24rd/40rd/100rd), value, ewma_value, rounds_in_window
golf_courses: course_id, name, city, state_country, par, yardage, grass_fairway, grass_green (bermuda/bentgrass/poa), altitude_ft, roof_type, dna_ott, dna_app, dna_arg, dna_putt, avg_winning_score, avg_cut_line, course_difficulty_index
golf_tournaments: tournament_id, name, course_id, tour, start_date, purse, field_size, cut_rule (top-65/70/no-cut), format (stroke/shotgun), major (boolean), num_rounds (4/3, i.e. 72 or 54 holes)
golf_field_lists: tournament_id, player_id, entry_type (committed/alternate/MQ/sponsor), wd_status, wd_timestamp, wave_r1 (AM/PM), wave_r2 (PM/AM), tee_time_r1, tee_time_r2, made_cut, final_position, final_score
golf_player_course_history: player_id, course_id, appearances, rounds_played, avg_sg_total, best_finish, cuts_made, cuts_missed, wins, avg_score_to_par
golf_weather: tournament_id, round_number, wave (AM/PM), forecast_timestamp, temp_f, wind_speed_mph, wind_gust_mph, wind_direction, precip_prob, humidity_pct, wave_advantage_strokes
golf_round_scores: tournament_id, round_number, player_id, score_to_par, sg_total, sg_ott, sg_app, sg_arg, sg_putt, tee_time, wave, position_after_round, score_detail (JSON birdie/bogey/par per hole)
golf_putting_surface_splits: player_id, grass_type (bermuda/bentgrass/poa), rounds_played, sg_putt_avg, sg_putt_ewma, putt_make_pct_5to10ft, putt_make_pct_10to20ft
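As a rough illustration of how a few of these tables might be declared, here is a SQLite sketch covering a subset of the columns. The column types, keys, and CHECK constraints are assumptions; the source lists column names only and does not name a database engine.

```python
import sqlite3

# Subset of the Phase 1 tables. Types and constraints are assumed.
SCHEMA = """
CREATE TABLE golf_players (
    player_id    INTEGER PRIMARY KEY,
    full_name    TEXT NOT NULL,
    tour         TEXT CHECK (tour IN ('PGA', 'LIV', 'DPWT', 'KF', 'LPGA')),
    sg_total     REAL,
    player_sigma REAL,
    active       INTEGER DEFAULT 1,
    updated_at   TEXT
);
CREATE TABLE golf_tournaments (
    tournament_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    tour          TEXT,
    start_date    TEXT
);
CREATE TABLE golf_field_lists (
    tournament_id INTEGER NOT NULL REFERENCES golf_tournaments(tournament_id),
    player_id     INTEGER NOT NULL REFERENCES golf_players(player_id),
    entry_type    TEXT CHECK (entry_type IN ('committed', 'alternate', 'MQ', 'sponsor')),
    wd_status     INTEGER DEFAULT 0,
    wd_timestamp  TEXT,
    wave_r1       TEXT CHECK (wave_r1 IN ('AM', 'PM')),
    PRIMARY KEY (tournament_id, player_id)
);
"""


def build_db(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

Keying `golf_field_lists` on (tournament_id, player_id) keeps one row per entrant, so a withdrawal becomes an update to `wd_status` rather than a delete, which preserves the history a WD model would need.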
Phase 2: Custom Metrics
| Metric | Formula | Purpose |
|---|---|---|
| SG Composite (EWMA) | T2G: 24-round half-life; Putting: 40-round half-life (per Broadie) | Weighted player strength |
| Course Fit Score | dot(player_SG_vector, course_DNA_vector), z-normalized | Player-course compatibility |
| Course History Bonus | 3+ cuts: +0.1, T10: +0.15, Win: +0.20, cap +0.3 SG:Total | Venue familiarity |
| Weather Wave Advantage | (PM_wind_penalty - AM_wind_penalty) in strokes | Short-term edge signal |
| Player Sigma | Std dev of SG:Total over last 40 rounds | Consistency measure |
| Cut Probability | From MC: P(player in top-N after 36 holes) with course-adjusted SG | Make-cut market pricing |
| Field Strength Index | Sum of top-30 SG:Total in field / baseline | Tournament difficulty |
| Putting Surface Adjustment | SG:Putt_on_grass_type - SG:Putt_overall | Grass-type-specific correction |
| Form Trend | Slope of SG:Total over last 8 rounds (positive = improving) | Momentum signal |
| WD Probability | Logistic: age, injury flag, recent WD history, travel distance | Withdrawal risk pricing |
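The first two metrics could be computed along the following lines. This is a sketch, assuming "N-round half-life" means a round played N rounds ago carries half the weight of the latest round, and that z-normalization is done across the field with a population standard deviation. The dictionary keys are illustrative.

```python
import math

SG_COMPONENTS = ("ott", "app", "arg", "putt")


def ewma_half_life(values, half_life):
    """Exponentially weighted mean of per-round SG values.

    values are ordered oldest to newest. A round k rounds back gets
    weight 0.5 ** (k / half_life), so weight halves every half_life rounds.
    """
    if not values:
        raise ValueError("need at least one round")
    n = len(values)
    num = den = 0.0
    for i, v in enumerate(values):
        w = 0.5 ** ((n - 1 - i) / half_life)
        num += w * v
        den += w
    return num / den


def course_fit(player_sg, course_dna):
    """Dot product of the player's SG vector with the course demand vector."""
    return sum(player_sg[c] * course_dna[c] for c in SG_COMPONENTS)


def z_normalize(raw_scores):
    """Convert raw fit scores (player -> value) to z-scores across the field."""
    vals = list(raw_scores.values())
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    if sd == 0:
        return {p: 0.0 for p in raw_scores}
    return {p: (v - mean) / sd for p, v in raw_scores.items()}
```

Per the table, `ewma_half_life(rounds, 24)` would apply to the tee-to-green components and `ewma_half_life(rounds, 40)` to putting.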
Phase 3: Distribution Models
- Round score per player: Normal distribution, μ = course-adjusted SG:Total, σ = player sigma
- Tournament simulation: 100K iterations, 4 rounds with a cut after R2 (3 rounds and no cut for LIV)
- Wave weather adjustment: Applied per-round based on wave assignment
- Putting surface adjustment: Added to SG:PUTT based on grass type
- Cut line estimation: From MC: median of the 36-hole cutoff scores across simulations
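A stripped-down version of the simulation described above might look like this. It is a sketch only: each round's strokes gained is drawn independently from Normal(mu, sigma), the cut keeps a fixed number of players with ties ignored, and the weather and putting-surface adjustments are assumed to be folded into `mu` already. The default iteration count is kept below the 100K target so that pure Python stays fast.

```python
import random


def simulate_tournament(players, n_iter=5_000, n_rounds=4,
                        cut_after=2, cut_size=65, seed=0):
    """Monte Carlo estimate of win and make-cut probabilities.

    players: dict of name -> (mu, sigma), per-round SG mean and std dev.
    cut_after=None disables the cut (e.g. a 3-round, no-cut LIV event).
    Returns (win_prob, cut_prob) dicts keyed by player name.
    """
    rng = random.Random(seed)
    names = list(players)
    wins = dict.fromkeys(names, 0)
    made_cut = dict.fromkeys(names, 0)
    for _ in range(n_iter):
        totals = dict.fromkeys(names, 0.0)
        alive = names
        for rnd in range(1, n_rounds + 1):
            for p in alive:
                mu, sigma = players[p]
                totals[p] += rng.gauss(mu, sigma)
            if rnd == cut_after:
                # Keep the best cut_size cumulative totals (higher SG is better).
                alive = sorted(alive, key=totals.get, reverse=True)[:cut_size]
                for p in alive:
                    made_cut[p] += 1
        wins[max(alive, key=totals.get)] += 1
    win_prob = {p: wins[p] / n_iter for p in names}
    if cut_after is None:
        cut_prob = dict.fromkeys(names, 1.0)
    else:
        cut_prob = {p: made_cut[p] / n_iter for p in names}
    return win_prob, cut_prob
```

Top-N probabilities follow the same pattern by counting how often a player finishes inside the top N of the final totals.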
Phase 4: 7 Edge Scanners
| Scanner | Min Edge | De-vig Method | Unique Logic |
|---|---|---|---|
| Outright (150-way) | 2% | Power method | Course fit + weather + form; Kelly 0.25x |
| H2H | 3% | Multiplicative 2-way | Wave differential if different waves; WD void risk; Kelly 0.5x |
| Make the Cut | 3% | Multiplicative 2-way | Consistency (low sigma) > peak SG; course history weight; Kelly 0.4x |
| Top 5/10/20 | 3% | Power method (20-way) | Top 20 laziest-priced; Kelly 0.35x |
| Round Leader | 4% | Power method | Wave weather DOMINANT; single-round sim; Kelly 0.3x |
| 3-Ball | 4% | Power method (3-way) | Same-group shared conditions; highest correlation; Kelly 0.4x |
| Hole in One | 5% | Multiplicative 2-way | Par-3 difficulty × field size × ace rate; high variance; Kelly 0.15x |
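The two de-vig methods and the fractional Kelly sizing named in the table can be sketched as follows. Assumptions: odds are decimal and greater than 1, "edge" means model probability minus de-vigged book probability, and the power method solves for a single exponent k such that the raw implied probabilities raised to k sum to the target. The source does not define these details.

```python
def devig_multiplicative(decimal_odds):
    """Scale raw implied probabilities so they sum to 1."""
    raw = [1.0 / o for o in decimal_odds]
    total = sum(raw)
    return [r / total for r in raw]


def devig_power(decimal_odds, target=1.0, tol=1e-12):
    """Find exponent k with sum(raw_i ** k) == target, by bisection.

    Requires every decimal price > 1 so each raw probability is below 1,
    which makes the sum strictly decreasing in k.
    """
    raw = [1.0 / o for o in decimal_odds]
    lo, hi = 1e-6, 100.0
    k = 1.0
    for _ in range(200):
        k = (lo + hi) / 2
        s = sum(r ** k for r in raw)
        if abs(s - target) < tol:
            break
        if s > target:
            lo = k  # sum too large: need a bigger exponent
        else:
            hi = k
    return [r ** k for r in raw]


def kelly_stake(model_prob, decimal_odds, fraction):
    """Fractional Kelly stake as a share of bankroll, floored at zero."""
    b = decimal_odds - 1.0
    full = (model_prob * b - (1.0 - model_prob)) / b
    return max(0.0, full) * fraction


def flag_edge(model_prob, fair_prob, decimal_odds, min_edge, fraction):
    """Return a stake if the edge clears the scanner's threshold, else 0."""
    if model_prob - fair_prob < min_edge:
        return 0.0
    return kelly_stake(model_prob, decimal_odds, fraction)
```

For an H2H priced 1.90/1.90, `devig_multiplicative` gives 0.5 each. With a model probability of 0.55 the edge is 5%, which clears the 3% threshold, and `flag_edge(0.55, 0.5, 1.90, 0.03, 0.5)` returns a half-Kelly stake.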
Phase 5: Dashboard
- Tournament board: Field with SG composites, course fit, waves, edge counts
- Player drill-down: Full card + all market edges + SG trends + course history
- Weather center: Wave advantage, forecast timeline, re-pricing alerts
- Cut line tracker: Projected cut with make-cut probabilities
- H2H board: All matchups with edges and wave differentials
- WD monitor: Players with elevated withdrawal probability
- Edge alerts: By magnitude, filterable by market type
- P&L tracker: By market type, tournament, edge bucket, Brier scores
OPEN QUESTIONS FOR BOSS RULING
- DataGolf subscription: Required for comprehensive SG data across PGA/DPWT/KF. Cost?
- LPGA coverage: No reliable SG source. Build proxy formulas or skip LPGA?
- LIV-specific model: 54 holes, no cut, shotgun starts, 48-player field. Worth separate build?
- WD probability model: Build logistic regression for withdrawal risk pricing?
- Historical depth: How many seasons of SG data to backfill?
- Putting surface splits: Track SG:Putting separately by grass type (bermuda/bentgrass/poa)?
- Hole in One scanner: Very high variance novelty market. Build or skip?
COUNCIL METADATA
| Detail | Value |
|---|---|
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Grok (2/5 votes, 1 genuine cross-vote) |
| Runner-up | Opus (1/5 self-vote, but strongest review insights) |
| Biggest blind spot | No backtesting/calibration framework |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/golf-data-audit/ |
Source: ~/edgeclaw/results/panel-results/golf-data-audit-ruling.md