MLB Data Audit — Council Ruling
Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: gpt-oss (3 of 5 peer review votes)
Status: PENDING BOSS RULING on open questions
COUNCIL SUMMARY
Where Advisors Agreed
- Starting pitcher is the #1 data input — SP quality metrics (ERA, FIP, xFIP, WHIP, K/9, BB/9) are foundational
- Bullpen availability tracking is #2 — pitch counts, days rest, multi-day workload all must be tracked daily
- Weather data from NWS API — critical for outdoor parks, affects totals significantly
- Park factors per stadium — Coors (extreme), Yankee Stadium (short porch), Oracle Park (marine layer)
- Platoon splits (L/R matchups) — 30-50 point wOBA swings at team level
- EWMA with stat-specific decay rates — different alphas for SP metrics vs team batting vs bullpen
- 4 separate edge scanners — Moneyline, Run Line, Totals, First 5 Innings
- Poisson/Negative Binomial for run distributions — vanilla Poisson insufficient due to overdispersion
- First-time-through-order (FTTO) splits essential for F5 market
- Early-season protocol — limited SP sample in April requires blending with projections
Where Advisors Disagreed
- Database engine: gpt-oss recommended PostgreSQL + TimescaleDB with 20+ tables. Opus used SQLite with 14 tables. Council verdict: SQLite WAL for current scale, design migration-ready.
- Distribution model: gpt-oss used Zero-Inflated Negative Binomial, Opus used Poisson with walk-off correction, Gemini used basic Poisson. Council verdict: Negative Binomial preferred for run scoring (overdispersion), with walk-off correction for run line.
- BvP (batter vs pitcher) data: Opus explicitly disqualified individual BvP due to small samples. Others included it with caveats. Council verdict: Drop individual BvP, use platoon splits at team level instead.
- Weather source: Grok recommended OpenWeatherMap, Opus specified NWS API only. Council verdict: NWS API only (free, reliable, already established policy).
Strongest Arguments (from peer review)
gpt-oss wins with the most complete data architecture design:
- End-to-end data flow: raw → clean → gold layer with audit trail
- 20+ tables with PK/FK, explicit data types, and change-history tables
- Row-level checksums on daily pulls, staleness alarms, schema drift detection
- Automated data quality pipeline with alerting
- Four ready-to-run edge scanner blueprints per market type
- Distribution model justification with empirical variance-to-mean ratios
- Exact API endpoints, pagination limits, rate-limit handling, fallback plans
Opus runner-up with deepest baseball analytics knowledge:
- Walk-off truncation problem for run line fully worked through
- Humidity physics (humid air LESS dense = ball carries farther)
- Bullpen 4-tier system with specific pitch-count thresholds
- Dual-EWMA crossover system (fast vs slow signal)
- Opener/bullpen game detection (unique gap identification)
- Self-assessment of own weaknesses
Biggest Blind Spot
Gemini: Skeleton schema (4 tables, no indexes, no constraints), recommended wrong weather API (OpenWeatherMap instead of NWS), no formulas for distribution parameters, vague source references without API specs.
What Everyone Missed (from peer reviews)
- Real audit pipeline vs data feeds — All advisors designed data collection but none built proper data quality observability: freshness SLAs, source reconciliation, anomaly detection, data lineage, reproducibility.
- Market data integration layer — Real-time odds ingestion, line movement tracking, de-vigging architecture, steam/reverse-edge alerts.
- P&L attribution per data input — No way to measure whether SPQC, bullpen availability, or weather adjustments are actually profitable over time.
- Lineup delta detection — Star position player rest days happen daily; need automated parser comparing official vs projected lineup.
- Kalshi-specific liquidity constraints — Thin exchange, position sizing must account for market impact.
BUILD PLAN
Phase 1: Core MLB Data Tables
mlb_sp_game_logs:
- pitcher_id, pitcher_name, date, game_id, team, opponent
- innings_pitched, earned_runs, hits_allowed, walks, strikeouts, pitches_thrown
- game_score, era_after, fip_after, xfip_after, whip_after
- k_per_9, bb_per_9, hr_per_9, gb_rate
- ftto_woba (first time through order), ftto_k_rate, ftto_bb_rate
- Source: MLB Stats API + Baseball Savant
mlb_sp_baselines:
- pitcher_id, date, stat_type
- last_3_starts, last_5_starts, last_10_starts, season_avg
- ewma_015 (fast, SP form), ewma_010 (standard), ewma_005 (slow, bullpen ERA)
- steamer_projection, zips_projection (preseason/updating)
- games_started, season_ip
- early_season_flag (boolean — fewer than 5 starts)
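The three EWMA columns above can be sketched as follows. This is a minimal illustration, assuming the column suffixes map directly to alphas of 0.15, 0.10, and 0.05 and that each series is seeded with its first observation; `ewma` and `baseline_row` are hypothetical helper names, not part of the ruling.

```python
from typing import Dict, Iterable, Optional

# Alphas implied by the column names ewma_015 / ewma_010 / ewma_005 (assumption).
ALPHAS: Dict[str, float] = {"ewma_015": 0.15, "ewma_010": 0.10, "ewma_005": 0.05}


def ewma(values: Iterable[float], alpha: float) -> Optional[float]:
    """Exponentially weighted moving average, oldest observation first.

    Assumption: the series is seeded with its first observation.
    Returns None for an empty series.
    """
    smoothed: Optional[float] = None
    for x in values:
        smoothed = x if smoothed is None else alpha * x + (1.0 - alpha) * smoothed
    return smoothed


def baseline_row(stat_history: Iterable[float]) -> Dict[str, Optional[float]]:
    """Compute all three EWMA columns for one pitcher and one stat_type."""
    history = list(stat_history)
    return {name: ewma(history, alpha) for name, alpha in ALPHAS.items()}
```

A higher alpha reacts faster to recent starts, so comparing `ewma_015` against `ewma_005` gives the fast-versus-slow signal that the dual-EWMA crossover idea relies on.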
mlb_team_batting:
- team, date, opponent_sp_hand (L/R)
- team_woba, team_ops, team_wrc_plus
- vs_lhp_woba, vs_rhp_woba (platoon splits)
- home_woba, away_woba
- last_7_woba, last_14_woba, last_30_woba
- iso_power, k_rate, bb_rate, barrel_rate
mlb_bullpen_status:
- team, date, pitcher_id, pitcher_name, role (closer/setup/middle/long/opener)
- availability_tier (GREEN/YELLOW/RED/BLACK)
- yesterday_pitches, two_days_pitches, three_days_pitches
- appearances_last_3d, appearances_last_7d, pitches_last_7d
- high_leverage_innings_last_7d
- criteria (checked in order): BLACK = unavailable; RED = 30+ pitches yesterday OR pitched on 3 of the last 4 days; YELLOW = 20-29 pitches yesterday; GREEN = available otherwise
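The tier criteria can be expressed as a small classifier. This is a sketch under two assumptions: the tiers are checked in priority order (BLACK, RED, YELLOW, GREEN), and a hypothetical `days_pitched_last_4` input exists, since the table lists 3-day and 7-day appearance counts but no 4-day column.

```python
def availability_tier(unavailable: bool,
                      yesterday_pitches: int,
                      days_pitched_last_4: int) -> str:
    """Classify one reliever into GREEN / YELLOW / RED / BLACK.

    days_pitched_last_4 is a hypothetical input (not a listed column):
    the number of the last four days on which the pitcher appeared.
    """
    if unavailable:
        return "BLACK"
    if yesterday_pitches >= 30 or days_pitched_last_4 >= 3:
        return "RED"
    if 20 <= yesterday_pitches <= 29:
        return "YELLOW"
    return "GREEN"
```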
mlb_game_weather:
- game_id, date, park_id
- temperature_f, humidity_pct, wind_speed_mph
- wind_direction_relative (OUT_TO_CF/IN_FROM_CF/CROSSWIND_LR/CROSSWIND_RL/CALM)
- wind_run_impact (park-specific multiplier × wind component)
- precip_probability, precip_type
- air_density_adjustment (altitude + humidity + temp)
- roof_status (open/closed/retractable_open/retractable_closed/dome)
- Source: NWS API only
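One way to derive `wind_direction_relative` is to project the wind vector onto the line from home plate to center field. The sketch below assumes the wind direction is available in degrees using the meteorological convention (the direction the wind blows from), that each park has a known home-plate-to-center-field compass bearing, and that 5 mph is the calm cutoff. The function names, the bearing input, and the cutoff are illustrative assumptions, not values from the ruling.

```python
import math
from typing import Tuple


def wind_components(speed_mph: float, wind_from_deg: float,
                    cf_bearing_deg: float) -> Tuple[float, float]:
    """Return (out_to_cf, left_to_right) wind components in mph.

    wind_from_deg: direction the wind blows FROM (meteorological convention).
    cf_bearing_deg: compass bearing from home plate to center field.
    """
    wind_to_deg = (wind_from_deg + 180.0) % 360.0
    angle = math.radians(wind_to_deg - cf_bearing_deg)
    return speed_mph * math.cos(angle), speed_mph * math.sin(angle)


def wind_direction_relative(speed_mph: float, wind_from_deg: float,
                            cf_bearing_deg: float, calm_mph: float = 5.0) -> str:
    """Map raw wind readings onto the five categories used in mlb_game_weather."""
    if speed_mph < calm_mph:  # calm cutoff is an assumption
        return "CALM"
    out, cross = wind_components(speed_mph, wind_from_deg, cf_bearing_deg)
    if abs(out) >= abs(cross):
        return "OUT_TO_CF" if out > 0 else "IN_FROM_CF"
    return "CROSSWIND_LR" if cross > 0 else "CROSSWIND_RL"
```

The `out_to_cf` component is the natural input to `wind_run_impact`, which the table defines as a park-specific multiplier times the wind component.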
mlb_park_factors:
- park_id, park_name, team, season
- runs_factor, hr_factor_lhb, hr_factor_rhb, hits_factor
- dimensions_lf, dimensions_cf, dimensions_rf
- altitude_ft, roof_type
- Source: FanGraphs park factors
mlb_umpire_data:
- umpire_id, umpire_name, date, game_id
- career_k_above_avg, career_bb_above_avg, career_runs_above_avg
- season_k_rate, season_bb_rate
- abs_challenge_rate, abs_overturn_rate (new for 2026)
- zone_size_index (relative to league average)
mlb_lineups:
- game_id, date, team, batting_order (1-9)
- player_id, player_name, position
- confirmed (boolean), source, timestamp
- season_wrc_plus, vs_hand_wrc_plus
- lineup_total_wrc_plus, projected_total_wrc_plus
- delta_wrc_plus (flags rest days)
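As an illustration of how one Phase 1 table might look under the council's SQLite WAL verdict, here is a sketch for `mlb_bullpen_status`. The column types, the primary key, the index, and the `init_db` helper are assumptions made for the example; the ruling specifies only the column names and the allowed role and tier values.

```python
import sqlite3

# Column names come from the Phase 1 spec; types, keys, and the index are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS mlb_bullpen_status (
    team TEXT NOT NULL,
    date TEXT NOT NULL,
    pitcher_id INTEGER NOT NULL,
    pitcher_name TEXT NOT NULL,
    role TEXT CHECK (role IN ('closer', 'setup', 'middle', 'long', 'opener')),
    availability_tier TEXT NOT NULL
        CHECK (availability_tier IN ('GREEN', 'YELLOW', 'RED', 'BLACK')),
    yesterday_pitches INTEGER DEFAULT 0,
    two_days_pitches INTEGER DEFAULT 0,
    three_days_pitches INTEGER DEFAULT 0,
    appearances_last_3d INTEGER DEFAULT 0,
    appearances_last_7d INTEGER DEFAULT 0,
    pitches_last_7d INTEGER DEFAULT 0,
    high_leverage_innings_last_7d REAL DEFAULT 0,
    PRIMARY KEY (pitcher_id, date)
);
CREATE INDEX IF NOT EXISTS idx_bullpen_team_date
    ON mlb_bullpen_status (team, date);
"""


def init_db(path: str) -> sqlite3.Connection:
    """Open the database, request WAL journaling, and create the table."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # in-memory databases stay in 'memory' mode
    conn.executescript(SCHEMA)
    return conn
```

Keeping the DDL in plain SQL with explicit constraints is one way to stay migration-ready, since the same statements need only minor changes for PostgreSQL.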
Phase 2: Derived Metrics
| Metric | Formula | Purpose |
| --- | --- | --- |
| SP Quality Composite (SPQC) | Weighted: 0.3×xFIP + 0.3×FIP + 0.2×ERA + 0.2×EWMA_GS | Single SP quality number |
| Bullpen Availability Index (BAI) | Weighted avg of available arms × role importance | Team bullpen readiness score |
| Weather Run Factor (WRF) | wind_component × park_multiplier + temp_adj + humidity_adj + altitude_adj | Total weather impact on runs |
| Platoon Advantage Score | Team wOBA vs SP hand − team season wOBA | Measures platoon edge |
| FTTO Decay Rate | SP's innings 1-3 wOBA vs innings 4-5 wOBA | How much SP degrades through order |
| Day-Night Fatigue | Team batting stats in day-after-night games vs baseline | Quantified fatigue effect |
| Lineup Strength Delta | Actual lineup wRC+ − projected lineup wRC+ | Detects star rest days |
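Three of the derived metrics above reduce to simple arithmetic and can be sketched directly. The weights and the 15-point rest-day threshold come from this document; the function names are hypothetical. The ruling does not say how EWMA_GS is scaled, so the sketch assumes all four SPQC inputs are already on a common scale. It also assumes a rest day shows up as an actual lineup that is weaker than projected.

```python
def spqc(xfip: float, fip: float, era: float, ewma_gs: float) -> float:
    """SP Quality Composite with the weights from the Phase 2 table.

    Assumption: all four inputs are already on a common scale.
    """
    return 0.3 * xfip + 0.3 * fip + 0.2 * era + 0.2 * ewma_gs


def platoon_advantage(woba_vs_sp_hand: float, season_woba: float) -> float:
    """Team wOBA against the starter's hand minus team season wOBA."""
    return woba_vs_sp_hand - season_woba


def lineup_strength_delta(actual_wrc_plus: float, projected_wrc_plus: float) -> float:
    """Actual lineup wRC+ minus projected lineup wRC+."""
    return actual_wrc_plus - projected_wrc_plus


def is_key_rest_day(delta_wrc_plus: float, threshold: float = 15.0) -> bool:
    """Flag a rest day when the actual lineup trails the projection by more
    than `threshold` wRC+ points (sign convention is an assumption)."""
    return -delta_wrc_plus > threshold
```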
Phase 3: Distribution Models Per Market
| Market | Distribution | Parameters | Notes |
| --- | --- | --- | --- |
| Moneyline | Negative Binomial (each team's runs) | μ from SPQC × batting × park × weather; k from team variance | Win prob = P(runs_home > runs_away) |
| Run Line (-1.5) | Negative Binomial with walk-off correction | Same μ, k + home walk-off truncation | Home teams don't bat bottom 9th if leading → reduces home -1.5 cover prob |
| Totals | Negative Binomial (combined runs) | μ_total = μ_home + μ_away; adjusted for weather, park, bullpen | Over/under probability at each threshold |
| First 5 Innings | Modified NB (SP-only, no bullpen) | μ from FTTO splits × batting vs SP hand × park; NO bullpen component | Isolates SP; use innings 1-5 specific rates only |
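The moneyline row can be illustrated with a small Negative Binomial calculation. The sketch assumes the mean/dispersion parameterization (variance = μ + μ²/k), independent home and away run totals, a truncation at 40 runs, and that tied regulation scores resolve in proportion to each side's regulation win probability. None of these choices are specified in the ruling, and the function names are hypothetical.

```python
from math import exp, lgamma, log


def nb_pmf(x: int, mu: float, k: float) -> float:
    """Negative Binomial pmf with mean mu and dispersion k (var = mu + mu^2 / k)."""
    log_p = (lgamma(x + k) - lgamma(k) - lgamma(x + 1)
             + k * log(k / (k + mu)) + x * log(mu / (k + mu)))
    return exp(log_p)


def home_win_prob(mu_home: float, mu_away: float,
                  k_home: float, k_away: float, max_runs: int = 40) -> float:
    """P(home wins), treating the two teams' run totals as independent.

    Assumption: ties (extra innings) are split in proportion to the
    regulation win and loss probabilities.
    """
    home = [nb_pmf(x, mu_home, k_home) for x in range(max_runs + 1)]
    away = [nb_pmf(x, mu_away, k_away) for x in range(max_runs + 1)]
    p_win = p_loss = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            if h > a:
                p_win += ph * pa
            elif h < a:
                p_loss += ph * pa
    return p_win / (p_win + p_loss)
```

A smaller `k` means more overdispersion; as `k` grows large the distribution approaches Poisson, which is the case the council judged insufficient.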
Phase 4: Edge Scanners (4 scanners)
Common engine:
- Ingest Pinnacle odds for all 4 markets
- De-vig using Shin + Power methods
- Build NB probability curves with all adjustments
- Compare to Kalshi contract prices
- Min edge: 4 cents after Kalshi 7% fee
- Min sample: SP must have 5+ starts this season (early-season gate)
- Output:
{game_id, market_type, side, model_prob, kalshi_price, edge, confidence, sp_status, weather_flag}
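The de-vig and minimum-edge steps of the common engine can be sketched as below. Only the Power method is shown; the Shin method is omitted. The ruling does not state how the 7% Kalshi fee is applied, so the sketch assumes a per-contract fee of 0.07 × price × (1 − price), which should be checked against the exchange's published fee schedule. `devig_power` and `edge_after_fee` are hypothetical names.

```python
from typing import List, Sequence

MIN_EDGE = 0.04  # 4 cents per contract, from the common-engine spec


def devig_power(decimal_odds: Sequence[float]) -> List[float]:
    """Power-method de-vig: find exponent e so that sum(p_i ** e) == 1."""
    implied = [1.0 / o for o in decimal_odds]
    total = sum(implied)
    if total <= 1.0:  # no overround to remove; just normalize
        return [p / total for p in implied]
    lo, hi = 1.0, 2.0
    while sum(p ** hi for p in implied) > 1.0:
        hi *= 2.0
    for _ in range(200):  # bisection on the exponent
        mid = (lo + hi) / 2.0
        if sum(p ** mid for p in implied) > 1.0:
            lo = mid
        else:
            hi = mid
    fair = [p ** hi for p in implied]
    norm = sum(fair)
    return [p / norm for p in fair]


def edge_after_fee(model_prob: float, kalshi_price: float,
                   fee_rate: float = 0.07) -> float:
    """Edge in dollars per $1 contract after an ASSUMED fee formula."""
    fee = fee_rate * kalshi_price * (1.0 - kalshi_price)
    return model_prob - kalshi_price - fee


def passes_min_edge(model_prob: float, kalshi_price: float) -> bool:
    return edge_after_fee(model_prob, kalshi_price) >= MIN_EDGE
```

For example, a model probability of 0.60 against a Kalshi price of 0.50 leaves an edge of about 8 cents under this fee assumption, which clears the 4-cent gate.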
Per-market unique logic:
| Scanner | Unique Logic |
| --- | --- |
| Moneyline | SP quality is primary driver, bullpen quality secondary, weather minimal impact |
| Run Line | Walk-off correction for home favorites, bullpen quality MORE important (late-game leverage) |
| Totals | Weather is PRIMARY driver (wind × park × temp × humidity), bullpen quality important, umpire zone |
| First 5 | SP-only: FTTO splits, umpire zone, NO bullpen factor, weather less impactful (fewer innings) |
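The walk-off effect on the home -1.5 run line can be seen in a small inning-by-inning simulation. This is a rough sketch, not the council's correction formula. It assumes per-inning runs are Negative Binomial with mean μ/9 and dispersion k/9, ignores extra innings, and caps any walk-off win at a one-run margin (which ignores multi-run walk-off home runs). The function names and default values are illustrative.

```python
import random
from math import exp, lgamma, log
from typing import Tuple


def nb_pmf(x: int, mu: float, disp: float) -> float:
    """Negative Binomial pmf with mean mu and dispersion disp."""
    return exp(lgamma(x + disp) - lgamma(disp) - lgamma(x + 1)
               + disp * log(disp / (disp + mu)) + x * log(mu / (disp + mu)))


def run_line_cover_probs(mu_home: float, mu_away: float, disp: float = 6.0,
                         n_games: int = 20000, seed: int = 7) -> Tuple[float, float]:
    """Return (naive, corrected) estimates of P(home covers -1.5).

    naive: home always bats nine full innings.
    corrected: home skips the bottom of the 9th when leading, and a
    walk-off ends the game at a one-run margin (assumption).
    """
    rng = random.Random(seed)
    support = range(16)  # per-inning runs truncated at 15
    w_home = [nb_pmf(x, mu_home / 9, disp / 9) for x in support]
    w_away = [nb_pmf(x, mu_away / 9, disp / 9) for x in support]
    naive = corrected = 0
    for _ in range(n_games):
        away_total = sum(rng.choices(support, w_away, k=9))
        home_innings = rng.choices(support, w_home, k=9)
        home_full = sum(home_innings)
        home_thru_8 = sum(home_innings[:8])
        if home_thru_8 > away_total:
            home_actual = home_thru_8  # bottom of the 9th is not played
        else:
            home_actual = min(home_full, away_total + 1)  # walk-off cap
        naive += (home_full - away_total) >= 2
        corrected += (home_actual - away_total) >= 2
    return naive / n_games, corrected / n_games
```

Because the truncated home total can never exceed the full nine-inning total, the corrected cover probability is never higher than the naive one, which is the direction of bias the run line scanner needs to account for.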
Phase 5: Matchup Card Format
GAME: [Away] @ [Home] | [Date] [Time ET] | [Park]
WEATHER: [Temp]°F | Wind: [Speed]mph [Direction] | Humidity: [%] | WRF: [+/-runs]
ROOF: [Status] | PARK: Runs [factor] | HR-L [factor] | HR-R [factor]
UMPIRE: [Name] | K+[adj] | BB+[adj] | R+[adj] | ABS Overturn: [rate]
HOME SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
SPQC: [composite] | xFIP: [val] | FIP: [val] | ERA: [val] | WHIP: [val]
K/9: [val] | BB/9: [val] | HR/9: [val] | GB%: [val]
FTTO: wOBA [val] | K% [val] (innings 1-3 vs 4-5)
Last 3 Starts: [date, opp, IP, ER, K, pitches] × 3
Days Rest: [n] | Season IP: [total] | Trend: [up/stable/down]
AWAY SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
[Same fields]
HOME BATTING vs [Away SP Hand]:
Team wOBA: [season] | vs [L/R]HP: [platoon] | Platoon Advantage: [+/- pts]
Last 7 wOBA: [val] | Barrel Rate: [val] | K Rate: [val]
Lineup wRC+: [total] | Delta from Projected: [+/-]
Key Rest Day: [player name if delta > 15 wRC+]
AWAY BATTING vs [Home SP Hand]:
[Same fields]
HOME BULLPEN: [GREEN/YELLOW/RED/BLACK]
BAI: [score] | Closer: [Name]-[status] | Setup: [Names]-[status]
Team Pitches Last 3 Days: [total] | High-Leverage Available: [Y/N]
AWAY BULLPEN: [Status]
[Same fields]
SCHEDULING:
Day Game After Night Game: [Home Y/N] [Away Y/N]
Series Game: [1/2/3/4] | Travel: [arrived yesterday/same city/off day]
INTELLIGENCE:
[Findings tagged CRITICAL/MODERATE/CONTEXT]
Phase 6: Dashboard
- Daily slate: All games with SP status, weather flags, bullpen status, edge counts per market
- Game drill-down: Full matchup card + all 4 market edges + research findings + lineup delta
- SP tracker: All 30 teams' probable pitchers with status color coding and SPQC rankings
- Bullpen board: Team-by-team availability tiers, pitch counts, trending fatigue
- Weather map: Outdoor parks with wind/temp/precip impact visualization
- Lineup monitor: Delta alerts when actual lineup deviates from projected
- Edge alerts: Sorted by magnitude, filterable by market type, with staleness timestamps
- P&L tracker: Performance by market type, by edge bucket, Brier scores
- Data quality dashboard: Source freshness, pull failures, staleness alarms, schema drift alerts
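The Brier score listed under the P&L tracker is the mean squared difference between predicted probabilities and binary outcomes. A minimal sketch, with `brier_score` as a hypothetical helper name:

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between model probabilities and 0/1 outcomes.

    Lower is better; always predicting 0.5 scores 0.25.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)
```

Computing it separately per market type and per edge bucket matches the breakdowns the dashboard item calls for.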
OPEN QUESTIONS FOR BOSS RULING
Walk-off correction for Run Line: Opus identified that standard Poisson/NB overstates home -1.5 cover probability because the home team does not bat in the bottom of the 9th when leading. Should we implement a correction formula now, or build a full inning-by-inning Markov simulation later?
Early-season protocol: Recommended blending Steamer/ZiPS projections with actual stats in April-May (weighted 70/30 projections/actual with 3 starts, shifting to 30/70 by 10 starts). Confirm?
Individual BvP data: Council says drop it (sample size too small to be reliable). Use platoon splits at team level instead. Confirm?
Data history depth: How many seasons of SP game logs should be stored: 2 or 3?
Umpire zone impact: Track ABS Challenge System data starting 2026 but treat as CONTEXT only through May, actionable June+. Confirm?
Opener/bullpen game detection: Should the system auto-detect when a team announces an "opener" (1-2 inning starter followed by bulk reliever) and treat it differently from a traditional start?
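The early-season blending schedule proposed above (70/30 projections/actual at 3 starts, 30/70 by 10 starts) can be sketched as a weight function. The linear interpolation between the two anchor points and the flat weights outside them are assumptions, since the proposal only fixes the endpoints; the function names are hypothetical.

```python
def projection_weight(starts: int) -> float:
    """Weight on the Steamer/ZiPS projection given starts made this season.

    Anchors from the proposal: 0.70 at 3 starts, 0.30 at 10 starts.
    Assumption: linear in between, flat outside that range.
    """
    if starts <= 3:
        return 0.70
    if starts >= 10:
        return 0.30
    return 0.70 - 0.40 * (starts - 3) / 7.0


def blended_stat(projection: float, actual: float, starts: int) -> float:
    """Blend a projected stat with the observed stat for the same pitcher."""
    w = projection_weight(starts)
    return w * projection + (1.0 - w) * actual
```

For a pitcher with a 4.00 projected ERA and a 3.00 actual ERA after 3 starts, the blend is 0.7 × 4.00 + 0.3 × 3.00 = 3.70.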
COUNCIL METADATA
| Detail | Value |
| --- | --- |
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | gpt-oss (3/5 votes) |
| Runner-up | Opus (2/5 votes) |
| Biggest blind spot | Gemini (2/5 votes) |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/mlb-data-audit/ |
Source: ~/edgeclaw/results/panel-results/mlb-data-audit-ruling.md