MLB Research Pipeline — Council Ruling
Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Opus (2 of 5 peer review votes; gpt-oss got 2, Gemini got 1)
Status: PENDING BOSS RULING on open questions
COUNCIL SUMMARY
Where Advisors Agreed
- Starting pitcher is THE #1 variable — controls 55-70% of game outcome variance, more extreme than NHL goalies
- 4 separate edge scanners required — Moneyline, Run Line (-1.5), Totals (O/U), First 5 Innings (F5)
- SP confirmation workflow with credibility hierarchy — team official > beat reporter > fantasy aggregator > social media
- Bullpen availability is the #2 daily variable — pitch counts, days rest, and multi-day workload all tracked
- Weather is a quantifiable model input, not just context — wind speed/direction, temp, humidity, park-specific multipliers
- Park factors per stadium — Coors (extreme), Yankee Stadium (short porch RF), Oracle Park (marine layer), etc.
- Platoon splits (L/R matchups) create 30-50 point wOBA swings at team level
- 3-pass research schedule — morning (SP confirmation + weather), afternoon (lineups + bullpen), pre-game (final confirmation)
- Poisson distribution for run scoring with park/weather adjustments
- Matchup cards must include SP stats, bullpen availability, weather, park factors, and umpire assignment
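The SP confirmation credibility hierarchy above can be sketched as a simple ranking. This is a minimal illustration, not the council's implementation: the `SPReport` type, the `SOURCE_RANK` values, and the tie-break on recency are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime

# Higher rank = more credible, following the hierarchy in the ruling.
# The numeric values themselves are hypothetical.
SOURCE_RANK = {
    "team_official": 4,
    "beat_reporter": 3,
    "fantasy_aggregator": 2,
    "social_media": 1,
}

@dataclass(frozen=True)
class SPReport:
    source_type: str
    pitcher_name: str
    status: str          # TBD / probable / confirmed / scratched
    timestamp: datetime

def resolve_sp_status(reports):
    """Return the report from the most credible source; ties go to the newest."""
    if not reports:
        return None
    return max(
        reports,
        key=lambda r: (SOURCE_RANK.get(r.source_type, 0), r.timestamp),
    )
```

One open design choice: a strict ranking means a stale team announcement always beats a fresh beat-reporter scratch, so a production version would likely need an age cutoff.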
Where Advisors Disagreed
- F5 distribution model: Some advisors used the same Poisson model as for the full game. Opus correctly identified that the first-time-through-order (FTTO) advantage makes innings 1-3 systematically different from innings 4-5, which requires separate parameters. Council verdict: F5 needs SP-specific FTTO splits applied to Poisson parameters.
- ABS Challenge System readiness: Some advisors treated it as an immediately actionable edge. Opus flagged that April-May 2026 is data collection only, since 200+ challenges are needed before patterns emerge. Council verdict: Flag as CONTEXT only through May; actionable June+ once the minimum sample is met.
- Which models run research: Gemini proposed using blind analyst models (Sonnet, Gemini) for research, which creates contamination risk. Council verdict: Research models must be SEPARATE from blind analyst models; use Grok 4.1 Fast for search and DeepSeek R1 for extraction.
- Database engine: gpt-oss recommends PostgreSQL + TimescaleDB. Others assume SQLite. Council verdict: SQLite WAL for current scale, design tables to be migration-ready.
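The F5 verdict can be illustrated with a small sketch. It assumes the SP's run-allowed rate is available per inning for innings 1-3 and for innings 4-5; the function names and the idea of feeding per-inning rates directly are hypothetical, since the ruling does not specify how `ftto_woba` maps to runs.

```python
import math

def f5_lambda(runs_per_inning_ftto, runs_per_inning_later):
    """Expected runs allowed over innings 1-5, with separate rates for
    innings 1-3 (roughly first time through the order) and innings 4-5."""
    return 3 * runs_per_inning_ftto + 2 * runs_per_inning_later

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_f5_runs_at_most(k_max, lam):
    """P(X <= k_max), e.g. the 'under' side of an F5 team total."""
    return sum(poisson_pmf(k, lam) for k in range(k_max + 1))
```

Because a sum of independent Poisson variables is itself Poisson, using two per-inning rates still yields a single Poisson total; only the mean changes relative to a flat full-game rate.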
Strongest Arguments (from peer review)
Opus wins with the most production-ready and analytically deep design:
- 4-tier bullpen availability system (GREEN/YELLOW/RED/BLACK) with specific pitch-count thresholds
- Park-specific wind sensitivity multipliers (Wrigley 1.5x, Coors 0.8x, Oracle Park 1.2x, domes 0.0x)
- Correctly noted humid air is LESS dense than dry air (physics error most models make)
- Early-season caution mode: widen confidence intervals 20%, raise edge thresholds 1.5% in April
- Kill switch taxonomy (4 levels: per-game, per-market, weather-triggered, daily)
- ABS Challenge phased rollout (collect → 200 sample → actionable)
- Wind direction encoded as field-relative (OUT_TO_CF / IN_FROM_CF / CROSSWIND_LR)
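The field-relative wind encoding can be sketched as below. Assumptions introduced here: wind direction arrives as the compass bearing the wind blows FROM (the usual meteorological convention), each park has a known home-plate-to-center-field bearing, sectors are 45° either side of the CF line, "LR" means left field toward right field, and anything under 3 mph counts as calm. None of these thresholds come from the ruling.

```python
def wind_direction_relative(wind_from_deg, wind_speed_mph, cf_bearing_deg,
                            calm_below_mph=3.0, sector_half_width=45.0):
    """Classify wind relative to the home-plate-to-CF line."""
    if wind_speed_mph < calm_below_mph:
        return "CALM"
    # Convert "blowing from" to "blowing toward".
    blowing_toward = (wind_from_deg + 180.0) % 360.0
    # Signed angle from the CF line, in [-180, 180); positive is clockwise,
    # i.e. toward the right-field side when looking out from home plate.
    diff = (blowing_toward - cf_bearing_deg + 180.0) % 360.0 - 180.0
    if abs(diff) <= sector_half_width:
        return "OUT_TO_CF"
    if abs(diff) >= 180.0 - sector_half_width:
        return "IN_FROM_CF"
    return "CROSSWIND_LR" if diff > 0 else "CROSSWIND_RL"
```

For a park whose center field lies due north of home plate (`cf_bearing_deg=0`), a southerly wind (`wind_from_deg=180`) is classified as `OUT_TO_CF`.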
Biggest Blind Spot
Gemini: Most importantly, assigned the wrong models to research roles by proposing blind analyst models for research (contamination risk). Also had a thin database schema and generic search queries, and was weakest on edge scanner math specifics.
What Everyone Missed (from peer reviews)
- Intelligence calibration loop — No advisor designed a feedback system to measure which IntelAdjustment types are value-additive vs. noise. Need: retrospective tagging against outcomes, dynamic source credibility weights, A/B testing of research prompts.
- Daily lineup delta detection — SP scratches are rare; star position player rest days happen EVERY day. Need automated parser comparing official 9-man lineup vs projected lineup, flagging large wRC+ deltas.
- Weather void rules by market — Rain-shortened games: ML and F5 are graded, but Run Line and Totals may be voided. Pipeline should restrict RL/Totals exposure in high-rain environments.
- Kalshi exchange mechanics — Thin liquidity on MLB props, bid-ask spread cost, price staleness detection.
- Home team walk-off impact on Run Line — Home team doesn't bat bottom 9th if leading. This fundamentally alters -1.5 run line probability for home favorites vs away favorites.
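The walk-off point in the last item can be checked with a rough Monte Carlo sketch. It assumes runs per half-inning are Poisson with mean equal to one ninth of the team's expected runs, ignores the extra-innings automatic runner, and credits every walk-off as a one-run win (so multi-run walk-off homers are not modeled). All function names are hypothetical.

```python
import math
import random

def poisson_draw(lam, rng):
    """Knuth's method; adequate for the small per-inning rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_margin(lam_home, lam_away, rng):
    """Return the final home-minus-away margin for one simulated game."""
    home = away = 0
    inning = 1
    while True:
        away += poisson_draw(lam_away / 9, rng)
        if inning >= 9 and home > away:
            break  # home leads after the top half: no bottom half is played
        runs = poisson_draw(lam_home / 9, rng)
        if inning >= 9 and home + runs > away:
            home = away + 1  # walk-off: play stops once home takes the lead
            break
        home += runs
        if inning >= 9 and home != away:
            break  # away team wins
        inning += 1
    return home - away

def run_line_cover_rates(lam, n_games=20000, seed=0):
    """Cover rates at -1.5 for two evenly matched teams."""
    rng = random.Random(seed)
    margins = [simulate_margin(lam, lam, rng) for _ in range(n_games)]
    home_cover = sum(m >= 2 for m in margins) / n_games
    away_cover = sum(m <= -2 for m in margins) / n_games
    return home_cover, away_cover
```

With equal scoring rates, the home side covers -1.5 less often than the away side in this model, because the home team never pads a lead in the bottom of the ninth and walk-off wins are truncated.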
BUILD PLAN
Phase 1: MLB Game Data Tables
mlb_starting_pitchers:
- game_id, date, team, pitcher_id, pitcher_name
- status (TBD/probable/confirmed/scratched)
- status_source, status_timestamp, prev_status
- hand (L/R), season_era, season_fip, season_xfip, season_whip
- k_per_9, bb_per_9, hr_per_9, gb_rate
- ftto_woba (first time through order), ftto_k_rate
- last_start_date, last_start_pitches, days_rest
- season_ip, pitch_count_trend
- cascade_fired (boolean — prevents duplicate recomputations)
mlb_bullpen_availability:
- team, date, pitcher_id, pitcher_name, role (closer/setup/middle/long)
- status (GREEN/YELLOW/RED/BLACK)
- yesterday_pitches, two_days_ago_pitches, three_days_ago_pitches
- appearances_last_7d, pitches_last_7d
- high_leverage_available (boolean)
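A sketch of how the four-tier status might be derived from the pitch-count columns above. The ruling says Opus specified pitch-count thresholds but does not list them, so every number below is a hypothetical placeholder, as is the reading of BLACK as "unavailable regardless of workload".

```python
def bullpen_status(yesterday, two_days_ago, three_days_ago, on_injured_list=False):
    """Classify one reliever as GREEN / YELLOW / RED / BLACK.

    All thresholds are HYPOTHETICAL placeholders, not the council's values.
    """
    if on_injured_list:
        return "BLACK"  # assumed meaning: not available at all
    back_to_back = yesterday > 0 and two_days_ago > 0
    three_day_total = yesterday + two_days_ago + three_days_ago
    if yesterday >= 35 or (back_to_back and three_days_ago > 0):
        return "RED"     # heavy outing yesterday, or three straight days
    if back_to_back or yesterday >= 20 or three_day_total >= 45:
        return "YELLOW"  # usable, but with a workload flag
    return "GREEN"
```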
mlb_game_environment:
- game_id, date, park_id, park_name
- roof_status (open/closed/retractable_open/retractable_closed/dome)
- temperature_f, humidity_pct, wind_speed_mph
- wind_direction_relative (OUT_TO_CF/IN_FROM_CF/CROSSWIND_LR/CROSSWIND_RL/CALM)
- wind_sensitivity_multiplier (park-specific: Wrigley 1.5, Coors 0.8, etc.)
- altitude_ft, air_density_adjustment
- precip_probability, weather_source, weather_timestamp
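One way to fill `air_density_adjustment` from `temperature_f`, `humidity_pct`, and `altitude_ft` is the standard moist-air density formula. The sketch below assumes a standard-atmosphere pressure at altitude (no live barometer reading) and a Magnus-type approximation for saturation vapor pressure; expressing the adjustment as a ratio to 59°F dry sea-level air is a choice made here, not something the ruling specifies.

```python
import math

R_DRY = 287.058    # J/(kg*K), specific gas constant of dry air
R_VAPOR = 461.495  # J/(kg*K), specific gas constant of water vapor

def air_density(temp_f, humidity_pct, altitude_ft):
    """Moist-air density in kg/m^3."""
    temp_c = (temp_f - 32.0) * 5.0 / 9.0
    temp_k = temp_c + 273.15
    altitude_m = altitude_ft * 0.3048
    # Standard-atmosphere pressure at altitude, in Pa.
    pressure = 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588
    # Magnus approximation for saturation vapor pressure, in Pa.
    saturation = 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))
    vapor = (humidity_pct / 100.0) * saturation
    dry = pressure - vapor
    return dry / (R_DRY * temp_k) + vapor / (R_VAPOR * temp_k)

def air_density_adjustment(temp_f, humidity_pct, altitude_ft):
    """Density relative to dry sea-level air at 59°F (about 1.225 kg/m^3)."""
    return air_density(temp_f, humidity_pct, altitude_ft) / air_density(59.0, 0.0, 0.0)
```

Because water vapor has a larger gas constant than dry air, raising `humidity_pct` at fixed temperature and pressure lowers the result, which matches the "humid air is LESS dense" point from peer review.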
mlb_team_game_logs:
- team, date, game_id, opponent, home_away
- runs_scored, runs_allowed, hits, errors
- team_woba, team_ops, team_wrc_plus
- vs_lhp_woba, vs_rhp_woba (platoon splits)
mlb_lineups:
- game_id, date, team, batting_order (1-9)
- player_id, player_name, position
- confirmed (boolean), source, timestamp
- season_wrc_plus, vs_hand_wrc_plus (vs SP hand)
- lineup_wrc_plus_total (sum of 9 hitters)
- projected_wrc_plus_total (what was expected before lineup card)
- delta_wrc_plus (actual - projected, flags rest days)
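The three derived lineup columns can be computed as in this sketch. The alert threshold of -40 summed wRC+ points is a hypothetical value; the ruling only asks for "large" deltas to be flagged.

```python
def lineup_delta(actual_wrc_plus, projected_wrc_plus, alert_threshold=-40):
    """Compare the posted lineup with the projected one.

    Both arguments are lists of nine per-hitter wRC+ values.
    alert_threshold is a HYPOTHETICAL cutoff on the summed delta.
    """
    if len(actual_wrc_plus) != 9 or len(projected_wrc_plus) != 9:
        raise ValueError("expected nine hitters in each lineup")
    actual_total = sum(actual_wrc_plus)
    projected_total = sum(projected_wrc_plus)
    delta = actual_total - projected_total
    return {
        "lineup_wrc_plus_total": actual_total,
        "projected_wrc_plus_total": projected_total,
        "delta_wrc_plus": delta,
        "flag": delta <= alert_threshold,
    }
```

For example, resting a 160 wRC+ hitter for an 85 wRC+ replacement gives a delta of -75 and trips the flag.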
mlb_umpire_assignments:
- game_id, date, umpire_id, umpire_name
- career_k_per_game_above_avg, career_bb_per_game_above_avg
- career_runs_per_game_above_avg
- season_k_rate, season_bb_rate
- abs_challenge_overturn_rate (2026 new)
mlb_park_factors:
- park_id, park_name, team
- runs_factor, hr_factor, hits_factor
- lhb_hr_factor, rhb_hr_factor (asymmetric parks)
- dimensions_lf, dimensions_cf, dimensions_rf
- altitude_ft, roof_type
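A minimal sketch of how one Phase 1 table might be created under the council's "SQLite WAL" verdict, using Python's standard `sqlite3` module. Only a subset of the `mlb_starting_pitchers` columns is shown; the column types, the `(game_id, team)` primary key, and the `open_db` helper are assumptions made for illustration.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS mlb_starting_pitchers (
    game_id          TEXT    NOT NULL,
    date             TEXT    NOT NULL,   -- ISO 8601, e.g. 2026-04-01
    team             TEXT    NOT NULL,
    pitcher_id       TEXT,
    pitcher_name     TEXT,
    status           TEXT    NOT NULL
        CHECK (status IN ('TBD', 'probable', 'confirmed', 'scratched')),
    status_source    TEXT,
    status_timestamp TEXT,
    prev_status      TEXT,
    hand             TEXT    CHECK (hand IN ('L', 'R')),
    cascade_fired    INTEGER NOT NULL DEFAULT 0
        CHECK (cascade_fired IN (0, 1)),
    PRIMARY KEY (game_id, team)          -- assumed key, not from the ruling
);
"""

def open_db(path):
    """Open (or create) the database in WAL mode and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(DDL)
    return conn
```

Sticking to plain TEXT/INTEGER columns and CHECK constraints, rather than SQLite-specific features, is one way to keep the tables portable if the PostgreSQL migration happens later.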
Phase 2: Matchup Card Format
GAME: [Away] @ [Home] | [Date] [Time ET] | [Park]
WEATHER: [Temp]°F | Wind: [Speed]mph [Direction_Relative] | Humidity: [%] | Precip: [%]
ROOF: [Status] | PARK FACTOR: [Runs Factor] | UMPIRE: [Name] (K+[adj]/BB+[adj]/R+[adj])
HOME SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
ERA: [season] | FIP: [season] | xFIP: [season] | WHIP: [season]
K/9: [rate] | BB/9: [rate] | HR/9: [rate] | GB%: [rate]
FTTO wOBA: [rate] | FTTO K%: [rate]
Last Start: [date] vs [team] — [IP] IP, [ER] ER, [K] K, [Pitches] pitches
Days Rest: [n] | Season IP: [total] | Pitch Count Trend: [up/stable/down]
vs Opp Lineup (platoon): Team wOBA vs [L/R]HP: [rate]
AWAY SP: [Name] ([L/R]) | Status: [Confirmed/Probable/TBD]
[Same fields as above]
HOME LINEUP: [Confirmed/Projected]
Lineup wRC+: [total] | vs SP Hand wRC+: [total]
Key Hitters: [Top 3 by wRC+ with stats]
Delta: [actual vs projected wRC+ — flags rest days]
AWAY LINEUP: [Confirmed/Projected]
[Same fields as above]
HOME BULLPEN: [GREEN/YELLOW/RED/BLACK]
Closer: [Name] — [Status] | Setup: [Names] — [Status]
Pitches Last 3 Days: [total team] | High-Leverage Available: [Y/N]
AWAY BULLPEN: [GREEN/YELLOW/RED/BLACK]
[Same fields as above]
INTELLIGENCE:
[Research findings tagged CRITICAL/MODERATE/CONTEXT]
[SP injury concerns, lineup changes, weather alerts, ABS data]
Phase 3: Edge Scanners (4 scanners)
Common engine:
- Ingest Pinnacle odds (ML, RL, Totals, F5)
- De-vig using Shin + Power methods
- Build Poisson probability curves with adjustments
- Compare to Kalshi contract prices
- Apply minimum edge (4 cents after Kalshi 7% fee) and minimum sample gates
- Output:
{game_id, market_type, side, model_prob, kalshi_price, edge, confidence}
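The de-vig and edge-gate steps of the common engine can be sketched as follows. Only the power method is shown (Shin is omitted). The fee model is an assumption: the ruling says "7% fee" without a formula, and this sketch charges `0.07 * price * (1 - price)` per contract. Function names are hypothetical.

```python
def devig_power(decimal_odds, tol=1e-12):
    """Power-method de-vig: find k so that sum(q_i ** k) == 1."""
    raw = [1.0 / o for o in decimal_odds]
    if sum(raw) <= 1.0:
        raise ValueError("no overround to remove")
    lo, hi = 1.0, 2.0
    while sum(q ** hi for q in raw) > 1.0:
        hi *= 2.0
    while hi - lo > tol:  # bisection on the exponent
        mid = (lo + hi) / 2.0
        if sum(q ** mid for q in raw) > 1.0:
            lo = mid
        else:
            hi = mid
    return [q ** hi for q in raw]

def kalshi_fee(price, rate=0.07):
    """ASSUMED fee model per contract; prices are in dollars (0 to 1)."""
    return rate * price * (1.0 - price)

def edge_row(game_id, market_type, side, model_prob, kalshi_price,
             confidence, min_edge=0.04):
    """Return the scanner output dict, or None if the edge gate fails."""
    edge = model_prob - kalshi_price - kalshi_fee(kalshi_price)
    if edge < min_edge:
        return None
    return {
        "game_id": game_id, "market_type": market_type, "side": side,
        "model_prob": model_prob, "kalshi_price": kalshi_price,
        "edge": edge, "confidence": confidence,
    }
```

As a check on the arithmetic: a 0.60 model probability against a 0.50 contract leaves 0.60 - 0.50 - 0.0175 = 0.0825 after the assumed fee, which clears the 4-cent gate, while 0.53 against 0.50 leaves 0.0125 and is dropped.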
Per-market scanner differences:
| Scanner | Distribution | Key Adjustments | Unique Logic |
|---|---|---|---|
| Moneyline | Poisson (expected runs per team) | SP quality, lineup wRC+, park factor, weather, bullpen quality, platoon splits, umpire | Standard win probability from Poisson run differential |
| Run Line (-1.5) | Poisson with margin threshold | Same as ML + home team walk-off constraint (no bottom 9th if leading) | P(win by 2+); home favorites have structurally different RL probability than away favorites |
| Totals | Poisson (combined expected runs) | Weather is PRIMARY driver: wind direction/speed × park multiplier, temp adjustment (+0.15 runs per 10°F above 72°F), humidity | Combined run total distribution, over/under probability at each threshold |
| First 5 Innings | Modified Poisson (SP-only) | FTTO splits, SP K rate, SP walk rate, umpire zone, NO bullpen component | Isolates SP performance: use innings 1-5 specific rates, remove bullpen quality entirely |
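A baseline for the Moneyline, Run Line, and Totals rows can be sketched with two independent Poisson run distributions. This is a simplified illustration under stated assumptions: team runs are independent, ties after nine innings are split in proportion to each side's regulation win probability, and the run-line figure ignores both extra innings and the walk-off constraint listed for that scanner. `market_probs` is a hypothetical name.

```python
import math

def poisson_pmf_table(lam, max_runs):
    """P(X = k) for k = 0..max_runs, X ~ Poisson(lam)."""
    return [math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(max_runs + 1)]

def market_probs(lam_home, lam_away, total_line, max_runs=40):
    """Moneyline, naive -1.5 run line, and over probability."""
    ph = poisson_pmf_table(lam_home, max_runs)
    pa = poisson_pmf_table(lam_away, max_runs)
    home = away = home_by_2 = over = 0.0
    for h, p_h in enumerate(ph):
        for a, p_a in enumerate(pa):
            p = p_h * p_a
            if h > a:
                home += p
            elif a > h:
                away += p
            if h - a >= 2:
                home_by_2 += p
            if h + a > total_line:
                over += p
    return {
        # Regulation ties are reallocated proportionally (assumption).
        "ml_home": home / (home + away),
        "rl_home_minus_1_5": home_by_2,
        "over": over,
    }
```

Usage: `market_probs(4.8, 4.1, 8.5)` returns the three probabilities for a home team expected to score 4.8 runs against 4.1, with a total of 8.5.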
Specific adjustments:
| Factor | ML Impact | RL Impact | Totals Impact | F5 Impact |
|---|---|---|---|---|
| SP change (starter→TBD) | Full recompute | Full recompute | Full recompute | Full recompute |
| SP change (starter→worse SP) | Adjust expected runs | Same + walk-off recalc | Adjust both sides | Full recompute |
| Weather: wind out to CF 15mph+ | Minimal | Minimal | +0.5 to +1.5 runs (park-dependent) | +0.2 to +0.5 runs |
| Weather: wind in from CF 15mph+ | Minimal | Minimal | -0.5 to -1.0 runs | -0.2 to -0.4 runs |
| Temperature >85°F | Slight boost offense | Slight | +0.3 to +0.5 runs | +0.1 to +0.2 runs |
| Star hitter scratched (150+ wRC+) | -2 to -5 cents | Similar | -0.1 to -0.3 runs | -0.05 to -0.15 runs |
| Bullpen RED status | -3 to -8 cents | Larger impact (late-game leverage) | +0.3 to +0.8 runs | NO IMPACT |
| Umpire: wide (pitcher-friendly) zone | Slight pitcher boost | Similar | -0.3 to -0.5 runs | -0.2 to -0.3 runs |
| Day game after night game | -1 to -3 cents offense | Similar | -0.1 to -0.3 runs | Minimal |
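The temperature and wind rows can be combined into a single totals adjustment, sketched below. The +0.15 runs per 10°F above 72°F rate and the 15 mph threshold come from this ruling; the base wind effects (+0.75 out, -0.60 in, before the park multiplier) are hypothetical placeholders chosen to fall near the ranges in the table, and no adjustment is applied below 72°F because the ruling does not give one.

```python
TEMP_BASELINE_F = 72.0
RUNS_PER_10F = 0.15        # from the Totals scanner row
STRONG_WIND_MPH = 15.0     # threshold used in the adjustments table
# HYPOTHETICAL base effects at 15+ mph, before the park multiplier.
BASE_WIND_RUNS = {"OUT_TO_CF": 0.75, "IN_FROM_CF": -0.60}

def totals_weather_adjustment(temp_f, wind_mph, wind_dir, wind_multiplier):
    """Runs to add to the expected game total for weather."""
    temp_adj = RUNS_PER_10F * max(0.0, temp_f - TEMP_BASELINE_F) / 10.0
    wind_adj = 0.0
    if wind_mph >= STRONG_WIND_MPH:
        # Crosswinds and calm conditions contribute nothing in this sketch.
        wind_adj = BASE_WIND_RUNS.get(wind_dir, 0.0) * wind_multiplier
    return temp_adj + wind_adj
```

With the Wrigley multiplier of 1.5, a 16 mph wind blowing out at 72°F gives 0.75 × 1.5 = 1.125 runs; a dome multiplier of 0.0 removes the wind term entirely.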
Phase 4: Research Pipeline
Research passes (3 per day):
Pass 1 — Morning (9-10 AM ET):
- "[Team] probable pitcher today [date]"
- "[Team] starting lineup [date]"
- "[Team] injury report [date]"
- "[Park] weather forecast [date] game time"
- MLB probable pitchers page (RotoWire, RotoBaller)
- NWS API call for outdoor parks (wind at game time)
Pass 2 — Afternoon (1-2 PM ET):
- "[Team] confirmed lineup [date]"
- "[Team] bullpen availability [date]"
- "[Player] injury update [date]" (for flagged players)
- "[Team] morning stretch [date]" (lineup confirmation)
- Compare actual lineup vs projected → flag delta
Pass 3 — Pre-game (1 hour before first pitch):
- Final SP confirmation check
- Final weather check (NWS API)
- Late scratch monitoring
- Umpire crew assignment verification
- Kalshi price recheck for staleness detection
SP Change Cascade:
When SP changes from confirmed to scratched:
- CRITICAL alert fires immediately
- All 4 scanners recompute (ML, RL, Totals, F5)
- F5 gets COMPLETE recompute (SP is ~90% of F5 outcome)
- Bullpen availability rechecked (bullpen game = entire staff affected)
- Matchup card regenerated with new SP stats
- All edges recalculated and re-flagged
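The cascade and its `cascade_fired` guard can be sketched as a small handler. The `SPRecord` type and the `recompute` and `alert` callbacks are hypothetical stand-ins for the scanner and alerting hooks; the bullpen recheck and matchup card regeneration steps are left out for brevity.

```python
from dataclasses import dataclass

MARKETS = ("ML", "RL", "TOT", "F5")

@dataclass
class SPRecord:
    game_id: str
    status: str
    prev_status: str = "TBD"
    cascade_fired: bool = False

def handle_status_update(record, new_status, recompute, alert):
    """Apply a status update; fire the cascade at most once per scratch.

    recompute(game_id, market) and alert(severity, message) are
    caller-supplied callbacks (hypothetical interfaces).
    Returns True if the cascade ran.
    """
    if new_status != record.status:
        record.prev_status, record.status = record.status, new_status
    is_scratch = (record.prev_status == "confirmed"
                  and record.status == "scratched")
    if not is_scratch or record.cascade_fired:
        return False  # nothing to do, or a duplicate report of the same scratch
    record.cascade_fired = True
    alert("CRITICAL", f"SP scratched for {record.game_id}")
    for market in MARKETS:
        recompute(record.game_id, market)
    return True
```

A second "scratched" report for the same game (for example, from a lower-credibility source arriving later) leaves the record unchanged and does not retrigger the four recomputes. When a replacement starter is confirmed, the flag would need to be reset; that policy is not specified in the ruling.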
Phase 5: Database Schema (additional tables)
mlb_research_findings:
- game_id, date, finding_id, finding_type
- severity (CRITICAL/MODERATE/CONTEXT)
- source, source_credibility (1-5)
- raw_text, structured_adjustment (JSON)
- affects_markets (array: ML/RL/TOT/F5)
- timestamp, model_used
mlb_edge_results:
- game_id, date, market_type (ML/RL/TOT/F5)
- side (home/away/over/under)
- model_prob, pinnacle_devigged_prob, kalshi_price
- edge_cents, confidence
- sp_status_at_calc, weather_at_calc
- timestamp, staleness_flag
mlb_game_results:
- game_id, date, home_team, away_team
- final_score_home, final_score_away
- f5_score_home, f5_score_away
- rain_delay (boolean), rain_shortened (boolean)
- actual_sp_home, actual_sp_away (for SP scratch tracking)
mlb_abs_tracking:
- game_id, date, umpire_id
- total_challenges, successful_challenges
- challenge_by_inning, challenge_by_count
- pitch_location_data (JSON)
- leverage_index_at_challenge
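The phased ABS rollout reduces to a date-and-sample gate over this table. In the sketch below, the June 1, 2026 cutover is inferred from "CONTEXT only through May", and whether the 200-challenge minimum applies per umpire or league-wide is left to the caller, since the ruling does not say.

```python
from datetime import date

def abs_signal_tier(as_of, total_challenges,
                    actionable_from=date(2026, 6, 1), min_challenges=200):
    """Return 'ACTIONABLE' only after the cutover date AND the sample minimum."""
    if as_of >= actionable_from and total_challenges >= min_challenges:
        return "ACTIONABLE"
    return "CONTEXT"

def overturn_rate(successful_challenges, total_challenges):
    """Share of challenges that overturned the call; None with no data."""
    if total_challenges == 0:
        return None
    return successful_challenges / total_challenges
```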
Phase 6: Dashboard
- Daily slate overview: All games with SP status, weather flags, edge counts per market type
- Game drill-down: Full matchup card + all 4 market edges + research findings
- SP status board: All 30 teams' confirmed/probable/TBD starters with color coding
- Weather dashboard: Outdoor parks with wind/temp/precip impact estimates
- Bullpen tracker: Team-by-team availability (GREEN/YELLOW/RED/BLACK)
- Edge alerts: Sorted by magnitude, filterable by market type, with staleness timestamps
- Lineup delta alerts: Games where actual lineup deviates significantly from projected
- P&L tracker: Performance by market type, by edge bucket, Brier scores
- ABS tracking: Challenge rates by umpire (data collection phase)
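For the P&L tracker, the Brier score is the mean squared difference between the model probability and the 0/1 outcome. A minimal sketch, assuming graded rows are available as `(market_type, model_prob, outcome)` tuples (a hypothetical shape that could be joined from `mlb_edge_results` and `mlb_game_results`):

```python
from collections import defaultdict

def brier_score(probs, outcomes):
    """Mean of (p - o)^2; 0 is perfect, 0.25 matches always saying 50%."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def brier_by_market(rows):
    """rows: iterable of (market_type, model_prob, outcome in {0, 1})."""
    grouped = defaultdict(lambda: ([], []))
    for market, prob, outcome in rows:
        grouped[market][0].append(prob)
        grouped[market][1].append(outcome)
    return {m: brier_score(p, o) for m, (p, o) in grouped.items()}
```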
OPEN QUESTIONS FOR BOSS RULING
- Early-season caution mode: Council recommends widening confidence intervals 20% and raising edge thresholds 1.5% in April (limited SP sample). Confirm?
- SP status handling: When SP is "probable" (not confirmed), should scanner run with discounted confidence or wait for confirmation?
- Weather void rules: Should we auto-restrict Run Line and Totals exposure when rain probability exceeds a threshold (e.g., >40%)? ML and F5 still grade if game reaches 5 innings.
- ABS Challenge System: Agree with phased approach (CONTEXT only through May, actionable June+ with 200+ challenge minimum)?
- Bullpen availability data source: MLB Stats API has pitch counts but not always day-of availability. Should we scrape RotoWire/RotoBaller bullpen reports, or build from raw pitch-count data?
- Intelligence calibration cadence: Weekly review of which IntelAdjustment types are value-additive vs. noise?
COUNCIL METADATA
| Detail | Value |
|---|---|
| Council date | 2026-04-01 |
| Advisory responses | 5 (all completed) |
| Peer reviews | 5 (all completed) |
| Strongest advisor | Opus (2/5 votes) |
| Runner-up | gpt-oss (2/5 votes) |
| Biggest blind spot | Gemini (2/5 votes) |
| Full council data | /home/ubuntu/edgeclaw/data/councils/2026-04-01/mlb-research/ |
Source: ~/edgeclaw/results/panel-results/mlb-research-ruling.md