Soccer Data Audit — Council Ruling

Date: 2026-04-01
Process: Full 5-phase council (Advisory → Anonymization → Peer Review → Chairman Synthesis → Boss Ruling)
Advisors: Opus, Sonnet, Gemini 3.1 Pro, Grok 4.20 Reasoning, gpt-oss-120b
Winner: Opus (5 of 5 peer review votes, UNANIMOUS)
Status: PENDING BOSS RULING on open questions


COUNCIL SUMMARY

Where Advisors Agreed

  1. xG (expected goals) is the #1 data requirement, sourced from FBref (free; FBref's xG was StatsBomb-sourced until 2022, then Opta-sourced)
  2. Club Elo ratings for team strength measurement (free API)
  3. Bivariate Poisson with the Dixon-Coles low-score correction as the goal-scoring distribution model
  4. League-specific models — different goal rates, home advantage, and variance per league
  5. Multi-league coverage — Big 5 European leagues + Champions League minimum
  6. Injury/suspension data from Transfermarkt (free, comprehensive)
  7. Match-level weather and pitch data for outdoor fixtures
  8. Team form tracking with EWMA (exponentially weighted moving average), split by home and away
  9. 6 market scanners — 1X2, Asian Handicap, Totals, BTTS, Correct Score, Double Chance
  10. Referee data matters (penalty rates, card rates, stoppage time patterns)
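A minimal sketch of how items 3 and 9 connect, assuming independent Poisson goal counts with the standard Dixon-Coles adjustment on the four low scores. The rates `lam` (home) and `mu` (away), the value of `rho`, and the function names are illustrative assumptions, not values or names fixed by the council. One score matrix can feed the 1X2, Totals, and BTTS scanners.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles adjustment factor; only 0-0, 0-1, 1-0 and 1-1 are changed."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0

def score_matrix(lam: float, mu: float, rho: float, max_goals: int = 10):
    """m[x][y] = P(home scores x, away scores y), renormalized after truncation."""
    m = [[dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
          for y in range(max_goals + 1)]
         for x in range(max_goals + 1)]
    total = sum(map(sum, m))
    return [[p / total for p in row] for row in m]

def market_probs(m, total_line: float = 2.5) -> dict:
    """Collapse the score matrix into 1X2, over/under and BTTS probabilities."""
    n = len(m)
    cells = [(x, y, m[x][y]) for x in range(n) for y in range(n)]
    return {
        "home": sum(p for x, y, p in cells if x > y),
        "draw": sum(p for x, y, p in cells if x == y),
        "away": sum(p for x, y, p in cells if x < y),
        "over": sum(p for x, y, p in cells if x + y > total_line),
        "btts": sum(p for x, y, p in cells if x > 0 and y > 0),
    }

# Hypothetical inputs: home rate 1.5, away rate 1.1, rho = -0.1
probs = market_probs(score_matrix(1.5, 1.1, -0.1))
```

A negative `rho` shifts probability toward 0-0 and 1-1, so the draw price moves while the marginal goal rates stay roughly unchanged. Fitting `rho` per league is a separate estimation step not shown here.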

Where Advisors Disagreed

  1. xG source: gpt-oss recommended paid Opta/StatsBomb feeds. Opus specified FBref (free). Council verdict: FBref for current scale, evaluate paid feeds if scale demands.
  2. Database engine: gpt-oss recommended PostgreSQL; Opus used SQLite. Council verdict: SQLite in WAL (write-ahead logging) mode.
  3. Historical depth: Recommendations ranged from 2 to 5 seasons. Council verdict: 3 seasons for model training, current season for live use.
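For the database verdict, a minimal sketch of opening SQLite in WAL mode from Python's standard `sqlite3` module. The function name `open_wal_db` and the `synchronous=NORMAL` setting are assumptions for illustration; the council ruling only specifies WAL.

```python
import sqlite3

def open_wal_db(path: str):
    """Open (or create) a SQLite file and switch it to write-ahead logging.

    Returns the connection and the journal mode SQLite reports back, so the
    caller can verify the switch actually happened. WAL needs a file on disk;
    an in-memory database reports "memory" instead.
    """
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    # Assumption: NORMAL is a common pairing with WAL, trading a little
    # durability on power loss for fewer fsync calls.
    conn.execute("PRAGMA synchronous=NORMAL;")
    return conn, mode
```

WAL lets readers (for example a dashboard) keep querying while a single writer ingests match data, which is the usual reason to prefer it at this scale.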

Strongest Arguments (from peer review)

Opus wins UNANIMOUSLY: every reviewer selected Opus as strongest.

Biggest Blind Spot

Gemini: Thin schema, generic data sources, no Dixon-Coles correction, no Asian Handicap quarter-line handling, no league-specific parameterization.
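Quarter-line handling, named above as a gap, can be sketched as follows, assuming the usual convention that a quarter line (for example -0.25 or +0.75) splits the stake evenly across the two adjacent lines. Function names and the decimal-odds convention are illustrative assumptions.

```python
def settle_line(margin: float, odds: float) -> float:
    """Net profit per 1 unit staked on a whole or half line.

    margin = goal difference for the backed side plus its handicap.
    """
    if margin > 0:
        return odds - 1.0   # win
    if margin < 0:
        return -1.0         # loss
    return 0.0              # push: stake returned

def split_quarter_line(handicap: float) -> list:
    """Return the component lines: one for whole/half lines, two for quarter lines."""
    quarters = round(handicap * 4)
    if quarters % 2 == 0:               # whole line (x.0) or half line (x.5)
        return [handicap]
    return [handicap - 0.25, handicap + 0.25]

def settle_asian_handicap(goal_diff: int, handicap: float,
                          odds: float, stake: float = 1.0) -> float:
    """Net profit for backing a side at the given handicap and decimal odds."""
    legs = split_quarter_line(handicap)
    per_leg = stake / len(legs)
    return sum(per_leg * settle_line(goal_diff + h, odds) for h in legs)

# Home -0.25 at odds 2.0, match drawn: the -0.5 leg loses, the 0.0 leg pushes.
half_loss = settle_asian_handicap(0, -0.25, 2.0)   # -0.5
```

The same split applies when pricing: the fair value of a quarter line is the average of the expected values of its two component lines.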

What Everyone Missed (from peer reviews)

  1. Transfer window impact — January/summer windows create structural breaks in team quality. Need transfer window flag and model reset.
  2. Betting exchange data (Betfair) — Exchange volume and price movements are sharper signals than bookmaker odds for soccer.
  3. VAR implementation differences by league — Some leagues use VAR more aggressively. Need league-specific VAR impact parameters.
  4. Multi-club ownership regulations — UEFA and domestic rules affect team selection in certain matchups.
  5. Altitude and travel for non-European leagues — Copa Libertadores, MLS Western Conference, etc.
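The transfer window flag from item 1 could look roughly like this. The default window dates are placeholder assumptions (real windows differ by league and season), and both function names are hypothetical.

```python
from datetime import date

# ASSUMPTION: placeholder (month, day) ranges, not official window dates.
DEFAULT_WINDOWS = [((1, 1), (1, 31)), ((7, 1), (8, 31))]

def in_transfer_window(d: date, windows=DEFAULT_WINDOWS) -> bool:
    """True if the date falls inside any configured window."""
    md = (d.month, d.day)
    return any(start <= md <= end for start, end in windows)

def window_closed_between(prev: date, cur: date, windows=DEFAULT_WINDOWS) -> bool:
    """True if a window closing date lies in (prev, cur].

    Intended as a trigger for down-weighting or resetting team baselines
    between a team's previous match and its current one.
    """
    for year in range(prev.year, cur.year + 1):
        for _, (month, day) in windows:
            if prev < date(year, month, day) <= cur:
                return True
    return False
```

Keeping the windows as a parameter leaves room for a per-league table once the league scope is ruled on.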

BUILD PLAN

Phase 1: Core Soccer Data Tables

soccer_match_logs: fixture_id, date, league, season, home_team, away_team, home_goals, away_goals, home_xg, away_xg, home_shots, away_shots, home_sot, away_sot, home_possession, away_possession, home_corners, away_corners, referee_id
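As an illustration, the soccer_match_logs column list above could map to SQLite like this. The column types, the primary key choice, and the NOT NULL constraints are assumptions; the ruling only lists column names.

```python
import sqlite3

# ASSUMPTION: types and constraints are illustrative; only the names come from the plan.
MATCH_LOGS_DDL = """
CREATE TABLE IF NOT EXISTS soccer_match_logs (
    fixture_id      TEXT PRIMARY KEY,
    date            TEXT NOT NULL,      -- ISO 8601, e.g. 2026-04-01
    league          TEXT NOT NULL,
    season          TEXT NOT NULL,
    home_team       TEXT NOT NULL,
    away_team       TEXT NOT NULL,
    home_goals      INTEGER,
    away_goals      INTEGER,
    home_xg         REAL,
    away_xg         REAL,
    home_shots      INTEGER,
    away_shots      INTEGER,
    home_sot        INTEGER,
    away_sot        INTEGER,
    home_possession REAL,
    away_possession REAL,
    home_corners    INTEGER,
    away_corners    INTEGER,
    referee_id      TEXT
);
"""

def create_match_logs(conn: sqlite3.Connection) -> None:
    """Create the match log table if it does not already exist."""
    conn.execute(MATCH_LOGS_DDL)
    conn.commit()
```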

soccer_team_baselines: team_id, league, date, stat_type, last_3/5/10, season_avg, ewma_home, ewma_away, xg_for_ewma, xg_against_ewma
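A rough sketch of how the ewma_home and ewma_away columns might be computed, assuming matches arrive sorted by date and the standard recursion s_t = alpha * x_t + (1 - alpha) * s_{t-1}. The smoothing factor `alpha=0.3`, the dict-based match format, and the function names are assumptions.

```python
def ewma(values, alpha: float = 0.3):
    """Exponentially weighted moving average, seeded with the first value.

    Returns None for an empty sequence (team has no matches in that split).
    """
    s = None
    for x in values:
        s = x if s is None else alpha * x + (1 - alpha) * s
    return s

def home_away_ewma(matches, team: str, alpha: float = 0.3,
                   stat_home: str = "home_xg", stat_away: str = "away_xg") -> dict:
    """Split a team's stat by venue and smooth each split separately.

    `matches` is a date-ordered list of dicts using soccer_match_logs column names.
    """
    home_vals = [m[stat_home] for m in matches if m["home_team"] == team]
    away_vals = [m[stat_away] for m in matches if m["away_team"] == team]
    return {"ewma_home": ewma(home_vals, alpha), "ewma_away": ewma(away_vals, alpha)}
```

A higher `alpha` reacts faster to recent matches; the right value is something to tune per league rather than fix up front.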

soccer_club_elo: team_id, date, elo_rating, elo_change, league_rank
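For context on how elo_rating and elo_change relate, here is the generic Elo expectation and update, sketched under assumptions: the `home_advantage` offset and the `k` factor are placeholder parameters, and this is the textbook formula rather than a confirmed description of how the Club Elo feed computes its ratings.

```python
def elo_expected(home_elo: float, away_elo: float, home_advantage: float = 0.0) -> float:
    """Expected score for the home side (win = 1, draw = 0.5, loss = 0)."""
    diff = home_elo + home_advantage - away_elo
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))

def elo_change(expected: float, result: float, k: float = 20.0) -> float:
    """Rating points gained by the home side; the away side loses the same amount.

    ASSUMPTION: k = 20 is an illustrative default.
    """
    return k * (result - expected)
```

Note that the expected score blends wins and draws, so it is a strength signal for the goal model, not a 1X2 probability by itself.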

soccer_player_availability: team_id, date, player_id, player_name, position, status, injury_type, expected_return, importance_score

soccer_fixtures_context: fixture_id, date, league, weather, pitch_status, referee_id, match_importance_home/away, days_rest_home/away, midweek_european_home/away

soccer_referee_stats: referee_id, league, avg_fouls, avg_cards, penalty_rate, var_overturn_rate, avg_added_time

soccer_league_params: league, season, avg_goals_per_game, home_advantage_coefficient, draw_rate, btts_rate, dixon_coles_rho
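Most of the soccer_league_params columns are plain aggregates over one league-season of results. A minimal sketch follows, assuming home_advantage_coefficient is defined as the ratio of home goals to away goals (the ruling does not define it). dixon_coles_rho is left out because it needs a likelihood fit rather than a simple aggregate.

```python
def league_params(matches) -> dict:
    """Aggregate one league-season of (home_goals, away_goals) tuples."""
    matches = list(matches)
    n = len(matches)
    if n == 0:
        raise ValueError("no matches supplied")
    home = sum(h for h, _ in matches)
    away = sum(a for _, a in matches)
    return {
        "avg_goals_per_game": (home + away) / n,
        # ASSUMPTION: ratio definition; None when no away goals were scored.
        "home_advantage_coefficient": home / away if away else None,
        "draw_rate": sum(h == a for h, a in matches) / n,
        "btts_rate": sum(h > 0 and a > 0 for h, a in matches) / n,
    }

# Hypothetical four-match sample
sample = league_params([(2, 1), (0, 0), (1, 1), (3, 0)])
```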

Phase 2: Distribution Models

Phase 3: 6 Edge Scanners

Phase 4: Dashboard


OPEN QUESTIONS FOR BOSS RULING

  1. League scope: Big 5 + CL only, or expand to MLS, Eredivisie, Liga Portugal, etc.?
  2. Dixon-Coles fitting: Needs roughly 2 seasons of historical data per league. Confirm 3-season backfill?
  3. FBref scraping: Free but requires web scraping. Build scraper or find API wrapper?
  4. Betfair exchange data: Worth integrating as sharpness signal?
  5. Transfer window model resets: Auto-adjust team baselines after January/summer windows?
  6. Correct Score scanner: High variance — build now or defer?

COUNCIL METADATA

Detail               Value
Council date         2026-04-01
Advisory responses   5 (all completed)
Peer reviews         5 (all completed)
Strongest advisor    Opus (5/5 votes, UNANIMOUS)
Runner-up            N/A
Biggest blind spot   Gemini
Full council data    /home/ubuntu/edgeclaw/data/councils/2026-04-01/soccer-data-audit/
Source: ~/edgeclaw/results/panel-results/soccer-data-audit-ruling.md