Status: MERGED FINAL — Panel-reviewed, Opus as base, unique additions cherry-picked from all other models Source: UFC_MMA_Analytics_Full_Design.txt (primary), panel review from Opus, Gemini 3.1 Pro, Grok 4.1 Fast, Grok 4.1, DeepSeek V3.1, Sonar Reasoning Pro Date: March 2026 Purpose: Data collection spec for EdgeClaw's UFC/MMA Sports Desk. Feeds the Sports Desk analyst. Primary data source: UFCStats.com (FightMetric backend). Output format: SQLite (WAL mode).
EdgeClaw is an automated, multi-sport betting analytics platform. It currently runs four sport modules: NHL, NBA, NCAAB (college basketball), and MLB. Each module tracks ~200-215 data points per game/matchup using the same core architecture:
SHARED ARCHITECTURE (all sports):
trueProb = (1/decimal) / (1/price1 + 1/price2)INFRASTRUCTURE:
Every design decision flows from these structural realities:
INDIVIDUAL vs TEAM: In team sports, regression to the mean is your friend — an injured star alters the line, but the team still functions. In MMA, the "team" is a single biological entity. There are no teammates to compensate.
MULTIPLE WIN CONDITIONS: Team sports have one outcome type. MMA has KO/TKO, submission, decision (unanimous/split/majority), DQ, doctor stoppage, draw, and no contest. This creates METHOD OF VICTORY markets that are often MORE exploitable than moneyline because they require three-way probability modeling that books struggle to price efficiently.
TINY SAMPLE SIZES: An NBA team plays 82+ games/season. A UFC fighter fights 2-4 times per year. Many fighters have only 5-10 UFC fights total. Career stats are noisy, rolling averages are volatile, and style matchups matter more than raw aggregate skill.
STEEP NONLINEAR DECLINE: Fighter decline is not gradual like team sports. Chin degradation is permanent and cumulative — once a fighter has been KO'd, they are statistically more likely to be KO'd again. Below 185 lbs (Middleweight), there is a sharp statistical cliff near age 35. [See Section 7.5 for improved decay model from panel review.]
STYLE GEOMETRY > RAW TALENT: A great wrestler often beats a great striker, but a great BJJ fighter often beats a great wrestler, and a great striker often beats a great BJJ fighter. The matchup matrix is more predictive than individual rankings.
WEIGHT CLASSES = SEPARATE ECOSYSTEMS: Heavyweight (265 lbs) is essentially a different sport than Flyweight (125 lbs). All statistics and composite scores must be calculated WITHIN weight class.
JUDGING IS SUBJECTIVE: Unlike team sports, MMA decisions depend on three judges with known biases — some favor aggression/damage, others favor takedowns/control time. Judge assignment is a real edge vector.
CAGE SIZE MATTERS: The UFC Apex facility uses a 25-foot diameter cage. Standard arenas use a 30-foot cage. The smaller Apex cage dramatically increases finish rates and favors wrestlers/pressure fighters who can cut off the cage.
WEIGHT CUTS: Fighters dehydrate to make weight, then rehydrate. Severe weight cuts impair performance. Missed weight history, moving up/down divisions, and rehydration weight are all predictive variables.
NO SEASON: The UFC runs ~45 events per year, roughly weekly. Each card is independent. Data ingestion is event-driven, not season-driven.
FIRST 5 / ROUND PROPS: Over/under rounds, exact round finish, and "fight to go the distance" (GTD) props are major UFC betting markets with significant inefficiency.
All stats derived from UFCStats.com (FightMetric). Stored as both per-fight snapshots and rolling career averages (career, last 3, last 5 fights). All rolling stats computed WITHIN weight class.
| Metric | Source | Predictive Value |
|---|---|---|
| Sig. strikes landed/min (SLpM) | UFCStats | High — primary offensive output |
| Sig. strikes absorbed/min (SApM) | UFCStats | High — defensive efficiency |
| Striking accuracy % | UFCStats | Medium — quality over volume |
| Striking defense % | UFCStats | High — evasiveness, longevity |
| Striking differential (SLpM - SApM) | Calculated | VERY HIGH — single best striking metric |
| Distance strikes landed % | UFCStats | Medium — range management |
| Clinch strikes landed % | UFCStats | Medium — infighting style |
| Ground strikes landed % | UFCStats | Medium — ground-and-pound capability |
| Head strikes landed % | UFCStats | High — KO probability proxy |
| Body strikes landed % | UFCStats | Medium — body work wears down opponent |
| Leg strikes landed % | UFCStats | Medium — mobility degradation |
| Knockdowns scored per 15 min | UFCStats | VERY HIGH — best KO predictor |
| Knockdowns received per 15 min | UFCStats | High — chin durability |
| First strike landed % | Calculated | Medium — fast starter indicator |
[PANEL: Opus] Sig. strike accuracy BY DISTANCE (range/clinch/ground breakdown): A fighter who is 55% accurate at range but 30% in the clinch has a completely different profile than one who is 40%/50%. This directly feeds the style matchup matrix. UFCStats already provides this breakdown per fight — parse at the range level rather than aggregating. Priority: MUST-HAVE.
| Metric | Source | Predictive Value |
|---|---|---|
| Takedown accuracy % | UFCStats | High — offensive wrestling success |
| Takedown defense % | UFCStats | VERY HIGH — keeps fight in preferred range |
| Takedowns landed per 15 min | UFCStats | High — wrestling volume |
| Control time per 15 min (sec) | UFCStats | High — top position dominance |
| Submission attempts per 15 min | UFCStats | Medium — submission threat level |
| Reversals per 15 min | UFCStats | Medium — scramble ability |
| Time in guard vs mount vs back | Calculated | High — positional quality of control |
| Clinch control time | Calculated | Medium — clinch fighting ability |
[PANEL: Grok 4.1 Fast] Opponent-specific historical performance: Fighters perform differently against similar styles (e.g., a wrestler might have 70% TD success vs strikers but 40% vs grapplers). Calculate from existing UFCStats per-fight breakdowns by filtering fights by opponent style tags — no new scraping needed. Priority: MUST-HAVE.
| Metric | Source | Predictive Value |
|---|---|---|
| KO/TKO win rate (career) | UFCStats | VERY HIGH — primary MoV input |
| Submission win rate (career) | UFCStats | VERY HIGH — primary MoV input |
| Finish rate (non-decision %) | Calculated | High — over/under rounds input |
| Decision rate | Calculated | High — inverse of finish rate |
| Average fight duration (sec) | Calculated | High — rounds prop pricing |
| Times knocked down (career) | UFCStats | High — chin durability proxy |
| KO/TKO losses (count) | UFCStats | VERY HIGH — chin degradation (nonlinear) |
| Absorbed strikes before finish | Calculated | High — damage tolerance trend |
| Late finish rate (Round 3+) | Calculated | Medium — cardio-dependent finishing |
| Metric | Source | Predictive Value |
|---|---|---|
| Output by round (strikes/min R1-R5) | UFCStats | VERY HIGH — cardio curve modeling |
| Output decline % (R1 to final) | Calculated | VERY HIGH — cardio cliff detection |
| Championship round performance | Calculated | High — title fight specific |
| Reach advantage (inches) | UFCStats | High — structural striking edge |
| Height advantage (inches) | UFCStats | Medium — correlated with reach |
| Leg reach (inches) | UFCStats | Medium — kicks/distance management |
| Age at fight time | Calculated | High — decline curve input |
| Days since last fight | Calculated | High — ring rust / recovery |
| Weight class change history | Scraped | High — weight cut impact |
| Missed weight history | Scraped | Medium — discipline / stress |
| Stance (orthodox/southpaw/switch) | UFCStats | Medium — stylistic matchup factor |
[PANEL: Gemini 3.1 Pro] Stance Matchup Matrix (Southpaw vs Orthodox): The spec tracks stance, but not the matchup. Open stance (Southpaw vs Orthodox) fundamentally changes striking geometry (body kicks, lead hand control) and significantly alters SLpM/SApM baselines. Calculated from existing UFCStats data. Priority: MUST-HAVE.
Each fighter is tagged with a primary and secondary style based on their statistical profile:
| Style Tag | Definition | Key Indicators |
|---|---|---|
| Boxer | Strikes at distance | High SLpM, high distance %, low TD attempts |
| Pressure Fighter | High-volume forward pressure | Very high SLpM, high SApM, low striking defense |
| Counter Striker | Low volume, high accuracy | Low SLpM, high accuracy %, high defense % |
| Wrestler | Takedown-centric control | High TD attempts, high control time, low SLpM |
| Grappler/BJJ | Submission-focused | High sub attempts, moderate TDs, finish via sub |
| Clinch Fighter | Inside fighting / dirty boxing | High clinch strikes %, high clinch control time |
| Kickboxer | Diverse striking with kicks | High distance %, balanced head/body/leg strikes |
| Well-Rounded | No dominant dimension | Balanced across all categories |
A style_matchup_matrix stores historical win rates for each style pairing within each weight class (e.g., Wrestler vs Boxer at Lightweight has a known edge historically).
[PANEL: Opus] Style classification should be CONTINUOUS, not categorical: Replace categorical tags with a continuous style vector — [striking_offense, striking_defense, wrestling_offense, wrestling_defense, submission_offense, submission_defense, clinch_work, pressure_rate] where each dimension is 0-100 based on actual stats. The matchup matrix then operates on vector distances rather than category lookups. This eliminates the subjective "who decides the tag" problem. Implement as v2 after categorical tags are working.
Based on output decline percentage from Round 1 to final round:
Cardio Cliff fighters are where over/under rounds and late-round finish props are most exploitable.
[PANEL: DeepSeek V3.1] Output under duress: Current round-by-round output doesn't account for opponent pressure. Track "output under pressure" flag — strikes output when opponent has attempted a takedown or is in chase mode. Source: FightMetric/UFCStats granular data if available. Tag fights where one fighter was significantly ahead on output vs. when being pressured.
| Source | Data | Access | Cost |
|---|---|---|---|
| UFCStats.com (FightMetric) | Official UFC stats: strikes, TDs, control time, per-round breakdowns, 67+ categories | Scrape (HTML) | FREE |
| Sherdog.com | Fighter records, results, event history, pre-UFC record | Scrape (HTML) | FREE |
| Tapology.com | Fighter records, rankings, event cards, weigh-in results, regional/pre-UFC data | Scrape (HTML) | FREE |
| ESPN MMA / UFC.com | Fighter profiles, confirmed cards, official rankings | Scrape/RSS | FREE |
[PANEL: Opus] UFCStats.com scraping is the SINGLE POINT OF FAILURE. Its HTML structure changes periodically and it has no API. MUST build secondary scraper for Sherdog/Tapology capturing at minimum: records, results, basic striking/grappling. Cache aggressively — once a fight's stats are scraped, they never change. Implement hash-based change detection on page structure to alert on breaking changes within minutes, not days.
[PANEL: DeepSeek V3.1] UFCStats uses Cloudflare protection. Budget for rotating residential proxies (~$120/month) if you hit blocks. Rate-limit to 1 req/sec, exponential backoff on 429/503.
| Source | Data | Access | Cost |
|---|---|---|---|
| MMADecisions.com | Judge scorecards, judge-by-judge tendencies, split decision history, bias analysis | Scrape | FREE |
| Verdict MMA | Community global scorecards, fan consensus vs judges | API/Scrape | FREE tier |
| FightOdds.io | Historical odds, line movement | Scrape | FREE |
| BestFightOdds.com | Multi-book odds comparison, historical closing lines | Scrape | FREE |
[PANEL: Sonar Reasoning Pro] MMADecisions.com may be incomplete or poorly maintained for recent fights. This is a single-point dependency for judge data. Add: (1) UFC.com scorecards (scrape if possible), (2) backup from MMAJunkie if they republish scores, (3) manual spot-check for data freshness on build start.
| Source | Data | Access | Cost |
|---|---|---|---|
| The Odds API | Multi-book aggregation: DK, Pinnacle, FanDuel, BetMGM | REST API | FREE tier (500 req/mo) |
| Pinnacle | Sharpest book. Early + closing moneyline, method, rounds | Historical API | FREE |
| Kalshi | Prediction market: ML, method, rounds. Candlestick API for historical closing prices | REST API | FREE (account) |
| Polymarket | Prediction market: event markets, prices, volume, on-chain order data | Free API | FREE (no account) |
| DraftKings | Opening moneyline, method, round, props | Manual/Scrape | FREE (account) |
[PANEL: Grok 4.1 Fast] Free tier of Odds API limits to 500 calls/month. Upgrade to paid ($10-20/mo) or build custom scrapers for Pinnacle/DK as primary; treat Odds API as backup only.
[PANEL: Opus] Betting line movement history / steam moves: Track line movement velocity, not just opening and closing. Scrape BestFightOdds every 6 hours pre-fight week, every 1 hour fight week, every 15 min post-weigh-in. Store as time series in odds_line_movement table. The Market Resistance Filter requires this underlying movement data to function. Priority: MUST-HAVE.
| Source | Data | Access | Cost |
|---|---|---|---|
| Sherdog / Tapology | Confirmed fight cards, weigh-in results, cancellations | Scrape | FREE |
| MMA Junkie / MMA Fighting | News, camp reports, sparring reports, injury updates | RSS/Scrape | FREE |
| RotoWire / FantasyPros | Injury reports, fight previews | Scrape | FREE |
[PANEL: Opus] Fighter injury / medical suspension history: UFC mandates medical suspensions after fights (30-180 days). A fighter returning from a 180-day medical suspension for a broken orbital bone is materially different from routine 30-day. The spec tracks "days since last fight" but not WHY the layoff occurred. Source: State athletic commission reports (public, ABC MMA website), MMA Junkie injury RSS, UFC event medical suspension lists. Priority: MUST-HAVE.
[PANEL: Opus] Late replacements / short-notice fights: Short-notice replacements (<4 weeks camp) are one of the most exploitable edges in UFC betting. The spec mentions "short camp" as a penalty but doesn't track WHO replaced whom, notice period in days, or whether the replacement was moving up/down in weight. Source: Sherdog/Tapology event pages (scrape bout history changes), MMA Junkie news RSS. Cross-reference announced date vs event date to calculate camp length. Priority: MUST-HAVE.
[PANEL: Sonar Reasoning Pro] Pre-fight press conference quotes and opponent game plan signals provide directional signal for method props and round overs that historical style win rates miss. Scrape pre-fight quotes from MMA Junkie, The Athletic. Process with basic intent extraction. Priority: MUST-HAVE.
[PANEL: Opus] Pre-UFC / regional record quality adjustment: A 10-0 fighter whose wins came against 2-5 regional opponents is fundamentally different from a 10-0 fighter who beat ranked Bellator/ONE fighters. Apply Bayesian shrinkage toward weight class average for fighters with <5 UFC bouts. Source: Sherdog full records, Tapology regional promotion quality tiers. Priority: NICE-TO-HAVE.
Unlike team sports (Sagarin, Massey, DRatings, Dimers), MMA has LIMITED free prediction sources. Target: 5-7 independent sources minimum.
| Source | Type | Outputs | Access |
|---|---|---|---|
| BetMMA.tips | Community prediction platform with verified track records | Win %, method %, crowd consensus | Scrape |
| Tapology.com predictions | Community + staff picks | Win %, method predictions | Scrape |
| FightMatrix.com | ELO-based fighter rankings | Implied win prob from ELO diff | Scrape |
| Action Network UFC | Staff model predictions | Win %, best bets | Scrape |
| Bloody Elbow staff picks | Expert panel picks (5-8 writers) | Win % from pick consensus | Scrape |
| MMABettingOdds.com / FightOdds.io | Market-implied from closing lines | No-vig probability baseline | Scrape |
| Internal ELO/Glicko-2 model | EdgeClaw internal model trained on UFCStats backfill | Win probability from rating diff | Built in-house |
[PANEL: Opus] Remove Bloody Elbow from prediction sources: Bloody Elbow's editorial staff has turned over significantly and the site's reliability as a prediction source is low. Individual writer picks from media outlets have poor calibration and no accountability. Replace with: BetMMA.tips (which has verified, tracked records) and Tapology community consensus. Two well-calibrated external sources plus Pinnacle closing line plus the internal model is sufficient. Adding noisy sources degrades the ensemble.
The ensemble average (model_avg_prob) is compared to Pinnacle closing no-vig probability to find edge. After 100+ fights of tracking, switch from equal-weight averaging to inverse-MSE weighting (better models get more weight based on Brier score accuracy).
[PANEL: Opus] The internal ELO/Glicko-2 model should NOT be blended naively with the composite. The model's predictions are partially derived from the same inputs as the composite — this creates circular dependency. Either: (1) use ELO/Glicko-2 as a REPLACEMENT for the composite, or (2) keep the composite as the primary model and use external predictions only for calibration/validation.
SQLite setup note: Enable WAL mode immediately on DB creation: PRAGMA journal_mode=WAL;. This allows concurrent reads during writes. All 16 tables below use SQLite-compatible types (INTEGER, REAL, TEXT, BLOB) instead of PostgreSQL ENUMs.
[PANEL: Gemini 3.1 Pro] Fighter entity disambiguation: Implement unique fighter IDs (UFCStats/Tapology/Sherdog crosswalk) immediately. Always use IDs, never fighter names, as join keys. Fighter name collisions and misspellings will corrupt data if not handled at schema level.
PRAGMA journal_mode=WAL;
PRAGMA foreign_keys=ON;
-- ============================================================
-- 6.1 weight_classes
-- ============================================================
CREATE TABLE IF NOT EXISTS weight_classes (
weight_class_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
upper_limit_lbs REAL NOT NULL,
gender TEXT DEFAULT 'male' CHECK(gender IN ('male','female'))
);
-- Seed:
-- Strawweight 115 female, Flyweight 125 male/female, Bantamweight 135 male/female,
-- Featherweight 145 male, Lightweight 155 male, Welterweight 170 male,
-- Middleweight 185 male, Light Heavyweight 205 male, Heavyweight 265 male
-- ============================================================
-- 6.2 fighters
-- ============================================================
CREATE TABLE IF NOT EXISTS fighters (
fighter_id INTEGER PRIMARY KEY AUTOINCREMENT,
-- Cross-platform IDs (entity disambiguation)
ufcstats_id TEXT UNIQUE, -- UFCStats internal fighter ID
sherdog_id TEXT, -- Sherdog URL slug
tapology_id TEXT, -- Tapology URL slug
-- Identity
name TEXT NOT NULL,
nickname TEXT,
date_of_birth TEXT, -- ISO 8601 date
nationality TEXT,
home_city TEXT, -- [PANEL: Opus] for travel distance calc
-- Physical
height_inches REAL,
reach_inches REAL,
leg_reach_inches REAL,
stance TEXT CHECK(stance IN ('orthodox','southpaw','switch')),
-- Style (categorical for v1, continuous vector for v2)
primary_style TEXT,
secondary_style TEXT,
-- Style vector (v2 — leave NULL for now)
style_striking_offense REAL, -- 0-100
style_striking_defense REAL,
style_wrestling_offense REAL,
style_wrestling_defense REAL,
style_submission_offense REAL,
style_submission_defense REAL,
style_clinch_work REAL,
style_pressure_rate REAL,
-- Camp / Training
team_gym TEXT,
previous_gym TEXT,
gym_change_date TEXT, -- ISO 8601 date
is_elite_gym INTEGER DEFAULT 0, -- 1 = top camp (ATT, CKB, Sanford, etc.)
-- Record aggregates
wins INTEGER DEFAULT 0,
losses INTEGER DEFAULT 0,
draws INTEGER DEFAULT 0,
no_contests INTEGER DEFAULT 0,
ko_wins INTEGER DEFAULT 0,
sub_wins INTEGER DEFAULT 0,
dec_wins INTEGER DEFAULT 0,
ko_losses INTEGER DEFAULT 0,
sub_losses INTEGER DEFAULT 0,
dec_losses INTEGER DEFAULT 0,
-- UFC-specific record
ufc_wins INTEGER DEFAULT 0,
ufc_losses INTEGER DEFAULT 0,
ufc_debut_date TEXT,
last_fight_date TEXT,
is_active INTEGER DEFAULT 1,
-- Source URLs
sherdog_url TEXT,
ufcstats_url TEXT,
tapology_url TEXT,
-- Timestamps
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_fighters_name ON fighters(name);
CREATE INDEX IF NOT EXISTS idx_fighters_active ON fighters(is_active);
CREATE INDEX IF NOT EXISTS idx_fighters_ufcstats_id ON fighters(ufcstats_id);
-- ============================================================
-- 6.3 events
-- ============================================================
CREATE TABLE IF NOT EXISTS events (
event_id INTEGER PRIMARY KEY AUTOINCREMENT,
ufcstats_event_id TEXT UNIQUE,
name TEXT NOT NULL,
event_type TEXT CHECK(event_type IN ('ppv','fight_night','contender_series','dwcs')),
event_date TEXT NOT NULL, -- ISO 8601 date
venue TEXT,
city TEXT,
state_country TEXT,
elevation_ft INTEGER, -- [PANEL: Opus/Gemini] altitude affects cardio
is_apex INTEGER DEFAULT 0, -- 1 = UFC Apex (25ft cage)
cage_size_ft INTEGER DEFAULT 30, -- 25 for Apex, 30 standard
total_bouts INTEGER,
attendance INTEGER, -- [PANEL: Sonar] crowd size effect
ufcstats_url TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_events_date ON events(event_date);
CREATE INDEX IF NOT EXISTS idx_events_type ON events(event_type);
-- ============================================================
-- 6.4 fights
-- ============================================================
CREATE TABLE IF NOT EXISTS fights (
fight_id INTEGER PRIMARY KEY AUTOINCREMENT,
ufcstats_fight_id TEXT UNIQUE,
event_id INTEGER NOT NULL REFERENCES events(event_id),
fighter1_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
fighter2_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
weight_class_id INTEGER REFERENCES weight_classes(weight_class_id),
scheduled_rounds INTEGER CHECK(scheduled_rounds IN (3,5)) DEFAULT 3,
is_title_fight INTEGER DEFAULT 0,
is_main_event INTEGER DEFAULT 0,
is_co_main INTEGER DEFAULT 0,
card_position TEXT CHECK(card_position IN ('main_card','prelim','early_prelim')),
bout_order INTEGER,
catchweight_lbs REAL,
fighter1_weigh_in REAL,
fighter2_weigh_in REAL,
fighter1_missed_weight INTEGER DEFAULT 0,
fighter2_missed_weight INTEGER DEFAULT 0,
-- Referee (tracked for stoppage tendency analysis)
referee_name TEXT, -- [PANEL: Gemini/GPT-4.1/Sonar]
-- Replacement tracking
is_late_replacement INTEGER DEFAULT 0, -- [PANEL: Opus/GPT-4.1]
replacement_fighter_id INTEGER REFERENCES fighters(fighter_id), -- who replaced
original_fighter_name TEXT, -- who was originally scheduled
notice_days INTEGER, -- days of camp (announcement to fight)
fight_status TEXT CHECK(fight_status IN ('scheduled','completed','cancelled','no_contest')) DEFAULT 'scheduled',
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_fights_event ON fights(event_id);
CREATE INDEX IF NOT EXISTS idx_fights_f1 ON fights(fighter1_id);
CREATE INDEX IF NOT EXISTS idx_fights_f2 ON fights(fighter2_id);
CREATE INDEX IF NOT EXISTS idx_fights_wc ON fights(weight_class_id);
CREATE INDEX IF NOT EXISTS idx_fights_status ON fights(fight_status);
-- ============================================================
-- 6.5 fight_results
-- ============================================================
CREATE TABLE IF NOT EXISTS fight_results (
result_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL UNIQUE REFERENCES fights(fight_id),
winner_id INTEGER REFERENCES fighters(fighter_id),
loser_id INTEGER REFERENCES fighters(fighter_id),
method TEXT NOT NULL,
-- 'KO/TKO', 'Submission', 'Decision - Unanimous',
-- 'Decision - Split', 'Decision - Majority', 'DQ',
-- 'Doctor Stoppage', 'Draw', 'No Contest'
method_detail TEXT, -- 'rear naked choke', 'head kick', 'body shot', etc
finish_round INTEGER,
finish_time_sec INTEGER,
total_fight_time_sec INTEGER,
went_to_decision INTEGER DEFAULT 0,
performance_bonus TEXT, -- 'FOTN', 'POTN', NULL
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_results_fight ON fight_results(fight_id);
CREATE INDEX IF NOT EXISTS idx_results_winner ON fight_results(winner_id);
CREATE INDEX IF NOT EXISTS idx_results_method ON fight_results(method);
-- ============================================================
-- 6.6 judges
-- ============================================================
CREATE TABLE IF NOT EXISTS judges (
judge_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
home_state TEXT,
total_fights_judged INTEGER DEFAULT 0,
pct_unanimous REAL,
pct_split REAL,
pct_majority REAL,
pct_favor_aggressor REAL, -- % of close rounds given to higher-output fighter
pct_favor_control REAL, -- % given to fighter with more control time
pct_favor_damage REAL, -- % given to fighter with more knockdowns
pct_favor_wrestler REAL, -- tendency to reward takedowns + control
agreement_rate REAL, -- % agreement with other two judges
controversial_rate REAL, -- % of fights with media/community disagreement
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
-- ============================================================
-- 6.7 bout_judges (junction table)
-- ============================================================
CREATE TABLE IF NOT EXISTS bout_judges (
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
judge_id INTEGER NOT NULL REFERENCES judges(judge_id),
score_fighter1 TEXT, -- e.g. '29-28'
score_fighter2 TEXT, -- e.g. '28-29'
PRIMARY KEY (fight_id, judge_id)
);
-- [PANEL: Sonar] Pre-fight judge panel snapshot table
-- When judges are announced (1-2 days before event), store for pre-fight edge calc
CREATE TABLE IF NOT EXISTS pre_fight_judges (
snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
judge_id INTEGER NOT NULL REFERENCES judges(judge_id),
captured_at TEXT NOT NULL,
UNIQUE(fight_id, judge_id)
);
-- ============================================================
-- 6.8 fighter_fight_stats
-- ============================================================
-- Per-fighter, per-fight statistics. Two rows per fight (one per fighter).
CREATE TABLE IF NOT EXISTS fighter_fight_stats (
stat_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
-- Striking
sig_strikes_landed INTEGER,
sig_strikes_attempted INTEGER,
sig_strikes_absorbed INTEGER,
total_strikes_landed INTEGER,
total_strikes_attempted INTEGER,
knockdowns_scored INTEGER DEFAULT 0,
knockdowns_received INTEGER DEFAULT 0,
head_strikes_landed INTEGER,
head_strikes_attempted INTEGER,
body_strikes_landed INTEGER,
body_strikes_attempted INTEGER,
leg_strikes_landed INTEGER,
leg_strikes_attempted INTEGER,
distance_strikes_landed INTEGER,
distance_strikes_att INTEGER,
clinch_strikes_landed INTEGER,
clinch_strikes_att INTEGER,
ground_strikes_landed INTEGER,
ground_strikes_att INTEGER,
-- [PANEL: Opus] Accuracy by range (MUST-HAVE — already in UFCStats, just parse it)
distance_accuracy_pct REAL, -- sig strikes at range: landed/attempted
clinch_accuracy_pct REAL, -- sig strikes in clinch: landed/attempted
ground_accuracy_pct REAL, -- sig strikes on ground: landed/attempted
-- Grappling
takedowns_landed INTEGER,
takedowns_attempted INTEGER,
takedowns_defended INTEGER,
submission_attempts INTEGER DEFAULT 0,
reversals INTEGER DEFAULT 0,
control_time_sec INTEGER DEFAULT 0,
-- Calculated per-minute rates (denormalized for query speed)
sig_strikes_per_min REAL,
sig_strikes_abs_per_min REAL,
striking_accuracy_pct REAL,
striking_defense_pct REAL,
takedown_accuracy_pct REAL,
takedown_defense_pct REAL,
control_time_per_min REAL,
-- Meta
fight_time_sec INTEGER,
created_at TEXT DEFAULT (datetime('now')),
UNIQUE(fight_id, fighter_id)
);
CREATE INDEX IF NOT EXISTS idx_ffs_fight ON fighter_fight_stats(fight_id);
CREATE INDEX IF NOT EXISTS idx_ffs_fighter ON fighter_fight_stats(fighter_id);
-- ============================================================
-- 6.9 fighter_round_stats
-- ============================================================
CREATE TABLE IF NOT EXISTS fighter_round_stats (
round_stat_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
round_number INTEGER NOT NULL,
sig_strikes_landed INTEGER,
sig_strikes_attempted INTEGER,
sig_strikes_absorbed INTEGER,
knockdowns_scored INTEGER DEFAULT 0,
knockdowns_received INTEGER DEFAULT 0,
takedowns_landed INTEGER,
takedowns_attempted INTEGER,
submission_attempts INTEGER DEFAULT 0,
reversals INTEGER DEFAULT 0,
control_time_sec INTEGER DEFAULT 0,
head_strikes_landed INTEGER,
body_strikes_landed INTEGER,
leg_strikes_landed INTEGER,
distance_strikes_landed INTEGER,
clinch_strikes_landed INTEGER,
ground_strikes_landed INTEGER,
round_duration_sec INTEGER DEFAULT 300,
created_at TEXT DEFAULT (datetime('now')),
UNIQUE(fight_id, fighter_id, round_number)
);
CREATE INDEX IF NOT EXISTS idx_frs_fight ON fighter_round_stats(fight_id);
CREATE INDEX IF NOT EXISTS idx_frs_fighter ON fighter_round_stats(fighter_id);
CREATE INDEX IF NOT EXISTS idx_frs_round ON fighter_round_stats(round_number);
-- ============================================================
-- 6.10 fighter_rolling_stats
-- ============================================================
-- Pre-computed rolling career averages for each fighter AT THE TIME of
-- each fight. Avoids recalculating from raw stats on every query.
-- Updated BEFORE each fight via ETL. Enables point-in-time backtesting
-- without future data leak.
CREATE TABLE IF NOT EXISTS fighter_rolling_stats (
rolling_id INTEGER PRIMARY KEY AUTOINCREMENT,
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
as_of_fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
as_of_date TEXT NOT NULL,
num_fights INTEGER,
num_ufc_fights INTEGER,
-- Rolling averages (career, last 3, last 5)
career_slpm REAL,
career_sapm REAL,
career_str_acc REAL,
career_str_def REAL,
career_td_acc REAL,
career_td_def REAL,
career_sub_att_per15 REAL,
career_ctrl_per15 REAL,
career_kd_per15 REAL,
last3_slpm REAL,
last3_sapm REAL,
last3_str_acc REAL,
last3_str_def REAL,
last3_td_acc REAL,
last3_td_def REAL,
last5_slpm REAL,
last5_sapm REAL,
-- [PANEL: Opus] Bayesian shrinkage applied values (shrunk toward weight class mean)
-- Formula: adj_stat = (n * fighter_stat + k * class_mean) / (n + k), k=5
adj_slpm REAL, -- shrinkage-adjusted SLpM
adj_sapm REAL,
adj_str_acc REAL,
adj_td_acc REAL,
adj_td_def REAL,
adj_finish_rate REAL,
-- Finishing rates
career_ko_rate REAL,
career_sub_rate REAL,
career_dec_rate REAL,
career_finish_rate REAL,
-- Durability
career_kd_absorbed_per15 REAL,
career_ko_losses INTEGER,
-- Cardio (output decline)
avg_r1_output REAL,
avg_r2_output REAL,
avg_r3_output REAL,
avg_r4_output REAL, -- NULL if never fought 4+ rounds
avg_r5_output REAL, -- NULL if never fought 5 rounds
output_decline_pct REAL,
-- Physical at fight time
age_at_fight REAL,
days_since_last_fight INTEGER,
weight_class_id INTEGER REFERENCES weight_classes(weight_class_id),
created_at TEXT DEFAULT (datetime('now')),
UNIQUE(fighter_id, as_of_fight_id)
);
CREATE INDEX IF NOT EXISTS idx_rolling_fighter ON fighter_rolling_stats(fighter_id);
CREATE INDEX IF NOT EXISTS idx_rolling_date ON fighter_rolling_stats(as_of_date);
-- ============================================================
-- 6.11 fighter_weight_history
-- ============================================================
CREATE TABLE IF NOT EXISTS fighter_weight_history (
history_id INTEGER PRIMARY KEY AUTOINCREMENT,
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
weight_class_id INTEGER REFERENCES weight_classes(weight_class_id),
weigh_in_lbs REAL,
missed_weight INTEGER DEFAULT 0,
missed_weight_lbs REAL,
rehydration_est REAL,
moved_up INTEGER DEFAULT 0,
moved_down INTEGER DEFAULT 0,
UNIQUE(fighter_id, fight_id)
);
CREATE INDEX IF NOT EXISTS idx_wh_fighter ON fighter_weight_history(fighter_id);
-- ============================================================
-- 6.12 style_matchup_matrix
-- ============================================================
CREATE TABLE IF NOT EXISTS style_matchup_matrix (
matchup_id INTEGER PRIMARY KEY AUTOINCREMENT,
style_a TEXT NOT NULL,
style_b TEXT NOT NULL,
weight_class_id INTEGER REFERENCES weight_classes(weight_class_id),
weight_class_group TEXT, -- [PANEL: Gemini] 'lower'(125-145), 'middle'(155-170), 'upper'(185-265)
style_a_win_rate REAL,
sample_size INTEGER,
last_updated TEXT,
UNIQUE(style_a, style_b, weight_class_id)
);
-- [PANEL: Gemini] Group weight classes for style matchup to get statistically significant sample sizes.
-- Calculating 'Wrestler vs Boxer at Lightweight' alone yields tiny samples (8-12 fights).
-- Group: lower (125-145), middle (155-170), upper (185-265).
-- ============================================================
-- 6.13 betting_odds
-- ============================================================
CREATE TABLE IF NOT EXISTS betting_odds (
odds_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
book TEXT NOT NULL CHECK(book IN ('draftkings','pinnacle','kalshi','polymarket','fanduel','betmgm','betmma','tapology')),
market_type TEXT NOT NULL,
-- 'moneyline', 'method_ko', 'method_sub', 'method_dec',
-- 'over_rounds', 'under_rounds', 'round_1', 'round_2', 'round_3',
-- 'fight_gtd', 'itd'
odds_american INTEGER,
odds_decimal REAL,
implied_prob REAL,
line_value REAL, -- for over/under (e.g., 2.5 rounds)
snapshot_type TEXT CHECK(snapshot_type IN ('opening','early','midweek','closing')),
volume INTEGER, -- Kalshi contract volume
captured_at TEXT NOT NULL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_odds_fight ON betting_odds(fight_id);
CREATE INDEX IF NOT EXISTS idx_odds_fighter ON betting_odds(fighter_id);
CREATE INDEX IF NOT EXISTS idx_odds_book ON betting_odds(book);
CREATE INDEX IF NOT EXISTS idx_odds_market ON betting_odds(market_type);
CREATE INDEX IF NOT EXISTS idx_odds_snapshot ON betting_odds(snapshot_type);
CREATE INDEX IF NOT EXISTS idx_odds_captured ON betting_odds(captured_at);
-- [PANEL: Opus] Line movement history table (MUST-HAVE)
-- Track line velocity, not just opening/closing.
CREATE TABLE IF NOT EXISTS odds_line_movement (
movement_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
book TEXT NOT NULL,
market_type TEXT NOT NULL,
odds_american INTEGER,
implied_prob REAL,
movement_magnitude REAL, -- delta from previous snapshot
captured_at TEXT NOT NULL,
hours_to_event REAL, -- negative = post-event
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_lm_fight ON odds_line_movement(fight_id);
CREATE INDEX IF NOT EXISTS idx_lm_captured ON odds_line_movement(captured_at);
-- ============================================================
-- 6.14 predictions
-- ============================================================
CREATE TABLE IF NOT EXISTS predictions (
prediction_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
source TEXT NOT NULL,
-- 'betmma', 'tapology', 'fightmatrix', 'action_network',
-- 'edgeclaw_elo', 'market_implied'
win_prob REAL,
ko_prob REAL,
sub_prob REAL,
dec_prob REAL,
confidence REAL,
captured_at TEXT NOT NULL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_pred_fight ON predictions(fight_id);
CREATE INDEX IF NOT EXISTS idx_pred_source ON predictions(source);
-- ============================================================
-- 6.15 calculated_edges
-- ============================================================
CREATE TABLE IF NOT EXISTS calculated_edges (
edge_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
-- No-vig probabilities
pinnacle_novig_prob REAL,
dk_novig_prob REAL,
kalshi_novig_prob REAL,
-- Model ensemble
model_avg_prob REAL,
model_median_prob REAL,
model_count INTEGER,
model_std_dev REAL,
-- Edge calculations
edge_vs_pinnacle REAL,
edge_vs_kalshi REAL,
edge_vs_dk REAL,
cross_book_edge REAL,
-- Sharp action indicators
reverse_line_movement INTEGER DEFAULT 0,
steam_move INTEGER DEFAULT 0,
vig_squeeze INTEGER DEFAULT 0,
kalshi_volume_surge INTEGER DEFAULT 0,
line_move_direction TEXT,
line_move_magnitude REAL,
line_move_velocity REAL, -- [PANEL: Opus] rate of change per hour
-- Method edges
model_ko_prob REAL,
model_sub_prob REAL,
model_dec_prob REAL,
market_ko_prob REAL,
market_sub_prob REAL,
market_dec_prob REAL,
ko_edge REAL,
sub_edge REAL,
dec_edge REAL,
-- Composite & tier
quality_composite REAL,
edge_tier TEXT CHECK(edge_tier IN ('S','A','B','C','D')),
-- Post-fight grading
clv_pinnacle REAL,
clv_kalshi REAL,
prediction_correct INTEGER,
method_correct INTEGER,
brier_score REAL, -- per-prediction Brier score
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now')),
UNIQUE(fight_id, fighter_id)
);
CREATE INDEX IF NOT EXISTS idx_edges_fight ON calculated_edges(fight_id);
CREATE INDEX IF NOT EXISTS idx_edges_fighter ON calculated_edges(fighter_id);
CREATE INDEX IF NOT EXISTS idx_edges_tier ON calculated_edges(edge_tier);
CREATE INDEX IF NOT EXISTS idx_edges_composite ON calculated_edges(quality_composite);
-- ============================================================
-- 6.16 live_tracking
-- ============================================================
-- WAL mode handles concurrent writes. Batch writes every 1 min for ML,
-- every 5 min for method/round props to reduce contention.
CREATE TABLE IF NOT EXISTS live_tracking (
tracking_id INTEGER PRIMARY KEY AUTOINCREMENT,
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
book TEXT DEFAULT 'kalshi',
market_type TEXT DEFAULT 'moneyline',
price REAL,
implied_prob REAL,
volume INTEGER,
volume_delta INTEGER,
captured_at TEXT NOT NULL,
minutes_to_event INTEGER, -- negative = event started
round_number INTEGER, -- which round is live (NULL if between rounds)
data_latency_ms INTEGER, -- [PANEL: Sonar] SLA monitoring
is_stale INTEGER DEFAULT 0, -- 1 if capture missed SLA window
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_live_fight ON live_tracking(fight_id);
CREATE INDEX IF NOT EXISTS idx_live_time ON live_tracking(captured_at);
-- ============================================================
-- 6.17 fighter_injuries (PANEL ADDITION — MUST-HAVE)
-- ============================================================
-- [PANEL: Opus, GPT-4.1, Grok 4.1, DeepSeek, Grok 4.1 Fast]
-- Medical suspensions and injury history. All panel models flagged as MUST-HAVE.
CREATE TABLE IF NOT EXISTS fighter_injuries (
injury_id INTEGER PRIMARY KEY AUTOINCREMENT,
fighter_id INTEGER NOT NULL REFERENCES fighters(fighter_id),
fight_id INTEGER REFERENCES fights(fight_id), -- fight that caused it
injury_type TEXT, -- 'knee', 'shoulder', 'orbital', 'hand', 'back', etc
suspension_days INTEGER, -- mandatory medical suspension length
suspension_end TEXT, -- ISO date when cleared to return
is_cleared INTEGER DEFAULT 0,
clearance_type TEXT, -- 'time_served', 'medical_clearance'
source TEXT, -- 'nsac', 'csac', 'mmajunkie', 'twitter'
notes TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_injuries_fighter ON fighter_injuries(fighter_id);
-- ============================================================
-- 6.18 referees
-- ============================================================
-- [PANEL: Gemini, GPT-4.1, Sonar, Grok 4.1 Fast]
-- Referee stoppage tendencies affect method/round market pricing.
CREATE TABLE IF NOT EXISTS referees (
referee_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
total_fights INTEGER DEFAULT 0,
early_stoppage_rate REAL, -- % of fights stopped before most refs would
late_stoppage_rate REAL, -- % of fights allowed to continue longer
standup_rate REAL, -- standups per fight (favors strikers)
finish_rate_as_ref REAL, -- KO+Sub rate in fights they ref
-- Apex vs arena tendency (referees may behave differently in smaller cage)
apex_finish_rate REAL,
arena_finish_rate REAL,
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
-- Junction: which referee is assigned to a fight (pre-event, scraped from UFC.com)
CREATE TABLE IF NOT EXISTS fight_referees (
fight_id INTEGER NOT NULL REFERENCES fights(fight_id),
referee_id INTEGER NOT NULL REFERENCES referees(referee_id),
confirmed INTEGER DEFAULT 0, -- 0 = expected, 1 = confirmed on fight card
captured_at TEXT NOT NULL,
PRIMARY KEY (fight_id, referee_id)
);
-- ============================================================
-- 6.19 data_quality_log
-- ============================================================
-- [PANEL: Sonar] Sanity checks on scraped data. Flag and quarantine bad rows.
CREATE TABLE IF NOT EXISTS data_quality_log (
log_id INTEGER PRIMARY KEY AUTOINCREMENT,
table_name TEXT NOT NULL,
record_id INTEGER,
field_name TEXT,
raw_value TEXT,
issue TEXT, -- 'out_of_range', 'impossible_value', 'missing_required'
resolution TEXT, -- 'quarantined', 'corrected', 'accepted_with_flag'
created_at TEXT DEFAULT (datetime('now'))
);
Computed FRESH for each fighter in each specific matchup. Matchup-dependent — the same fighter will have different composite scores against different opponents.
quality_composite = (
0.25 * striking_differential_score
+ 0.20 * grappling_advantage_score
+ 0.15 * finishing_ability_score
+ 0.10 * cardio_pace_score
+ 0.10 * durability_score
+ 0.10 * style_matchup_score
+ 0.05 * experience_recency_score
+ 0.05 * situational_score
)
[PANEL: Opus — CRITICAL] These weights (25/20/15/10/10/10/5/5) are educated guesses with zero empirical backing. The plan: launch with these weights as starting point, then implement automated weight optimization using logistic regression on historical fight outcomes. The composite should be a learned model, not a hand-tuned formula. At minimum, run a backtest on 2018-2024 data before going live and adjust weights based on actual predictive power per component. [PANEL: Sonar] Use inverse-MSE weighted averaging as an intermediate approach until logistic regression has enough data.
[PANEL: Opus] The optimal weights will vary BY WEIGHT CLASS (grappling matters more at 170+, striking differential matters more at 135-155) and BY ERA (wrestling dominance has declined since 2018). Per-class tuning deferred to v2 (need 12+ months of data per class).
Percentile rank of (last3_slpm - last3_sapm) within the fighter's weight class. A fighter at the 80th percentile in striking differential gets 80.
Bonuses: +5 for reach advantage >3 inches. +5 for knockdowns_per_15 in top quartile of division.
Use Bayesian-shrinkage-adjusted stats (adj_slpm, adj_sapm) for fighters with <5 UFC fights.
Weighted blend within weight class:
Use adjusted stats for small-sample fighters.
career_finish_rate * 50
+ knockdowns_per_15_percentile * 30
+ (opponent career_ko_losses > 2 ? +20 : 0)
Capped at 100.
100 - (output_decline_pct * 100)
A fighter who maintains 90% of R1 output in R3 scores 90. Bonus: +10 for positive championship round history (win % in R4-R5 fights). Penalty: -10 if categorized as "Cardio Cliff" archetype.
100 - (career_kd_absorbed_per15_percentile * 0.5 + ko_losses_penalty)
KO losses penalty — improved decay model [PANEL: Opus]:
The original step function (0/-15/-30/-50) ignores context. Use a decay function:
chin_penalty = sum over each KO loss:
base_penalty * (1 - recovery_rate * months_since_KO)
where:
base_penalty = 20 per KO loss
recovery_rate = 0.02 per month
minimum recovered penalty = 5 per KO loss (never fully goes away)
recency flag: if KO loss within last 2 fights, apply 1.5x multiplier
For v1 implementation simplicity, the original step function (0/-15/-30/-50) with recency doubling is acceptable. Upgrade to the decay model in v2.
Lookup style_a_win_rate from style_matchup_matrix for this fighter's primary style vs opponent's primary style within weight class GROUP (lower/middle/upper).
win_rate * 100 = score. If wrestler vs boxer and wrestler wins 62% historically, score = 62. If no data, default to 50.
[PANEL: Gemini] Group weight classes (lower 125-145, middle 155-170, upper 185-265) to get statistically significant sample sizes. Per-class data is too sparse for most style pairings.
[PANEL: Gemini] Add Stance Matchup modifier: Open-stance (Southpaw vs Orthodox) fights have measurably different SLpM/SApM baselines. Apply ±3 points modifier based on historical open vs closed stance win rates at this weight class group.
Base: min(ufc_fights * 5, 50)
Layoff adjustment:
60-180 days since last fight = 0 (optimal)
30-59 days = -10 (short camp)
181-365 days = -15 (ring rust)
365+ days = -25 (severe ring rust)
Short notice replacement: If this fighter is a late replacement with notice_days < 28, apply an additional -8 penalty. [PANEL: Opus/GPT-4.1]
Base: 50
Modifiers:
Title fight / main event: +10
PPV main card: +5
Missed weight: -15
Moving up in weight: -5
Moving down in weight: +5 (if successful history)
Altitude >5000ft: -5 (both fighters equally)
Apex cage (25ft): +5 for wrestlers/pressure fighters, -5 for counter strikers
Returning from medical suspension: -8 if cleared from 90+ day suspension [PANEL: Opus]
Camp change to elite gym (within 12 months): +5 [PANEL: Opus/GPT-4.1]
Camp change within 6 months (still adjusting): -3
Left elite gym for smaller camp: -5
CRITICAL — AGE CLIFF (improved) [PANEL: Opus/Gemini/Sonar]:
The original spec uses a step function at exactly age 35 for fighters below 185 lbs. This creates discontinuities (34.9 vs 35.1 year olds treated wildly differently).
Improved model: Use a sigmoid decay curve that starts at 32 and reaches full penalty by 38. Curve steepness by division:
For v1 implementation simplicity, the original step function with correct division groupings is acceptable. Upgrade to sigmoid in v2.
These stack on top of the 0-100 base. Apply a floor of 10 — the composite cannot drop below 10 after all boosters. [PANEL: Opus]
Compound Decline Penalty (up to -15):
Weight Cut Severity Penalty (up to -15):
Camp Change Boost/Penalty:
Elite Gym List [PANEL: GPT-4.1]: Maintain as a config object (not hardcoded strings), updated annually with win rates:
{
"elite_gyms": [
"American Top Team", "City Kickboxing", "Sanford MMA",
"Jackson-Wink MMA", "AKA", "Tristar Gym", "Glory MMA",
"10th Planet Jiu-Jitsu", "Rafael Lovato Jr MMA"
]
}
Market Resistance Filter: When model shows big edge (S or A tier) but the line doesn't move for 3+ hours while other fight lines are moving, flag and downgrade by one tier.
BOTH composite threshold AND model edge must be met simultaneously.
| Tier | Composite | Model Edge Required | Action |
|---|---|---|---|
| S | 85-100 | +7% vs market | Strong play — max unit |
| A | 70-84 | +5% vs market | Standard play — 1 unit |
| B | 55-69 | +3% vs market | Lean play — half unit |
| C | 40-54 | Any positive edge | Track only — no bet |
| D | 0-39 | N/A | Skip — insufficient data |
A fight with a great composite but no market edge is still C-tier. A fight with a huge market edge but weak composite is suspicious.
[PANEL: Opus] S-tier threshold of +7% may be too strict for moneyline. On Kalshi/Polymarket, UFC markets often have wide spreads (5-10 cents). The real edges are in METHOD and ROUNDS props where pricing is less efficient. Consider: S-tier at +5% for method/round props (higher inefficiency) and +7% only for moneyline.
The biggest UFC-specific edge vector. Method markets are less efficient than moneyline because books must price three-way probabilities. Public bettors systematically overvalue knockouts and underprice decisions.
fighter_A_ko_rate = A.ko_wins / A.total_wins
fighter_A_sub_rate = A.sub_wins / A.total_wins
fighter_A_dec_rate = A.dec_wins / A.total_wins
fighter_B_ko_loss_rate = B.ko_losses / B.total_losses
fighter_B_sub_loss_rate = B.sub_losses / B.total_losses
fighter_B_dec_loss_rate = B.dec_losses / B.total_losses
-- For Fighter A winning by KO:
raw_ko_prob = win_prob_A * (w1 * fighter_A_ko_rate + w2 * fighter_B_ko_loss_rate)
-- Sample-size weights:
w1 = A.total_wins / (A.total_wins + B.total_losses)
w2 = B.total_losses / (A.total_wins + B.total_losses)
-- Normalize so KO + Sub + Dec = win_prob_A
[PANEL: Sonar — CRITICAL] Method outcomes are NOT independent: High TD accuracy correlates with submission wins; weak chin correlates with KO losses. The sample-size weighted blend treats these as independent probabilities. In v2, use multinomial logistic regression or categorical outcome trees to model the full joint distribution. For v1, the blend is adequate.
Alternative approach (geometric mean) — deferred to v2 per panel consensus. Build sample-size weighted first, backtest, then compare.
Compare model method probabilities to DraftKings method props. Edge = model_ko_prob - market_ko_implied_prob. Method markets typically have 15-25% vig, so significant edges exist. Biggest inefficiencies: decision markets (underpriced) and submission markets (mispriced in grappling matchups).
-- Compute from fighter_round_stats:
SELECT
fighter_id,
round_number,
AVG(sig_strikes_landed * 1.0 / (round_duration_sec / 60.0)) AS avg_strikes_per_min
FROM fighter_round_stats
GROUP BY fighter_id, round_number
ORDER BY fighter_id, round_number;
P(finish_in_round_N) = base_finish_rate
* cardio_multiplier(fighter_A, N)
* cardio_multiplier(fighter_B, N)
* cumulative_damage_factor(N)
base_finish_rate = (A.career_finish_rate + B.career_finish_rate) / 2
cardio_multiplier(fighter, N) = fighter.cardio_curve[N] / fighter.cardio_curve[1]
-- The WEAKER fighter's decline INCREASES opponent's finish probability
cumulative_damage_factor(N) = 1.0 + (0.05 * N)
-- 5% increase per round (accumulated damage)
P(goes_to_decision) = 1 - SUM(P(finish_in_round_N) for N in 1..scheduled_rounds)
[PANEL: Sonar] Round probabilities are path-dependent, not independent. If Fighter A is damaged in R2, R3 finish probability is conditional on surviving R2 damaged. In v2, use Markov chain or hidden-state model (HMM). For v1, the naive independence model is acceptable.
-- Standard 3-round fight:
P(under_1.5) = P(finish_R1)
P(over_1.5) = 1 - P(under_1.5)
P(under_2.5) = P(finish_R1) + P(finish_R2)
P(over_2.5) = 1 - P(under_2.5)
-- 5-round title fights: extend to 2.5, 3.5, 4.5
The 2.5 round market in title fights is often mispriced because the market anchors on 3-round base rates.
1-minute cadence for ML prices; 5-minute cadence for method/round props. This cuts data volume by ~70% vs tracking all markets every minute. [PANEL: Opus]
Every 1 minute (ML only):
Every 5 minutes (method/round):
Round breaks (critical snapshots):
[PANEL: Sonar — HIGH] Live tracking needs a data latency SLA. Define: (1) max acceptable staleness per data point (e.g., 90 seconds for ML), (2) fallback hierarchy: Kalshi > Polymarket > Pinnacle, (3) capture timestamp vs target timestamp in data_latency_ms column. A 10-second latency can be the difference between catching a 0.30 cent swing and being late.
SQLite in WAL mode handles this use case adequately at the scale of 1-2 UFC events. If contention occurs, write live tracking to a separate SQLite file (live_tracking.db) to isolate write contention from the main database. [PANEL: Opus]
Closing Odds Snapshot:
event_start - 20 minbetting_odds table with snapshot_type = 'closing'Results Grading (post-fight):
clv = our_implied_prob - closing_novig_prob (positive = beat the close)Panel consensus on build order: backfill and backtest first, live tracking last.
Step 1 — UFCStats scraper + fighter database + historical backfill (2018+)
Step 2 — Rolling stats calculator + Bayesian shrinkage + percentile rankings
fighter_rolling_stats with adjusted stats for all fightersStep 3 — Pinnacle closing line scraper + Odds API + historical odds backfill (2018+)
Step 4 — Composite formula v1 + backtesting framework
Step 5 — Method of Victory model + round probability model
Step 6 — Kalshi + Polymarket scrapers (pre-event snapshots only, no live tracking yet)
Step 7 — External prediction scrapers (BetMMA.tips + Tapology) + ensemble
Step 8 — Judge bias database (MMADecisions.com) + referee tendencies + decision probability adjustment
Step 9 — Automation: event detection, pre-event reports, post-event grading, Brier scoring
Step 10 — Live 1-minute tracking (ML prices only) + late replacement alerts + line movement alerts
[PANEL: Sonar] Pre-launch production simulation (week 0): Before building anything, run the entire pipeline on a historical fight card (e.g., 5 fights from Jan 2026). Scrape all sources, calculate edges, generate a pre-event report, grade against actual outcomes. Catches broken scrapers, API limits, and schema surprises before real money is involved.
[PANEL: Sonar] Explicit launch gate: Launch when moneyline Brier score > 0.52 on last 50 backtested fights AND Method of Victory model backtest > 0.51. If Phase 10 is reached without these benchmarks, either the model doesn't work or it's untested.
| Source | Cadence | Notes |
|---|---|---|
| UFCStats fighter profiles | Weekly (Monday) | Detect new fighters, update active rosters |
| UFCStats fight stats | After each event | Cache-first: check if fight exists before scraping |
| UFCStats round stats | After each event | Same event trigger as fight stats |
| Sherdog records | Weekly | Fallback / supplemental to UFCStats |
| Tapology fight cards | Daily | Detect upcoming fights, late replacements |
| MMA Junkie RSS | Hourly | Injury reports, camp news, replacement announcements |
| UFC.com event pages | Daily | Judge/referee assignments (appear 1-2 days before event) |
| The Odds API | 4x/day baseline, hourly fight week | Opening/early/midweek/closing snapshots |
| BestFightOdds | 6h pre-fight week, 1h fight week, 15min post-weigh-in | Line movement tracking |
| Kalshi / Polymarket | 4x/day baseline, 6h pre-fight week | Pre-event snapshots |
| Live: ML prices | Every 1 minute during event | Fight window only |
| Live: method/round | Every 5 minutes during event | Fight window only |
| MMADecisions.com | One-time bulk + after each event | Judge scorecards |
| BetMMA.tips | 24h before event | Prediction ensemble input |
| FightMatrix | 48h before event | ELO-based predictions |
| Table | Rows/year | Row size | Annual storage |
|---|---|---|---|
| fighters | ~200 new | ~500 bytes | ~100 KB |
| events | ~45 | ~200 bytes | ~9 KB |
| fights | ~600 | ~400 bytes | ~240 KB |
| fight_results | ~600 | ~200 bytes | ~120 KB |
| fighter_fight_stats | ~1200 | ~600 bytes | ~720 KB |
| fighter_round_stats | ~3600 | ~400 bytes | ~1.4 MB |
| fighter_rolling_stats | ~1200 | ~800 bytes | ~960 KB |
| fighter_weight_history | ~1200 | ~200 bytes | ~240 KB |
| betting_odds | ~15000 | ~200 bytes | ~3 MB |
| odds_line_movement | ~50000 | ~200 bytes | ~10 MB |
| predictions | ~6000 | ~200 bytes | ~1.2 MB |
| calculated_edges | ~1200 | ~500 bytes | ~600 KB |
| live_tracking | ~180000 | ~200 bytes | ~36 MB |
| fighter_injuries | ~300 | ~200 bytes | ~60 KB |
| judges / bout_judges | ~500 | ~300 bytes | ~150 KB |
| referees / fight_referees | ~200 | ~300 bytes | ~60 KB |
| Total (year 1) | ~55 MB |
Historical backfill (2018-2026, ~8 years): ~400 MB. Well within SQLite's practical limits. After 5 years of live operation: ~650 MB total — still fine for SQLite.
-- Get all rolling stats for a specific fighter going into a specific fight
SELECT
f.name,
frs.age_at_fight,
frs.days_since_last_fight,
frs.career_finish_rate,
frs.career_ko_losses,
frs.adj_slpm, -- Bayesian-shrunk SLpM
frs.adj_sapm,
frs.adj_str_acc,
frs.adj_td_acc,
frs.adj_td_def,
frs.output_decline_pct,
frs.career_ko_rate,
frs.career_sub_rate,
frs.career_dec_rate,
f.primary_style,
f.team_gym,
f.gym_change_date
FROM fighters f
JOIN fighter_rolling_stats frs ON f.fighter_id = frs.fighter_id
WHERE frs.as_of_fight_id = ? -- specific fight ID
AND frs.fighter_id = ?; -- specific fighter ID
-- Compute percentile rank of a fighter's striking differential within their weight class
WITH class_fighters AS (
SELECT
frs.fighter_id,
(frs.adj_slpm - frs.adj_sapm) AS str_diff,
frs.weight_class_id,
COUNT(*) OVER (PARTITION BY frs.weight_class_id) AS class_count,
RANK() OVER (PARTITION BY frs.weight_class_id ORDER BY (frs.adj_slpm - frs.adj_sapm)) AS rank_in_class
FROM fighter_rolling_stats frs
WHERE frs.as_of_date = (
SELECT MAX(as_of_date) FROM fighter_rolling_stats WHERE fighter_id = frs.fighter_id
)
)
SELECT
fighter_id,
str_diff,
ROUND(1.0 * rank_in_class / class_count * 100, 1) AS percentile
FROM class_fighters
WHERE weight_class_id = ?;
-- Compare model method probs to market for all upcoming fights
SELECT
f1.name AS fighter1,
f2.name AS fighter2,
ce.model_ko_prob,
ce.market_ko_prob,
ce.ko_edge,
ce.model_sub_prob,
ce.market_sub_prob,
ce.sub_edge,
ce.model_dec_prob,
ce.market_dec_prob,
ce.dec_edge,
ce.edge_tier
FROM calculated_edges ce
JOIN fights fi ON ce.fight_id = fi.fight_id
JOIN fighters f1 ON fi.fighter1_id = f1.fighter_id
JOIN fighters f2 ON fi.fighter2_id = f2.fighter_id
JOIN events ev ON fi.event_id = ev.event_id
WHERE ev.event_date >= date('now')
AND ce.fighter_id = fi.fighter1_id
ORDER BY ABS(ce.ko_edge) + ABS(ce.sub_edge) + ABS(ce.dec_edge) DESC;
-- Detect significant line moves (>10% shift) in last 6 hours
SELECT
lm.fight_id,
lm.fighter_id,
f.name,
lm.book,
MIN(lm.implied_prob) AS prob_low,
MAX(lm.implied_prob) AS prob_high,
(MAX(lm.implied_prob) - MIN(lm.implied_prob)) AS movement_magnitude,
COUNT(*) AS snapshot_count
FROM odds_line_movement lm
JOIN fighters f ON lm.fighter_id = f.fighter_id
WHERE lm.captured_at >= datetime('now', '-6 hours')
AND lm.market_type = 'moneyline'
GROUP BY lm.fight_id, lm.fighter_id, lm.book
HAVING movement_magnitude > 0.10
ORDER BY movement_magnitude DESC;
-- All S and A tier picks for the next event
SELECT
ev.name AS event,
ev.event_date,
f.name AS fighter,
ce.quality_composite,
ce.edge_tier,
ce.edge_vs_pinnacle,
ce.edge_vs_kalshi,
ce.model_avg_prob,
ce.pinnacle_novig_prob
FROM calculated_edges ce
JOIN fights fi ON ce.fight_id = fi.fight_id
JOIN events ev ON fi.event_id = ev.event_id
JOIN fighters f ON ce.fighter_id = f.fighter_id
WHERE ev.event_date = (SELECT MIN(event_date) FROM events WHERE event_date >= date('now'))
AND ce.edge_tier IN ('S','A')
ORDER BY ce.quality_composite DESC;
-- For upcoming fights where judges are announced, show judge tendencies
SELECT
f1.name AS fighter1,
f2.name AS fighter2,
j.name AS judge,
j.pct_favor_aggressor,
j.pct_favor_control,
j.pct_favor_damage,
j.pct_split,
j.controversial_rate
FROM pre_fight_judges pfj
JOIN fights fi ON pfj.fight_id = fi.fight_id
JOIN judges j ON pfj.judge_id = j.judge_id
JOIN fighters f1 ON fi.fighter1_id = f1.fighter_id
JOIN fighters f2 ON fi.fighter2_id = f2.fighter_id
JOIN events ev ON fi.event_id = ev.event_id
WHERE ev.event_date = (SELECT MIN(event_date) FROM events WHERE event_date >= date('now'))
ORDER BY fi.bout_order, j.name;
-- Output per round for a specific fighter (cardio curve)
SELECT
fighter_id,
round_number,
AVG(CASE WHEN round_duration_sec > 0
THEN sig_strikes_landed * 1.0 / (round_duration_sec / 60.0)
ELSE NULL END) AS avg_strikes_per_min,
COUNT(*) AS sample_size
FROM fighter_round_stats
WHERE fighter_id = ?
GROUP BY fighter_id, round_number
ORDER BY round_number;
-- Brier score per source over last 100 fights
SELECT
p.source,
COUNT(*) AS fights_tracked,
AVG(ce.brier_score) AS avg_brier,
SUM(CASE WHEN ce.prediction_correct = 1 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS accuracy
FROM predictions p
JOIN calculated_edges ce ON p.fight_id = ce.fight_id AND p.fighter_id = ce.fighter_id
WHERE ce.prediction_correct IS NOT NULL
GROUP BY p.source
ORDER BY avg_brier ASC;
-- Alert if any data source hasn't been updated in >24 hours
SELECT
'ufcstats_events' AS source,
MAX(created_at) AS last_update,
ROUND((julianday('now') - julianday(MAX(created_at))) * 24, 1) AS hours_stale
FROM events
UNION ALL
SELECT
'betting_odds',
MAX(captured_at),
ROUND((julianday('now') - julianday(MAX(captured_at))) * 24, 1)
FROM betting_odds
UNION ALL
SELECT
'live_tracking',
MAX(captured_at),
ROUND((julianday('now') - julianday(MAX(captured_at))) * 24, 1)
FROM live_tracking
WHERE captured_at >= date('now', '-7 days');
PRAGMA journal_mode=WAL and PRAGMA foreign_keys=ONweight_classes seed datafighter_weight_history from historical weigh-in datafighter_rolling_stats ETL (compute after each fight ingestion)odds_line_movement pipeline (6h pre-fight week, 1h fight week, 15min post-weigh-in)predictions table with ensemble averaging logicmodel_avg_prob computation across all available sourcesstyle_matchup_matrix using weight class groups (lower/middle/upper)calculated_edges pipeline (no-vig, cross-book, model edge, sharp action indicators)fighter_injuries table with historical datajudges tablepre_fight_judges snapshot pipeline (trigger 48h before event)fight_referees pipeline (trigger when assigned)data_latency_ms column)live_tracking.dbSeverity: CRITICAL Sources: Opus, DeepSeek, Gemini, all panel models Fix:
Severity: CRITICAL Sources: Opus, Sonar, DeepSeek Fix: Launch with hand-tuned weights, but immediately set up logistic regression backtest on 2018-2024 data. Weights should be learned, not guessed. The 25/20/15/10/10/10/5/5 split is an educated starting point. Adjust before going live based on backtest results.
Severity: HIGH
Source: Opus
Fix: All rolling stats shrunk toward weight-class mean using adj_stat = (n * fighter_stat + k * class_mean) / (n + k) where k=5. Implemented in fighter_rolling_stats table as adj_* columns.
Severity: HIGH
Source: Sonar
Fix: Track data_latency_ms in live_tracking. Define max staleness per market (90s for ML, 5min for method/round). Implement fallback hierarchy: Kalshi > Polymarket > Pinnacle. Flag stale captures with is_stale = 1.
Severity: HIGH Source: Sonar Fix: Sanity checks on all scraped data:
control_time_sec ≤ round_duration_sec × num_roundssig_strikes_per_min ≤ 15 (reasonable upper bound)takedown_accuracy_pct ≤ 1.0reach_inches > 0 AND reach_inches < 90
Log violations to data_quality_log, quarantine flagged records.Severity: HIGH
Source: Opus
Fix: V1 uses categorical tags (acceptable). V2 replaces with continuous style vector stored in fighters table (columns already included in schema: style_striking_offense, style_wrestling_offense, etc.).
Severity: HIGH Source: Opus Fix: Do the historical backfill as a ONE-TIME operation with delays and rotating proxies. Once historical data is cached, losing access is survivable because Odds API covers current lines.
Severity: MEDIUM Source: Opus, Gemini, Sonar Fix: V1 uses the step function with correct division groupings (see Section 7.8). V2 upgrades to sigmoid decay curve per division.
Severity: MEDIUM Source: Opus Fix: Do NOT blend ELO as one of the ensemble sources while also using the composite formula. Either: use ELO/Glicko-2 as the model (replacing the composite), or use ELO only for calibration. For v1: the composite formula IS the model. ELO is a separate validation check.
Severity: MEDIUM Source: Opus Fix: V1 uses the step function (0/-15/-30/-50) with recency doubling. V2 upgrades to the decay function detailed in Section 7.5. Schema is ready for either approach.
| Item | Recommendation | Rationale |
|---|---|---|
| Live 1-minute tracking for ALL contract types | SIMPLIFY: ML at 1min, method/round at 5min | Cuts data volume 70%, captures 90% of signal |
| Geometric mean as alternative MoV approach | DEFER to v2: Build sample-size weighted first | Building two parallel systems doubles code debt |
| Per-weight-class composite formula tuning | DEFER to v2: Need 12+ months of live data per class | Too little per-class data in year 1 |
| 67+ statistical categories from UFCStats | SIMPLIFY: Start with top 20 most predictive | Feature importance analysis will show which add value |
| Bloody Elbow staff picks as ensemble source | REMOVE: Editorial turnover, poor calibration | BetMMA.tips + Tapology + internal model is sufficient |
| Compound booster stacking to -45 | ADD FLOOR: Composite cannot drop below 10 after boosters | Prevents extreme values that break prediction calibration |
| Judge bias with 14 bias vectors per judge | SIMPLIFY for v1: Track 4 key tendencies per judge, add more in v2 | Small edge (<3%); not worth full complexity at launch |
| Full cardio curves for all historical fights | SIMPLIFY: Compute on-demand via query, don't pre-store all curves | The query in Section 15 computes this efficiently |
| 5-7 prediction site scrapers simultaneously | PHASE: Build BetMMA + Tapology first, add FightMatrix + Action Network after baseline is working | Ensemble only improves a working model |
| Cardio archetype classification (Sustainer/Decliner/Cliff) | SIMPLIFY: Use raw output_decline_pct in formula directly |
Archetypes add interpretive complexity without clear gain over the raw metric |
These are worth building as shared services, not per-desk implementations:
| Component | Reusable For |
|---|---|
| Scraper infrastructure (rate limiting, Cloudflare handling, exponential backoff, health checks, local caching) | All desks — team sports, crypto, forex |
| Odds aggregation pipeline (Kalshi, Polymarket, Pinnacle, DraftKings standardization, closing line calculation) | All desks |
| Bayesian shrinkage for small-sample rolling stats | Tennis (individual sport, same problem), Boxing, Golf, any individual sport |
| ELO/Glicko-2 rating system | Tennis, Boxing, Chess/Esports, any 1v1 competition |
| Brier scoring + settlement pipeline | All desks — build as shared service |
| Line movement velocity detection + steam move alerts | NFL, NBA, MLB, NHL, Tennis, Boxing |
| Inverse-MSE ensemble weighting (better models get more weight) | All desks |
| Judge/referee bias tracking methodology | Boxing (judge bias even more impactful), NBA (referee tendencies), MLB (umpire strike zones) |
| RSS-based news/injury alert pipeline | All sports desks (different feeds, same infrastructure) |
| Backtest harness (predictions → outcomes → Brier score → edge tier ROI) | All desks |
| Feature engineering templates (percentile ranking, rolling averages, composite normalization) | All desks |
| Live tracking SLA architecture (prioritization, fallback hierarchy, staleness detection) | All desks doing live markets |
| Data quality log + sanity checks | All desks |
[PANEL: Sonar] — Three big moves:
Explicit data quality SLAs for every source: Commit to: UFCStats (check daily, alert if >24h stale), MMADecisions (manual validation on 5 random fights every Friday), judge panel (scraped <24h before fight). When data fails SLA, document it and adjust weights or skip that bet. Right now there is no accountability for data freshness. The data_quality_log table supports this, but you need a daily cron that checks it.
Pre-launch production simulation: Before building anything, run the entire pipeline on a historical fight card (e.g., 5 fights from Jan 2026). Scrape all sources, calculate all edges, generate a pre-event report, then grade against actual outcomes. This catches broken scrapers, API limits, and schema surprises before real money is on the line.
Explicit launch gate with pass/fail criteria: Add a hard requirement: "Launch when moneyline Brier score > 0.52 on last 50 backtested fights AND Method of Victory model backtest > 0.51." If Phase 10 is reached without these benchmarks, either the model doesn't work or it was shipped untested.
[PANEL: Opus] — Replace the composite formula with logistic regression as primary model: Keep the composite formula as a human-readable "explainability layer" but use a logistic regression model trained on the same features for actual edge calculation. A logistic regression on 2018-2024 fight outcomes using the same 8 feature groups will: (1) find the actual optimal weights, (2) reveal which features are predictive and which are noise, (3) output calibrated probabilities directly without needing to map composite scores to win probabilities, (4) be trivially updatable as new data comes in.
| Division | Upper Limit (lbs) | Gender |
|---|---|---|
| Strawweight | 115 | Female |
| Flyweight | 125 | Male / Female |
| Bantamweight | 135 | Male / Female |
| Featherweight | 145 | Male |
| Lightweight | 155 | Male |
| Welterweight | 170 | Male |
| Middleweight | 185 | Male |
| Light Heavyweight | 205 | Male |
| Heavyweight | 265 | Male |
Weight class groups for style matchup matrix:
| Method | Sub-Types | Betting Market |
|---|---|---|
| KO/TKO | Punch KO, head kick KO, body shot TKO, ground & pound TKO, doctor stoppage (strikes) | KO/TKO prop |
| Submission | RNC, guillotine, armbar, triangle, heel hook, D'Arce, anaconda, etc. | Submission prop |
| Decision - Unanimous | All 3 judges score same winner | Decision prop |
| Decision - Split | 2 of 3 judges score same winner | Decision prop |
| Decision - Majority | 2 judges score winner, 1 draws | Decision prop |
| Draw | Split draw, majority draw | N/A (void most bets) |
| No Contest | Accidental foul, failed drug test | N/A (void) |
| DQ | Illegal strikes, fouls | Rare / N/A |
| Tag | Key Statistical Indicators |
|---|---|
| Boxer | High SLpM, high distance %, low TD attempts |
| Pressure Fighter | Very high SLpM, high SApM, low striking defense |
| Counter Striker | Low SLpM, high accuracy %, high defense % |
| Wrestler | High TD attempts, high control time, low SLpM |
| Grappler/BJJ | High sub attempts, moderate TDs, finish via sub |
| Clinch Fighter | High clinch strikes %, high clinch control time |
| Kickboxer | High distance %, balanced head/body/leg strikes |
| Well-Rounded | Balanced across all categories |
fighters ──┬── fights (as fighter1 or fighter2)
├── fighter_fight_stats (per-fight stats, 2 rows/fight)
├── fighter_round_stats (per-round stats, 2×N rows/fight)
├── fighter_rolling_stats (career snapshot at each fight)
├── fighter_weight_history (weight class + weigh-in per fight)
├── fighter_injuries (injury + suspension history)
├── betting_odds (odds per fighter per fight per book)
├── odds_line_movement (timestamped line changes)
├── predictions (per source per fighter per fight)
└── calculated_edges (final edge calculations per fighter per fight)
events ──── fights (1:many)
fights ──┬── fight_results (1:1)
├── fighter_fight_stats (1:2)
├── fighter_round_stats (1:many)
├── betting_odds (1:many)
├── odds_line_movement (1:many)
├── predictions (1:many)
├── calculated_edges (1:2)
├── live_tracking (1:many)
├── bout_judges (1:3, for decisions)
├── pre_fight_judges (1:3, pre-event snapshot)
└── fight_referees (1:1 or 1:many during build)
weight_classes ──── fights (1:many)
fighter_rolling_stats (1:many)
fighter_weight_history (1:many)
style_matchup_matrix (optional FK)
judges ──── bout_judges (1:many)
judges ──── pre_fight_judges (1:many)
referees ──── fight_referees (1:many)
Total data points per fight: ~220-240 (fighter stats x2 + betting odds across books + line movement + predictions across sources + calculated edges + live tracking snapshots + judge + referee data).
| Service | Endpoint Pattern | Auth | Notes |
|---|---|---|---|
| The Odds API | https://api.the-odds-api.com/v4/sports/mma_mixed_martial_arts/odds |
API key header | 500 req/mo free |
| Kalshi REST | https://api.elections.kalshi.com/trade-api/v2/markets |
RSA-PSS signature | Series filter: MMA |
| Kalshi Candlestick | https://api.elections.kalshi.com/trade-api/v2/markets/{ticker}/candlesticks |
RSA-PSS signature | Historical closing prices |
| Polymarket | https://gamma-api.polymarket.com/markets |
None | No account needed |
| UFCStats.com | http://www.ufcstats.com/statistics/events/completed |
None | HTML scrape, rate-limit aggressively |
| Sherdog | https://www.sherdog.com/fighter/{id} |
None | HTML scrape |
| Tapology | https://www.tapology.com/fightcenter |
None | HTML scrape |
Applied to all rolling stats for fighters with <10 UFC fights:
adj_stat = (n * fighter_stat + k * class_mean) / (n + k)
where:
n = number of UFC fights (sample size)
k = shrinkage coefficient (how much we pull toward class mean)
class_mean = weight-class average for this stat
Recommended k values:
k = 5 for most stats (SLpM, SApM, accuracy, defense)
k = 3 for high-variance stats (KO rate, finish rate)
k = 8 for very stable stats (reach, height — these don't change)
At n=5: adj_stat is weighted 50/50 between fighter and class mean (k=5)
At n=10: adj_stat is weighted 67/33 toward fighter (k=5)
At n=20: adj_stat is weighted 80/20 toward fighter (k=5)
This is the standard Marcel projection approach used in baseball analytics, adapted for MMA's sample-size problem.
End of merged final spec — UFC/MMA Data Collection Desk
Status: NOT BUILT — add when this desk goes live for execution
Current state: All Kalshi data (prices, order books, trades) is fetched via REST API polling on cron schedules. This is fine for edge detection and monitoring, but NOT sufficient for live trade execution.
Why WebSocket matters:
What to build when ready:
Note: Sports markets use RFQ (Request for Quote) so the visible order book is usually empty — but the trade feed still matters for freshness signals and the WebSocket is required for order submission. Weather markets DO have real visible order books where this upgrade is even more critical.
Added 2026-03-29 — upgrade REST to WS when desk moves to live execution