Game Desk Build Template

Purpose: This document is a universal template for building a new game-level sports desk in EdgeClaw. An AI reading this should know exactly what to build, how to wire it up, and what questions to ask before starting.

Reference implementations: MLB Desk (docs/mlb-desk-construction-spec.md), MLB Player Props Desk (docs/mlb-player-props-desk-construction-spec.md)


The Pipeline Pattern

Every game desk follows the same 4-table pipeline:

Table 1: Kalshi Data (soft book — what we trade against)
    ↓
Table 2: Anchor Book Data (sharp book — our fair value reference)
    ↓
Table 3: Probability Curves (book curve + model curve vs Kalshi price)
    ↓
Table 4: Edge Scanner Output (mispricings found)

The rule: Each table is sport-specific. No mixing data across sports. Each desk gets its own isolated databases.


Table 1 — Kalshi Data

What: Raw contract prices pulled from the Kalshi API for this sport's game-level markets.

Database: kalshi-{sport}-prices.db
Table name: kalshi_{sport}_prices

Standard columns:

| Column | Type | Description |
| --- | --- | --- |
| scan_type | TEXT | Which collection window: 6am, 8am, 10am, 2pm, 6pm, close |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | moneyline, spread, total, team_total, f5_spread, f5_total, etc. |
| game_id | TEXT | Kalshi event ID |
| away_team | TEXT | Away team name |
| home_team | TEXT | Home team name |
| threshold | REAL | The line (e.g., 7.5 for total, -1.5 for spread) |
| yes_bid | INTEGER | Cents (0-100) |
| yes_ask | INTEGER | Cents (0-100) |
| yes_exec | REAL | Midpoint of yes bid/ask |
| no_bid | INTEGER | Cents (0-100) |
| no_ask | INTEGER | Cents (0-100) |
| no_exec | REAL | Midpoint of no bid/ask |
| spread | INTEGER | Ask minus bid (liquidity indicator) |
| volume | INTEGER | Contracts traded |
| captured_at | TEXT | ISO timestamp |
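The exec and spread columns are derived from the raw quotes. A minimal sketch of those invariants, assuming integer cents in 0-100 (illustrative names, not the actual collector code):

```ts
// Sketch: validate a kalshi_{sport}_prices row and derive its computed fields.
// Assumption: the `spread` column is tracked on the yes side of the book.
interface KalshiRow {
  yes_bid: number; yes_ask: number;
  no_bid: number; no_ask: number;
  yes_exec?: number; no_exec?: number; spread?: number;
}

function deriveQuoteFields(row: KalshiRow): KalshiRow {
  const inRange = (c: number) => Number.isInteger(c) && c >= 0 && c <= 100;
  for (const c of [row.yes_bid, row.yes_ask, row.no_bid, row.no_ask]) {
    if (!inRange(c)) throw new Error(`price out of range: ${c}`);
  }
  return {
    ...row,
    yes_exec: (row.yes_bid + row.yes_ask) / 2, // midpoint of yes bid/ask
    no_exec: (row.no_bid + row.no_ask) / 2,    // midpoint of no bid/ask
    spread: row.yes_ask - row.yes_bid,         // ask minus bid (liquidity indicator)
  };
}
```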

Data quality rules:

- Same-day games only: skip games that have already started (live) or already finished (settled).
- Treat contracts with bid <= 5 or bid >= 95 as effectively settled and drop them.
- Tag every row with its scan_type collection window.

Kalshi API notes:

- Results are paginated; some sports have 5+ pages of contracts, so keep fetching until no pages remain.
- Respect rate limits: 500ms+ between page requests, at least 2 seconds between series in a loop, and back off on 429s (see "Things to Watch Out For").


Table 2 — Anchor Book Data

What: Sharp book odds that serve as our fair value reference. This is the "truth" we compare Kalshi against.

For game desks, the anchor is typically Pinnacle (sharpest game book). Other options: opening lines from offshore books, consensus lines from SBR.

Database: pinnacle-{sport}.db (or {anchor}-{sport}.db)
Table name: pinnacle_{sport}_odds

Standard columns:

| Column | Type | Description |
| --- | --- | --- |
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| game_number | INTEGER | 1 or 2 (for doubleheaders) |
| market_type | TEXT | moneyline, spread, total |
| home_team | TEXT | |
| away_team | TEXT | |
| home_line | REAL | Home side line |
| away_line | REAL | Away side line |
| home_price | REAL | Decimal odds or American odds |
| away_price | REAL | |
| captured_at | TEXT | ISO timestamp |

Key principle: The anchor book provides the de-vigged fair value. De-vig the over/under (or home/away) prices to get the true implied probability at the posted line. This probability anchors the curve.
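The two-way de-vig above can be sketched as follows (multiplicative normalization on decimal odds; illustrative, not the desk's exact code):

```ts
// Sketch: strip the vig from a two-way market to recover implied probabilities.
// Raw implied probabilities sum to more than 1; normalizing by that overround
// gives the "fair value" probabilities the curve is anchored to.
function devig(priceA: number, priceB: number): { pA: number; pB: number } {
  const impliedA = 1 / priceA;
  const impliedB = 1 / priceB;
  const overround = impliedA + impliedB; // e.g. ~1.026 for a 1.95/1.95 market
  return { pA: impliedA / overround, pB: impliedB / overround };
}
```

For a symmetric 1.95/1.95 over/under, both sides de-vig to 0.50, which becomes the anchor probability at the posted line.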


Table 3 — Probability Curves

What: For each game, build TWO probability curves and compare both against every Kalshi alt-line price:

  1. Book curve — Derived from the anchor book's de-vigged fair value. The anchor book gives you the probability at ONE point (their posted line). You extend this into a full curve across all thresholds using a statistical distribution.

  2. Model curve — Derived from the desk's own model (power ratings, team quality composites, pitching matchups, whatever the sport uses). This is an independent estimate that doesn't rely on the books at all.

Both curves are compared against Kalshi's price at each threshold. When either curve says a Kalshi contract is mispriced by more than the fee (typically 7%), that's an edge.

Database: {sport}-edges.db
Table name: {sport}_probability_curves

Standard columns:

| Column | Type | Description |
| --- | --- | --- |
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | spread, total, team_total, f5_spread, etc. |
| game_id | TEXT | |
| away_team | TEXT | |
| home_team | TEXT | |
| threshold | REAL | The alt-line value (e.g., 6.5, 7.5, 8.5 for totals) |
| book_anchor | REAL | Anchor book's line (the point calibrating the curve) |
| book_yes | REAL | Book curve probability of YES at this threshold |
| book_no | REAL | Book curve probability of NO at this threshold |
| model_yes | REAL | Model curve probability of YES at this threshold |
| model_no | REAL | Model curve probability of NO at this threshold |
| kalshi_yes | REAL | Kalshi exec price for YES |
| kalshi_no | REAL | Kalshi exec price for NO |
| rung | INTEGER | 0 = closest to 50/50 (main line), positive = further out |
| is_main_line | INTEGER | 1 if this threshold is the book's headline line (closest to 50/50), 0 otherwise |
| actual_result | REAL | Actual game result at this threshold (filled by settlement) |
| outcome | TEXT | "over"/"under" or "cover"/"miss" relative to threshold |
| book_was_right | INTEGER | 1 if anchor book favored the correct side, 0 if not |
| model_was_right | INTEGER | 1 if model favored the correct side, 0 if not |
| book_error | REAL | How far off the book probability was from reality |
| model_error | REAL | Same for model |
| settled_at | TEXT | Timestamp when result was recorded |
| captured_at | TEXT | ISO timestamp |
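One way rung and is_main_line can be assigned from the book curve (a sketch under the "closest to 50/50" definition above; the actual desk code may differ):

```ts
// Sketch: the main line is the threshold whose YES probability is closest to 0.5;
// every other threshold's rung is its distance, in steps, from that main line.
function assignRungs(rows: { threshold: number; book_yes: number }[]) {
  const sorted = [...rows].sort((a, b) => a.threshold - b.threshold);
  let mainIdx = 0;
  for (let i = 1; i < sorted.length; i++) {
    if (Math.abs(sorted[i].book_yes - 0.5) < Math.abs(sorted[mainIdx].book_yes - 0.5)) {
      mainIdx = i;
    }
  }
  return sorted.map((r, i) => ({
    ...r,
    rung: Math.abs(i - mainIdx),           // 0 = main line, positive = further out
    is_main_line: i === mainIdx ? 1 : 0,
  }));
}
```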

Distribution selection (game-level):

| Market type | Distribution | Why |
| --- | --- | --- |
| Totals (runs/goals/points) | Poisson | Count data, appropriate for scoring events |
| Spreads (margin of victory) | Normal | Continuous-ish, symmetric around expected margin |
| Team totals | Poisson | Same as game totals but per-team |
| Moneyline | Derived from spread curve | P(margin > 0) from the Normal CDF |
| F5 / first half | Same distribution, tighter sigma | Less variance in partial game — scale sigma down |
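Extending the single de-vigged anchor point into full curves with these distributions can be sketched as follows (Poisson survival for totals, Normal CDF for spreads; the erf approximation is Abramowitz-Stegun 7.1.26; illustrative, not the desk's exact implementation):

```ts
// Sketch: P(YES) at each threshold from a fitted distribution.

// Totals: P(total score > threshold) under Poisson(lambda).
// For a half-point threshold like 7.5, "over" means X >= 8.
function poissonOver(lambda: number, threshold: number): number {
  const k = Math.floor(threshold); // P(X > 7.5) = 1 - P(X <= 7)
  let cdf = 0;
  let term = Math.exp(-lambda); // P(X = 0)
  for (let i = 0; i <= k; i++) {
    cdf += term;
    term *= lambda / (i + 1); // P(X = i+1) from P(X = i)
  }
  return 1 - cdf;
}

// Spreads: Normal CDF via the Abramowitz-Stegun 7.1.26 erf approximation
// (absolute error below 1.5e-7, plenty for pricing in cents).
function normalCdf(x: number, mu = 0, sigma = 1): number {
  const z = (x - mu) / (sigma * Math.SQRT2);
  const t = 1 / (1 + 0.3275911 * Math.abs(z));
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-z * z);
  return 0.5 * (1 + (z < 0 ? -erf : erf));
}

// P(home margin > line) under Normal(expectedMargin, sigma).
function spreadCoverProb(expectedMargin: number, line: number, sigma: number): number {
  return 1 - normalCdf(line, expectedMargin, sigma);
}
```

Moneyline falls out as spreadCoverProb(mu, 0, sigma), matching the "derived from spread curve" row above; F5/first-half markets reuse the same functions with a scaled-down sigma (and lambda).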

The model curve uses whatever the sport's desk model produces. For MLB this is quality composites, SP matchups, park factors. For NBA it might be pace-adjusted efficiency. The template can't prescribe this — it depends on the sport. See "Questions to Ask" below.


Table 4 — Edge Scanner Output

What: The final output — mispricings found by comparing Kalshi prices against the book curve and model curve.

Database: {sport}-edges.db (same DB as curves)
Table name: sports_edges (filtered by sport column)

Standard columns:

| Column | Type | Description |
| --- | --- | --- |
| sport | TEXT | Sport identifier for filtering |
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | spread, total, moneyline, etc. |
| sub_market | TEXT | f5, team_total, etc. (nullable) |
| game_id | TEXT | |
| away_team | TEXT | |
| home_team | TEXT | |
| side | TEXT | yes / no / home / away / over / under |
| threshold | REAL | The line |
| anchor_prob | REAL | Book curve fair probability |
| model_prob | REAL | Model curve fair probability |
| kalshi_price | REAL | What Kalshi is offering |
| execution_price | REAL | What you'd actually pay (including spread) |
| raw_edge_book | REAL | anchor_prob - kalshi_price |
| raw_edge_model | REAL | model_prob - kalshi_price |
| net_edge | REAL | Edge after fee (7% Kalshi fee) |
| confidence | TEXT | HIGH / MEDIUM / LOW |
| executable | BOOLEAN | Is the spread tight enough to actually trade? |
| detected_at | TEXT | ISO timestamp |
| actual_outcome | TEXT | win / loss / push (filled after settlement) |
| settled_at | TEXT | When settled |
| closing_price | REAL | Kalshi price at close (for CLV tracking) |
| clv | REAL | Closing line value — did the line move our way? |
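The edge columns reduce to a small amount of arithmetic. A sketch for one side (the fee treatment and the 5-cent spread cutoff are assumptions; the template only specifies a ~7% fee):

```ts
// Sketch: raw edges vs both curves, net edge after fee, and an executability check.
interface EdgeInput {
  anchorProb: number;   // book curve fair probability (0..1)
  modelProb: number;    // model curve fair probability (0..1)
  kalshiPrice: number;  // execution price in probability terms (0..1)
  spreadCents: number;  // bid/ask spread in cents
}

function scoreEdge(x: EdgeInput, feeRate = 0.07, maxSpreadCents = 5) {
  const rawEdgeBook = x.anchorProb - x.kalshiPrice;   // raw_edge_book
  const rawEdgeModel = x.modelProb - x.kalshiPrice;   // raw_edge_model
  // Assumption: fee modeled as a flat deduction from the book-curve edge.
  const netEdge = rawEdgeBook - feeRate;
  return {
    rawEdgeBook,
    rawEdgeModel,
    netEdge,
    executable: x.spreadCents <= maxSpreadCents, // tight enough to actually trade?
  };
}
```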

Edge summary table: sports_edge_summary


Scheduling

Everything runs on Eastern Time (ET). The schedule has 3 layers:

Layer 1 — Data Collection (top of hour)

| Minute | What fires | Cron pattern |
| --- | --- | --- |
| :00 | Kalshi game prices + any other soft book pulls | `0 6,8,10,14,18 * * *` |
| :02 | Anchor book pull (Pinnacle) | `2 6,8,10,14,18 * * *` |

Game-day windows: 6 AM, 8 AM, 10 AM, 2 PM, 6 PM ET
Closing snapshot: 1 minute before each game's start time (staggered per game, not all at once)

Layer 2 — Stats & Baselines (daily morning)

Three staggered groups so raw data finishes before computation starts:

| Group | Time | What runs |
| --- | --- | --- |
| Group 1 — Raw Data | 9:00 AM | External API pulls: stats API, game logs, web scrapes, weather |
| Group 2 — Baselines | 9:05 AM | Light computation on Group 1 data: rolling averages, baselines, splits |
| Group 3 — Derived Metrics | 9:10 AM | Heavy math needing Groups 1+2: model composites, variance, fatigue, matchup adjustments |

Layer 3 — Edge Detection (after data lands)

| Minute | What fires | Cron pattern |
| --- | --- | --- |
| :10 | Probability curves rebuild (reads fresh Kalshi + anchor data) | `10 6,8,10,14,18 * * *` |
| :12 | Edge scanner (reads fresh curves, finds mispricings) | `12 6,8,10,14,18 * * *` |

Critical: Edge scanner must run AFTER curves are built, which must run AFTER data collection. The 10-minute offset ensures fresh data is available. Never run the edge scanner at :00 — the data won't be there yet.

Layer 4 — Futures & Season Markets (adaptive)

For long-term markets (futures, win totals, division winners):

| Phase | Frequency | When to transition |
| --- | --- | --- |
| Phase 1 | Weekly (Monday) | Start of season → ~2 months before end |
| Phase 2 | Every 3 days | ~2 months before end → ~1 month before end |
| Phase 3 | Daily | ~1 month before end → season end |
| Expired | Stop scanning | After season ends |

Set sport-specific transition dates. Include an expiry_date column in futures databases — scanning stops after expiry.
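The adaptive schedule reduces to a pure date function. A sketch (the 60/30-day cutoffs approximate the "~2 months"/"~1 month" transitions; real desks substitute sport-specific dates):

```ts
// Sketch: pick the futures scan frequency from days remaining in the season.
type FuturesPhase = 'weekly' | 'every-3-days' | 'daily' | 'expired';

function futuresPhase(today: Date, seasonEnd: Date, expiryDate: Date): FuturesPhase {
  const DAY = 24 * 60 * 60 * 1000;
  if (today.getTime() > expiryDate.getTime()) return 'expired'; // stop scanning
  const daysLeft = (seasonEnd.getTime() - today.getTime()) / DAY;
  if (daysLeft > 60) return 'weekly';       // Phase 1: start -> ~2 months out
  if (daysLeft > 30) return 'every-3-days'; // Phase 2: ~2 months -> ~1 month out
  return 'daily';                           // Phase 3: final month
}
```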


Dashboard Wiring

desk-config.ts

Every desk needs an entry in src/pipeline/data-status/desk-config.ts:

```ts
{
  name: '{Sport Name}',
  slug: '{sport-slug}',
  kalshi: [{
    category: 'Sports / {Category} / {Sport}',
    hasScraper: true,
    scraperName: 'collector.ts (Kalshi sports cron)',
    freshnessKey: 'kalshi-{sport}',
    series: [
      { ticker: 'KXSERIES1', label: 'Human Label', dataViewKey: 'kalshi-{sport}-type1' },
      // one per Kalshi series for this sport
    ],
  }],
  sources: [
    // Sources organized by group — see Standard Groups below
  ],
}
```

Standard Groups for Game Desks

Every game desk should have these groups on the dashboard. Order matters — this is how they appear on the page:

  1. Edge Detection — Edge scanner results (per market type), probability curves, implied curves
  2. Game-Day — Kalshi game lines, anchor book odds, other book data, sharp money indicators
  3. Futures — Season-long markets (if applicable): win totals, division winners, championship futures
  4. Ratings & Predictions — External model predictions, power ratings, simulations
  5. Pitching / Offense / Defense (sport-specific) — Player/team stats relevant to modeling
  6. Game Context — Park/venue factors, weather, referee/umpire tendencies, schedule, injuries
  7. Computed Analytics — Model outputs: quality composites, variance, fatigue, derived metrics

source-tables.ts

Every freshnessKey in desk-config needs a mapping in src/pipeline/data-status/source-tables.ts:

```ts
'freshnessKey': {
  db: 'database-name',        // SQLite DB file (without .db extension)
  tables: ['table_name'],     // Which table(s) to query
  filter?: {                  // Optional: filter to specific rows
    column: 'column_name',
    value: 'filter_value',
  },
  parentKey?: 'parent-key',   // Optional: inherit freshness from parent
}
```

Naming convention for freshnessKeys:

scheduler.ts

Register cron jobs in src/cron/scheduler.ts. Pattern:

```ts
const task = cron.schedule('CRON_PATTERN', async () => {
  // Import up front so `uf` is in scope for both the try and catch paths
  const { updateFreshnessAutoHash: uf } = await import('../pipeline/data-freshness.js');
  try {
    // 1. Import the scraper/calculator
    // 2. Run it
    // 3. Update freshness
    uf(getPipelineDb(), 'freshnessKey', 'fresh', rowCount);
  } catch (e) {
    const msg = e instanceof Error ? e.message : String(e);
    logger.error(`Cron error: ${msg}`);
    uf(getPipelineDb(), 'freshnessKey', 'error', undefined, msg);
  }
}, { timezone: 'America/New_York' });
this.tasks.set('task-name', task);
```

Freshness Tracking

Freshness is automatic — the dashboard discovers sources from desk-config and checks data_freshness table. Each cron job calls updateFreshnessAutoHash() after running. No separate registration needed.

Thresholds:

| Source type | Yellow (stale) | Red (alert) |
| --- | --- | --- |
| Game-day odds (6/8/10/2/6) | 30 min after window | 90 min after window |
| Daily stats (9 AM) | 60 min after expected | 180 min after expected |
| Futures (weekly/adaptive) | 2 days after expected | 7 days after expected |
| Edge scanner | 30 min after window | 90 min after window |

Telegram alert fires after 2 consecutive misses.
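The threshold table maps to a simple classifier. A sketch (keys are illustrative; futures thresholds converted from days to minutes):

```ts
// Sketch: classify a source's freshness from minutes elapsed past its expected update.
type Status = 'green' | 'yellow' | 'red';

const THRESHOLDS: Record<string, { yellow: number; red: number }> = {
  'game-day-odds': { yellow: 30, red: 90 },             // minutes after window
  'daily-stats':   { yellow: 60, red: 180 },            // minutes after expected
  'futures':       { yellow: 2 * 1440, red: 7 * 1440 }, // days converted to minutes
  'edge-scanner':  { yellow: 30, red: 90 },
};

function freshnessStatus(sourceType: string, minutesLate: number): Status {
  const t = THRESHOLDS[sourceType];
  if (!t) throw new Error(`unknown source type: ${sourceType}`);
  if (minutesLate >= t.red) return 'red';
  if (minutesLate >= t.yellow) return 'yellow';
  return 'green';
}
```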


Isolation Checklist

Before a desk is considered done, every item must pass:

| # | Item | What to check |
| --- | --- | --- |
| 1 | Databases isolated | Every table in its own sport-specific .db file |
| 2 | Scraper queue independent | This desk's scrapers don't depend on other desks |
| 3 | Recovery queue | Scrapers are in the recovery queue (re-run on crash) |
| 4 | Freshness tracking | Every source has a freshnessKey and shows on dashboard |
| 5 | Dashboard views | Every freshnessKey resolves to a working data view |
| 6 | Edge scanner | Reads from isolated DBs, writes to sport-specific edges DB |
| 7 | Scan windows | Every row tagged with scan_type |
| 8 | Data cleanliness | No live, no settled, same-day only for game-day data |
| 9 | Column formatting | Human-readable tickers, dates, prices on dashboard views |
| 10 | Filters | Column filters on all dashboard views |
| 11 | No cross-desk dependencies | This desk doesn't break if another desk's scraper fails |
| 12 | Sport-specific edge cases | Doubleheaders, postponements, overtime, whatever applies |
| 13 | Alerts | Freshness alerts configured for this desk's sources |

Things to Watch Out For

Hard-won lessons from building the MLB desk. Read these before starting any new desk.

  1. Player/team name mismatch across sources. Kalshi, Pinnacle, FanDuel, stats APIs — they all spell names differently. Kalshi uses tickers like "KXMLBHR-26APR07-A.BREGMAN-O1", FanDuel uses "Alex Bregman", stats APIs use "Bregman, Alex". You NEED a crosswalk table that maps names across sources. Build it early. Strip Jr./Sr. suffixes. Handle misspellings (Kalshi had "SUREZ" for Suarez).

  2. Wrong scanner wired to cron. When the MLB prop edge scanner was first deployed, the cron job was calling the generic scanner instead of the MLB-specific one. The edges table was empty for days before anyone noticed. Always verify the cron is calling the right function — test it manually after wiring.

  3. Filters too aggressive. The initial minimum price filter (5 cents) was dropping valid alt lines. Had to lower to 1 cent. Don't over-filter — you can always filter on the dashboard side. Be conservative with data-side filters.

  4. Kalshi API pagination and rate limits. The initial Kalshi pull only fetched 1 page of results and missed most of the data. Some sports have 5+ pages of contracts. Always check whether there are more pages. Add rate-limit delays (500ms+) between page requests. Kalshi returns 429 (too many requests) if you hit their API too fast; when scanning multiple series in a loop, use at least 2-second delays between calls. Don't retry 429s immediately; wait and try again on the next cron cycle.

  5. Sign convention bleeding across market types. Spreads use negative signs (home -1.5), totals don't. A bug caused the total's model anchor to get a negative sign from the spread logic. Keep spread/total/moneyline code paths completely separate.

  6. Live and settled data leaking in. Without explicit checks, the scraper will collect prices for games that have already started (live) or already ended (settled). This corrupts your curves. Always check the game start time and skip games that have started. Filter out contracts with bid <= 5 or bid >= 95; those are effectively settled.

  7. Edge scanner running before data lands. If the edge scanner fires at :00 (same time as data collection), it reads stale data from the previous window. The scanner MUST run after data collection finishes — at least :10 past the hour.

  8. Staggered scheduling is critical. Raw data (Group 1, 9:00 AM) must finish before baselines (Group 2, 9:05 AM), which must finish before derived metrics (Group 3, 9:10 AM). If you don't stagger, derived metrics compute on yesterday's baselines.
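The live/settled/same-day guards from item 6 can be combined into a single pre-collection check. A sketch (parameter names are illustrative, not the actual scraper):

```ts
// Sketch: decide whether a contract's price should be collected at all.
// A contract pinned at <= 5 or >= 95 cents is treated as effectively settled.
function shouldCollect(opts: {
  now: Date;
  gameStart: Date;
  gameDate: string;   // YYYY-MM-DD
  todayDate: string;  // YYYY-MM-DD
  yesBid: number;     // cents
}): boolean {
  if (opts.gameDate !== opts.todayDate) return false;               // same-day only
  if (opts.now.getTime() >= opts.gameStart.getTime()) return false; // game is live
  if (opts.yesBid <= 5 || opts.yesBid >= 95) return false;          // effectively settled
  return true;
}
```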


Questions to Ask Before Building

An AI starting a new game desk MUST ask and get answers for these before writing any code:

Sport-Specific

  1. What Kalshi ticker series exist for this sport? (e.g., KXMLB, KXNBA, KXNHL)
  2. What market types does Kalshi offer? (moneyline, spread, total, team total, F5/first half, period-specific, etc.)
  3. Does this sport have doubleheaders, split games, or other scheduling quirks?
  4. Are there overtime/extra innings/shootout rules that affect totals?
  5. What is the typical scoring range? (Affects distribution choice — low-scoring = Poisson, high-scoring = Normal)
  6. Does this sport have a first-half/first-period equivalent to MLB's F5?

Anchor Book

  1. Which book is the sharp anchor for this sport? (Pinnacle for most, but could vary)
  2. Does the anchor book offer all the same market types as Kalshi? (Pinnacle doesn't offer MLB team totals — they have to be derived)
  3. Are there any scraping issues with the anchor? (Pinnacle F5 lines were unreliable for MLB — had to derive from full-game)

Model

  1. What model does this sport's desk use? (Power ratings, Elo, quality composites, etc.)
  2. What data feeds the model? (Stats API, external ratings, player logs, etc.)
  3. What distribution should each market type use for the model curve?
  4. Are there sport-specific adjustments? (Park factors for MLB, altitude for NBA, ice conditions for NHL, etc.)

Futures

  1. Does Kalshi offer futures for this sport? (Championship, division, win totals, etc.)
  2. What is the season timeline? (Start date, key transition dates, end date — for adaptive scheduling)
  3. Does the anchor book offer futures for cross-reference?

Data Sources

  1. What external stats APIs exist for this sport? (MLB Stats API, NBA Stats API, etc.)
  2. What web scrapes are needed? (Baseball Reference, Basketball Reference, etc.)
  3. What computed metrics does the model need? (Fatigue, variance, situational splits, etc.)

Dashboard

  1. What groups should appear on the dashboard? (Use standard groups as starting point, add sport-specific ones)
  2. Any special sub-views needed? (Per-market-type filtered views, per-team views, etc.)

Document Control

This template is updated as new desks are built and new lessons are learned. The "Things to Watch Out For" section grows with every desk.

Last updated: 2026-04-08


Changelog

Running log of changes and decisions for the game-level desk pipeline. Each entry has two sections: Changes and Boss Notes.

If you're building this from scratch, read every Boss Notes section first — that's how the boss thinks about this system.


2026-04-08 — Initial changelog added

Changes:

No game desk bugs fixed this session. Changelog format established to match player props desk template.

Boss Notes:


2026-04-11 — Virtual scroll data view, directional rungs, anchor_diff column

Changes:

Boss Notes:


2026-04-10 — Summary bar fixes, model_best_side, result columns, F5 tie tracking

Changes:

Boss Notes:


2026-04-08 — Settlement columns added to Table 3 standard schema

Changes:

Added settlement and tracking columns to the probability curves standard schema (Table 3): is_main_line, actual_result, outcome, book_was_right, model_was_right, book_error, model_error, settled_at. These match the player props desk implementation. No game desk code changed — this is a template update so future game desks include these columns from the start.

Boss Notes:

Source: ~/edgeclaw/docs/desk-template-game.md