Purpose: This document is a universal template for building a new game-level sports desk in EdgeClaw. An AI reading this should know exactly what to build, how to wire it up, and what questions to ask before starting.
Reference implementations: MLB Desk (docs/mlb-desk-construction-spec.md), MLB Player Props Desk (docs/mlb-player-props-desk-construction-spec.md)
Every game desk follows the same 4-table pipeline:
Table 1: Kalshi Data (soft book — what we trade against)
↓
Table 2: Anchor Book Data (sharp book — our fair value reference)
↓
Table 3: Probability Curves (book curve + model curve vs Kalshi price)
↓
Table 4: Edge Scanner Output (mispricings found)
The rule: Each table is sport-specific. No mixing data across sports. Each desk gets its own isolated databases.
What: Raw contract prices pulled from the Kalshi API for this sport's game-level markets.
Database: kalshi-{sport}-prices.db
Table name: kalshi_{sport}_prices
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Which collection window: 6am, 8am, 10am, 2pm, 6pm, close |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | moneyline, spread, total, team_total, f5_spread, f5_total, etc. |
| game_id | TEXT | Kalshi event ID |
| away_team | TEXT | Away team name |
| home_team | TEXT | Home team name |
| threshold | REAL | The line (e.g., 7.5 for total, -1.5 for spread) |
| yes_bid | INTEGER | Cents (0-100) |
| yes_ask | INTEGER | Cents (0-100) |
| yes_exec | REAL | Midpoint of yes bid/ask |
| no_bid | INTEGER | Cents (0-100) |
| no_ask | INTEGER | Cents (0-100) |
| no_exec | REAL | Midpoint of no bid/ask |
| spread | INTEGER | Ask minus bid (liquidity indicator) |
| volume | INTEGER | Contracts traded |
| captured_at | TEXT | ISO timestamp |
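The exec-price and spread columns are derived from the raw quotes. A minimal sketch of those derivations — the interface and helper names are illustrative, not actual EdgeClaw code:

```typescript
// Sketch: deriving yes_exec / no_exec / spread from raw Kalshi quotes.
// Names are illustrative, not the real collector's API.
interface KalshiQuote {
  yesBid: number; // cents, 0-100
  yesAsk: number; // cents, 0-100
  noBid: number;
  noAsk: number;
}

// yes_exec / no_exec: midpoint of bid and ask, in cents
function execPrice(bid: number, ask: number): number {
  return (bid + ask) / 2;
}

// spread: ask minus bid — wider spread means thinner liquidity
function quoteSpread(bid: number, ask: number): number {
  return ask - bid;
}

function deriveColumns(q: KalshiQuote) {
  return {
    yes_exec: execPrice(q.yesBid, q.yesAsk),
    no_exec: execPrice(q.noBid, q.noAsk),
    spread: quoteSpread(q.yesBid, q.yesAsk),
  };
}
```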
Data quality rules:
- Same-day games only — no other dates in a game-day scan
- No live data — skip any game whose start time has passed
- No settled data — drop contracts quoting bid <= 5 or bid >= 95
Kalshi API notes:
- Base URL: `https://api.elections.kalshi.com/trade-api/v2`
- Ticker format: `SERIES-DATECODETEAMS-DETAIL`

What: Sharp book odds that serve as our fair value reference. This is the "truth" we compare Kalshi against.
For game desks, the anchor is typically Pinnacle (sharpest game book). Other options: opening lines from offshore books, consensus lines from SBR.
Database: pinnacle-{sport}.db (or {anchor}-{sport}.db)
Table name: pinnacle_{sport}_odds
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| game_number | INTEGER | 1 or 2 (for doubleheaders) |
| market_type | TEXT | moneyline, spread, total |
| home_team | TEXT | |
| away_team | TEXT | |
| home_line | REAL | Home side line |
| away_line | REAL | Away side line |
| home_price | REAL | Home side price (decimal or American — pick one format per desk and use it consistently) |
| away_price | REAL | Away side price (same format as home_price) |
| captured_at | TEXT | ISO timestamp |
Key principle: The anchor book provides the de-vigged fair value. De-vig the over/under (or home/away) prices to get the true implied probability at the posted line. This probability anchors the curve.
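A minimal sketch of the de-vig step using proportional (multiplicative) normalization of the two-sided decimal odds — the function name is illustrative, and other de-vig methods (power, Shin) exist if the desk wants them:

```typescript
// Sketch: two-way de-vig by proportional normalization.
// Input: decimal odds for both sides at the anchor book's posted line.
function devig(overDecimal: number, underDecimal: number): { over: number; under: number } {
  const rawOver = 1 / overDecimal;      // implied prob, vig included
  const rawUnder = 1 / underDecimal;
  const overround = rawOver + rawUnder; // > 1 because of the book's margin
  return { over: rawOver / overround, under: rawUnder / overround };
}
```

Example: symmetric -110/-110 style pricing (1.90 / 1.90 decimal) de-vigs to exactly 50/50, which is what anchors the curve at the main line.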
What: For each game, build TWO probability curves and compare both against every Kalshi alt-line price:
Book curve — Derived from the anchor book's de-vigged fair value. The anchor book gives you the probability at ONE point (their posted line). You extend this into a full curve across all thresholds using a statistical distribution.
Model curve — Derived from the desk's own model (power ratings, team quality composites, pitching matchups, whatever the sport uses). This is an independent estimate that doesn't rely on the books at all.
Both curves are compared against Kalshi's price at each threshold. When either curve says a Kalshi contract is mispriced by more than the fee (typically 7%), that's an edge.
Database: {sport}-edges.db
Table name: {sport}_probability_curves
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | spread, total, team_total, f5_spread, etc. |
| game_id | TEXT | |
| away_team | TEXT | |
| home_team | TEXT | |
| threshold | REAL | The alt-line value (e.g., 6.5, 7.5, 8.5 for totals) |
| book_anchor | REAL | Anchor book's line (the point calibrating the curve) |
| book_yes | REAL | Book curve probability of YES at this threshold |
| book_no | REAL | Book curve probability of NO at this threshold |
| model_yes | REAL | Model curve probability of YES at this threshold |
| model_no | REAL | Model curve probability of NO at this threshold |
| kalshi_yes | REAL | Kalshi exec price for YES |
| kalshi_no | REAL | Kalshi exec price for NO |
| rung | INTEGER | 0 = closest to 50/50 (main line), positive = further out |
| is_main_line | INTEGER | 1 if this threshold is the book's headline line (closest to 50/50), 0 otherwise |
| actual_result | REAL | Actual game result at this threshold (filled by settlement) |
| outcome | TEXT | "over"/"under" or "cover"/"miss" relative to threshold |
| book_was_right | INTEGER | 1 if anchor book favored the correct side, 0 if not |
| model_was_right | INTEGER | 1 if model favored the correct side, 0 if not |
| book_error | REAL | How far off the book probability was from reality |
| model_error | REAL | Same for model |
| settled_at | TEXT | Timestamp when result was recorded |
| captured_at | TEXT | ISO timestamp |
Distribution selection (game-level):
| Market type | Distribution | Why |
|---|---|---|
| Totals (runs/goals/points) | Poisson | Count data, appropriate for scoring events |
| Spreads (margin of victory) | Normal | Continuous-ish, symmetric around expected margin |
| Team totals | Poisson | Same as game totals but per-team |
| Moneyline | Derived from spread curve | P(margin > 0) from the Normal CDF |
| F5 / first half | Same distribution, tighter sigma | Less variance in partial game — scale sigma down |
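For totals, the curve extension above can be sketched as: solve for the Poisson mean that reproduces the de-vigged anchor probability, then evaluate that same distribution at every alt-line threshold. Function names and the bisection bounds are illustrative assumptions, not the production calibration:

```typescript
// P(X <= k) for Poisson(lambda), by summing the pmf terms
function poissonCdf(k: number, lambda: number): number {
  let term = Math.exp(-lambda); // P(X = 0)
  let sum = term;
  for (let i = 1; i <= k; i++) {
    term *= lambda / i;
    sum += term;
  }
  return sum;
}

// P(total > threshold) — thresholds are half-integers, so floor() gives the cutoff
function probOver(threshold: number, lambda: number): number {
  return 1 - poissonCdf(Math.floor(threshold), lambda);
}

// Bisect for the lambda where the curve passes through the de-vigged anchor point.
// probOver is monotone increasing in lambda, so bisection converges.
function calibrateLambda(anchorLine: number, anchorProb: number): number {
  let lo = 0.1, hi = 30; // assumed bounds, wide enough for run/goal/point totals
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (probOver(anchorLine, mid) < anchorProb) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}
```

Once lambda is calibrated, `probOver(t, lambda)` read at each threshold fills the `book_yes` column; `1 - probOver` fills `book_no`.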
The model curve uses whatever the sport's desk model produces. For MLB this is quality composites, SP matchups, park factors. For NBA it might be pace-adjusted efficiency. The template can't prescribe this — it depends on the sport. See "Questions to Ask" below.
What: The final output — mispricings found by comparing Kalshi prices against the book curve and model curve.
Database: {sport}-edges.db (same DB as curves)
Table name: sports_edges (filtered by sport column)
Standard columns:
| Column | Type | Description |
|---|---|---|
| sport | TEXT | Sport identifier for filtering |
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | spread, total, moneyline, etc. |
| sub_market | TEXT | f5, team_total, etc. (nullable) |
| game_id | TEXT | |
| away_team | TEXT | |
| home_team | TEXT | |
| side | TEXT | yes / no / home / away / over / under |
| threshold | REAL | The line |
| anchor_prob | REAL | Book curve fair probability |
| model_prob | REAL | Model curve fair probability |
| kalshi_price | REAL | What Kalshi is offering |
| execution_price | REAL | What you'd actually pay (including spread) |
| raw_edge_book | REAL | anchor_prob - kalshi_price |
| raw_edge_model | REAL | model_prob - kalshi_price |
| net_edge | REAL | Edge after fee (7% Kalshi fee) |
| confidence | TEXT | HIGH / MEDIUM / LOW |
| executable | BOOLEAN | Is the spread tight enough to actually trade? |
| detected_at | TEXT | ISO timestamp |
| actual_outcome | TEXT | win / loss / push (filled after settlement) |
| settled_at | TEXT | When settled |
| closing_price | REAL | Kalshi price at close (for CLV tracking) |
| clv | REAL | Closing line value — did the line move our way? |
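The per-row edge test can be sketched as below, treating the 7% fee as a flat haircut off the raw edge — one plausible reading of `net_edge`; the production fee treatment may differ, and field names are illustrative:

```typescript
// Sketch: core edge test per (threshold, side).
const KALSHI_FEE = 0.07; // template default fee haircut

interface EdgeCheck {
  fairProb: number;       // book or model curve probability for this side
  executionPrice: number; // what you'd actually pay, as a probability (0-1)
}

// raw edge minus the fee — only a positive net edge is worth trading
function netEdge({ fairProb, executionPrice }: EdgeCheck): number {
  return fairProb - executionPrice - KALSHI_FEE;
}

function isEdge(check: EdgeCheck): boolean {
  return netEdge(check) > 0;
}
```

So a contract the curve marks 60% fair but priced at 48 cents clears the fee (net +5), while a 2-point raw edge does not.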
Edge summary table: sports_edge_summary
Everything runs on Eastern Time (ET). The schedule has 3 layers:
| Minute | What fires | Cron pattern |
|---|---|---|
| :00 | Kalshi game prices + any other soft book pulls | 0 6,8,10,14,18 * * * |
| :02 | Anchor book pull (Pinnacle) | 2 6,8,10,14,18 * * * |
Game-day windows: 6 AM, 8 AM, 10 AM, 2 PM, 6 PM ET
Closing snapshot: 1 minute before each game's start time (staggered per game, not all at once)
Three staggered groups so raw data finishes before computation starts:
| Group | Time | What runs |
|---|---|---|
| Group 1 — Raw Data | 9:00 AM | External API pulls: stats API, game logs, web scrapes, weather |
| Group 2 — Baselines | 9:05 AM | Light computation on Group 1 data: rolling averages, baselines, splits |
| Group 3 — Derived Metrics | 9:10 AM | Heavy math needing Groups 1+2: model composites, variance, fatigue, matchup adjustments |
| Minute | What fires | Cron pattern |
|---|---|---|
| :10 | Probability curves rebuild (reads fresh Kalshi + anchor data) | 10 6,8,10,14,18 * * * |
| :12 | Edge scanner (reads fresh curves, finds mispricings) | 12 6,8,10,14,18 * * * |
Critical: Edge scanner must run AFTER curves are built, which must run AFTER data collection. The 10-minute offset ensures fresh data is available. Never run the edge scanner at :00 — the data won't be there yet.
For long-term markets (futures, win totals, division winners):
| Phase | Frequency | When to transition |
|---|---|---|
| Phase 1 | Weekly (Monday) | Start of season → ~2 months before end |
| Phase 2 | Every 3 days | ~2 months before end → ~1 month before end |
| Phase 3 | Daily | ~1 month before end → season end |
| Expired | Stop scanning | After season ends |
Set sport-specific transition dates. Include an expiry_date column in futures databases — scanning stops after expiry.
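The phase table collapses to a single cadence function keyed off days until season end. The 60/30-day boundaries below approximate the "~2 months / ~1 month" transitions and would be tuned per sport:

```typescript
// Sketch: adaptive futures-scan cadence. Boundaries are illustrative defaults.
type Cadence = 'weekly' | 'every-3-days' | 'daily' | 'expired';

function futuresCadence(daysToSeasonEnd: number): Cadence {
  if (daysToSeasonEnd < 0) return 'expired';        // past expiry_date: stop scanning
  if (daysToSeasonEnd <= 30) return 'daily';        // Phase 3
  if (daysToSeasonEnd <= 60) return 'every-3-days'; // Phase 2
  return 'weekly';                                  // Phase 1
}
```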
Every desk needs an entry in src/pipeline/data-status/desk-config.ts:
{
name: '{Sport Name}',
slug: '{sport-slug}',
kalshi: [{
category: 'Sports / {Category} / {Sport}',
hasScraper: true,
scraperName: 'collector.ts (Kalshi sports cron)',
freshnessKey: 'kalshi-{sport}',
series: [
{ ticker: 'KXSERIES1', label: 'Human Label', dataViewKey: 'kalshi-{sport}-type1' },
// one per Kalshi series for this sport
],
}],
sources: [
// Sources organized by group — see Standard Groups below
],
}
Every game desk should have these groups on the dashboard. Order matters — this is how they appear on the page:
Every freshnessKey in desk-config needs a mapping in src/pipeline/data-status/source-tables.ts:
'freshnessKey': {
  db: 'database-name', // SQLite DB file (without .db extension)
  tables: ['table_name'], // Which table(s) to query
  filter: { // Optional: filter to specific rows
    column: 'column_name',
    value: 'filter_value',
  },
  parentKey: 'parent-key', // Optional: inherit freshness from parent
}
Naming convention for freshnessKeys:
- `kalshi-{sport}` — top-level Kalshi data
- `kalshi-{sport}-{market}` — filtered by market type (spreads, totals, etc.)
- `pinnacle-{sport}` — anchor book data
- `edge-scanner-{sport}` — all edges
- `edge-scanner-{sport}-{market}` — edges filtered by market type
- `{sport}-{metric}-curves` — probability curves
- `{sport}-team-variance`, `{sport}-fatigue-metrics` — computed analytics
- `scrape-{source}-{sport}` — external data scrapes

Register cron jobs in src/cron/scheduler.ts. Pattern:
const task = cron.schedule('CRON_PATTERN', async () => {
  // Import up front so `uf` is in scope for both the success and error paths
  const { updateFreshnessAutoHash: uf } = await import('../pipeline/data-freshness.js');
  try {
    // 1. Import the scraper/calculator
    // 2. Run it
    // 3. Update freshness
    uf(getPipelineDb(), 'freshnessKey', 'fresh', rowCount);
  } catch (e) {
    const msg = e instanceof Error ? e.message : String(e);
    logger.error(`Cron error: ${msg}`);
    uf(getPipelineDb(), 'freshnessKey', 'error', undefined, msg);
  }
}, { timezone: 'America/New_York' });
this.tasks.set('task-name', task);
Freshness is automatic — the dashboard discovers sources from desk-config and checks data_freshness table. Each cron job calls updateFreshnessAutoHash() after running. No separate registration needed.
Thresholds:
| Source type | Yellow (stale) | Red (alert) |
|---|---|---|
| Game-day odds (6/8/10 AM, 2/6 PM windows) | 30 min after window | 90 min after window |
| Daily stats (9 AM) | 60 min after expected | 180 min after expected |
| Futures (weekly/adaptive) | 2 days after expected | 7 days after expected |
| Edge scanner | 30 min after window | 90 min after window |
Telegram alert fires after 2 consecutive misses.
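The threshold table maps to a status check like the sketch below, with minutes-late measured from the expected window. The source-type keys and fallback pair are illustrative:

```typescript
// Sketch: dashboard staleness classification. (yellow, red) thresholds in minutes.
const THRESHOLDS: Record<string, [number, number]> = {
  'game-day-odds': [30, 90],
  'daily-stats': [60, 180],
  'futures': [2 * 24 * 60, 7 * 24 * 60], // 2 days / 7 days
  'edge-scanner': [30, 90],
};

type Status = 'green' | 'yellow' | 'red';

function freshnessStatus(sourceType: string, minutesLate: number): Status {
  const [yellow, red] = THRESHOLDS[sourceType] ?? [30, 90]; // assumed default
  if (minutesLate >= red) return 'red';
  if (minutesLate >= yellow) return 'yellow';
  return 'green';
}
```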
Before a desk is considered done, every item must pass:
| # | Item | What to check |
|---|---|---|
| 1 | Databases isolated | Every table in its own sport-specific .db file |
| 2 | Scraper queue independent | This desk's scrapers don't depend on other desks |
| 3 | Recovery queue | Scrapers are in the recovery queue (re-run on crash) |
| 4 | Freshness tracking | Every source has a freshnessKey and shows on dashboard |
| 5 | Dashboard views | Every freshnessKey resolves to a working data view |
| 6 | Edge scanner | Reads from isolated DBs, writes to sport-specific edges DB |
| 7 | Scan windows | Every row tagged with scan_type |
| 8 | Data cleanliness | No live, no settled, same-day only for game-day data |
| 9 | Column formatting | Human-readable tickers, dates, prices on dashboard views |
| 10 | Filters | Column filters on all dashboard views |
| 11 | No cross-desk dependencies | This desk doesn't break if another desk's scraper fails |
| 12 | Sport-specific edge cases | Doubleheaders, postponements, overtime, whatever applies |
| 13 | Alerts | Freshness alerts configured for this desk's sources |
Hard-won lessons from building the MLB desk. Read these before starting any new desk.
Player/team name mismatch across sources. Kalshi, Pinnacle, FanDuel, stats APIs — they all spell names differently. Kalshi uses tickers like "KXMLBHR-26APR07-A.BREGMAN-O1", FanDuel uses "Alex Bregman", stats APIs use "Bregman, Alex". You NEED a crosswalk table that maps names across sources. Build it early. Strip Jr./Sr. suffixes. Handle misspellings (Kalshi had "SUREZ" for Suarez).
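A minimal sketch of such a crosswalk normalizer — canonical lowercase "first last" keys, suffix stripping, and a manual alias map for known misspellings. The alias entry, regex, and function name are illustrative, not the MLB desk's actual code:

```typescript
// Sketch: reduce every source's spelling to one canonical lookup key.
const ALIASES: Record<string, string> = {
  surez: 'suarez', // known Kalshi misspelling
};

function normalizeName(raw: string): string {
  let name = raw.trim();
  // "Bregman, Alex" -> "Alex Bregman"
  if (name.includes(',')) {
    const [last, first] = name.split(',').map((s) => s.trim());
    name = `${first} ${last}`;
  }
  name = name
    .toLowerCase()
    .replace(/\b(jr|sr|ii|iii|iv)\.?$/i, '') // strip generational suffixes
    .replace(/[^a-z ]/g, '')                 // drop punctuation leftovers
    .trim()
    .replace(/\s+/g, ' ');
  // apply per-word manual overrides
  return name
    .split(' ')
    .map((part) => ALIASES[part] ?? part)
    .join(' ');
}
```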
Wrong scanner wired to cron. When the MLB prop edge scanner was first deployed, the cron job was calling the generic scanner instead of the MLB-specific one. The edges table was empty for days before anyone noticed. Always verify the cron is calling the right function — test it manually after wiring.
Filters too aggressive. The initial minimum price filter (5 cents) was dropping valid alt lines. Had to lower to 1 cent. Don't over-filter — you can always filter on the dashboard side. Be conservative with data-side filters.
Kalshi API pagination and rate limits. The initial Kalshi pull only fetched 1 page of results and missed most of the data. Some sports have 5+ pages of contracts. Always check if there are more pages. Add rate limit delays (500ms+) between page requests. Kalshi returns 429 (too many requests) if you hit their API too fast — when scanning multiple series in a loop, use at least 2-second delays between calls. Don't retry 429s immediately, wait and try on the next cron cycle.
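The pagination loop can be sketched with an injected page-fetcher so it is testable offline; the real version would hit the Kalshi markets endpoint with a cursor query parameter and the longer delays described above:

```typescript
// Sketch: cursor pagination with a rate-limit delay between page requests.
interface Page<T> { items: T[]; cursor: string | null }

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function fetchAllPages<T>(
  getPage: (cursor: string | null) => Promise<Page<T>>,
  delayMs = 500, // template minimum between page requests
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page = await getPage(cursor);
    all.push(...page.items);
    cursor = page.cursor;
    if (cursor) await sleep(delayMs); // back off before asking for the next page
  } while (cursor);
  return all;
}
```

A 429 would be handled outside this loop, per the lesson above: don't retry immediately, just pick the series up on the next cron cycle.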
Sign convention bleeding across market types. Spreads use negative signs (home -1.5), totals don't. A bug caused the total's model anchor to get a negative sign from the spread logic. Keep spread/total/moneyline code paths completely separate.
Live and settled data leaking in. Without explicit checks, the scraper will collect prices for games that have already started (live) or already ended (settled). This corrupts your curves. Always check game start time and skip games that have started. Filter bid <= 5 / bid >= 95 for settled contracts.
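Both cleanliness checks can be sketched as one predicate — field names are illustrative:

```typescript
// Sketch: skip games already underway, and drop contracts whose quotes
// show they have effectively settled (bid pinned near 0 or 100).
interface ContractRow {
  gameStart: Date;
  yesBid: number; // cents
}

function isCollectable(row: ContractRow, now: Date): boolean {
  if (now >= row.gameStart) return false;                // live or finished: skip
  if (row.yesBid <= 5 || row.yesBid >= 95) return false; // settled-looking quote
  return true;
}
```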
Edge scanner running before data lands. If the edge scanner fires at :00 (same time as data collection), it reads stale data from the previous window. The scanner MUST run after data collection finishes — at least :10 past the hour.
Staggered scheduling is critical. Raw data (Group 1, 9:00 AM) must finish before baselines (Group 2, 9:05 AM), which must finish before derived metrics (Group 3, 9:10 AM). If you don't stagger, derived metrics compute on yesterday's baselines.
An AI starting a new game desk MUST ask and get answers for these before writing any code:
This template is updated as new desks are built and new lessons are learned. The "Things to Watch Out For" section grows with every desk.
Last updated: 2026-04-08
Running log of changes and decisions for the game-level desk pipeline. Each entry has two sections: Changes and Boss Notes.
If you're building this from scratch, read every Boss Notes section first — that's how the boss thinks about this system.
Changes:
No game desk bugs fixed this session. Changelog format established to match player props desk template.
Boss Notes:
- Checks the data views at `/data-status/view/`. If something looks off (all zeros, weird names, missing data), he flags it. The dashboard is his primary QA tool.

Changes:
- `research-pipeline.db` and `mlb-edges.db` recalculated. (edge-scanner.ts)
- `pinnacle_anchor - model_anchor`, positioned between Pin W/L and Model Anchor. (data-view.ts, table-config.ts)
- `/data-status/v/:source` renders only ~50 DOM rows regardless of dataset size. Server APIs: `/data` (paginated JSON, 44ms), `/summary` (SQL aggregates, 26ms), `/filter-values` (14ms). Scales to 100K+ rows. (virtual-view.ts, data-api.ts, routes.ts)
- (data-api.ts, virtual-view.ts)
- (virtual-view.ts)

Boss Notes:
Changes:
- (data-view.ts)
- `model_best_side` column — Each model now has its own best side column. `best_side` renamed to "Pin Best Side", new `model_best_side` = "Model Best Side" based on model edges. Backfilled 3291+2419 rows. (edge-scanner.ts, table-config.ts, data-view.ts)
- `final_margin` + `result` (team covered? YES/NO). ML gets `result` (team won? YES/NO). Totals already had this. (data-view.ts)
- `actual_outcome` and `settled_result` (replaced by computed Result). Other views keep them. (data-view.ts)
- `mlb_probability_curves`. New `is_tie` column flags games that tied after 5 innings. Settlement detects TIE ticker settling YES and flags all team contracts for that game. (edge-scanner.ts, scanner-bridge.ts)

Boss Notes:
- `best_side` was only for Pinnacle, not the edge model. Needed two separate columns.

Changes:
Added settlement and tracking columns to the probability curves standard schema (Table 3): is_main_line, actual_result, outcome, book_was_right, model_was_right, book_error, model_error, settled_at. These match the player props desk implementation. No game desk code changed — this is a template update so future game desks include these columns from the start.
Boss Notes:
- `is_main_line` flag (not a value on every row), settlement runs as games finish (not morning batch), both book and model accuracy tracked.