Purpose: This document is a universal template for building a new game-level sports desk in EdgeClaw. An AI reading this should know exactly what to build, how to wire it up, and what questions to ask before starting.
Reference implementations: MLB Desk (docs/mlb-desk-construction-spec.md), MLB Player Props Desk (docs/mlb-player-props-desk-construction-spec.md)
Every game desk follows the same 4-table pipeline:
Table 1: Kalshi Data (soft book — what we trade against)
↓
Table 2: Anchor Book Data (sharp book — our fair value reference)
↓
Table 3: Probability Curves (book curve + model curve vs Kalshi price)
↓
Table 4: Edge Scanner Output (mispricings found)
The rule: Each table is sport-specific. No mixing data across sports. Each desk gets its own isolated databases.
What: Raw contract prices pulled from the Kalshi API for this sport's game-level markets.
Database: kalshi-{sport}-prices.db
Table name: kalshi_{sport}_prices
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Which collection window: 6am, 8am, 10am, 2pm, 6pm, close |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | moneyline, spread, total, team_total, f5_spread, f5_total, etc. |
| game_id | TEXT | Kalshi event ID |
| away_team | TEXT | Away team name |
| home_team | TEXT | Home team name |
| threshold | REAL | The line (e.g., 7.5 for total, -1.5 for spread) |
| yes_bid | INTEGER | Cents (0-100) |
| yes_ask | INTEGER | Cents (0-100) |
| yes_exec | REAL | Midpoint of yes bid/ask |
| no_bid | INTEGER | Cents (0-100) |
| no_ask | INTEGER | Cents (0-100) |
| no_exec | REAL | Midpoint of no bid/ask |
| spread | INTEGER | Ask minus bid (liquidity indicator) |
| volume | INTEGER | Contracts traded |
| captured_at | TEXT | ISO timestamp |
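The exec-price and spread columns are derived from the raw quotes. A minimal sketch of those derivations — the interface and helper names are illustrative, not actual EdgeClaw code:

```typescript
// Sketch: deriving yes_exec / no_exec / spread from raw Kalshi quotes.
// Names are illustrative, not the real collector's API.
interface KalshiQuote {
  yesBid: number; // cents, 0-100
  yesAsk: number; // cents, 0-100
  noBid: number;
  noAsk: number;
}

// yes_exec / no_exec: midpoint of bid and ask, in cents
function execPrice(bid: number, ask: number): number {
  return (bid + ask) / 2;
}

// spread: ask minus bid — wider spread means thinner liquidity
function quoteSpread(bid: number, ask: number): number {
  return ask - bid;
}

function deriveColumns(q: KalshiQuote) {
  return {
    yes_exec: execPrice(q.yesBid, q.yesAsk),
    no_exec: execPrice(q.noBid, q.noAsk),
    spread: quoteSpread(q.yesBid, q.yesAsk),
  };
}
```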
Data quality rules:
- Same-day games only — no other dates in a game-day scan
- No live data — skip any game whose start time has passed
- No settled data — drop contracts quoting bid <= 5 or bid >= 95
Kalshi API notes:
- Base URL: `https://api.elections.kalshi.com/trade-api/v2`
- Ticker format: `SERIES-DATECODETEAMS-DETAIL`

What: Sharp book odds that serve as our fair value reference. This is the "truth" we compare Kalshi against.
For game desks, the anchor is typically Pinnacle (sharpest game book). Other options: opening lines from offshore books, consensus lines from SBR.
Database: pinnacle-{sport}.db (or {anchor}-{sport}.db)
Table name: pinnacle_{sport}_odds
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| game_number | INTEGER | 1 or 2 (for doubleheaders) |
| market_type | TEXT | moneyline, spread, total |
| home_team | TEXT | |
| away_team | TEXT | |
| home_line | REAL | Home side line |
| away_line | REAL | Away side line |
| home_price | REAL | Home side price (decimal or American — pick one format per desk and use it consistently) |
| away_price | REAL | Away side price (same format as home_price) |
| captured_at | TEXT | ISO timestamp |
Key principle: The anchor book provides the de-vigged fair value. De-vig the over/under (or home/away) prices to get the true implied probability at the posted line. This probability anchors the curve.
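A minimal sketch of the de-vig step using proportional (multiplicative) normalization of the two-sided decimal odds — the function name is illustrative, and other de-vig methods (power, Shin) exist if the desk wants them:

```typescript
// Sketch: two-way de-vig by proportional normalization.
// Input: decimal odds for both sides at the anchor book's posted line.
function devig(overDecimal: number, underDecimal: number): { over: number; under: number } {
  const rawOver = 1 / overDecimal;      // implied prob, vig included
  const rawUnder = 1 / underDecimal;
  const overround = rawOver + rawUnder; // > 1 because of the book's margin
  return { over: rawOver / overround, under: rawUnder / overround };
}
```

Example: symmetric -110/-110 style pricing (1.90 / 1.90 decimal) de-vigs to exactly 50/50, which is what anchors the curve at the main line.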
What: For each game, build TWO probability curves and compare both against every Kalshi alt-line price:
Book curve — Derived from the anchor book's de-vigged fair value. The anchor book gives you the probability at ONE point (their posted line). You extend this into a full curve across all thresholds using a statistical distribution.
Model curve — Derived from the desk's own model (power ratings, team quality composites, pitching matchups, whatever the sport uses). This is an independent estimate that doesn't rely on the books at all.
Both curves are compared against Kalshi's price at each threshold. When either curve says a Kalshi contract is mispriced by more than the fee (typically 7%), that's an edge.
Database: {sport}-edges.db
Table name: {sport}_probability_curves
Standard columns:
| Column | Type | Description |
|---|---|---|
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | spread, total, team_total, f5_spread, etc. |
| game_id | TEXT | |
| away_team | TEXT | |
| home_team | TEXT | |
| threshold | REAL | The alt-line value (e.g., 6.5, 7.5, 8.5 for totals) |
| book_anchor | REAL | Anchor book's line (the point calibrating the curve) |
| book_yes | REAL | Book curve probability of YES at this threshold |
| book_no | REAL | Book curve probability of NO at this threshold |
| model_yes | REAL | Model curve probability of YES at this threshold |
| model_no | REAL | Model curve probability of NO at this threshold |
| kalshi_yes | REAL | Kalshi exec price for YES |
| kalshi_no | REAL | Kalshi exec price for NO |
| rung | INTEGER | 0 = closest to 50/50 (main line), positive = further out |
| is_main_line | INTEGER | 1 if this threshold is the book's headline line (closest to 50/50), 0 otherwise |
| actual_result | REAL | Actual game result at this threshold (filled by settlement) |
| outcome | TEXT | "over"/"under" or "cover"/"miss" relative to threshold |
| book_was_right | INTEGER | 1 if anchor book favored the correct side, 0 if not |
| model_was_right | INTEGER | 1 if model favored the correct side, 0 if not |
| book_error | REAL | How far off the book probability was from reality |
| model_error | REAL | Same for model |
| settled_at | TEXT | Timestamp when result was recorded |
| captured_at | TEXT | ISO timestamp |
Distribution selection (game-level):
| Market type | Distribution | Why |
|---|---|---|
| Totals (runs/goals/points) | Poisson | Count data, appropriate for scoring events |
| Spreads (margin of victory) | Normal | Continuous-ish, symmetric around expected margin |
| Team totals | Poisson | Same as game totals but per-team |
| Moneyline | Derived from spread curve | P(margin > 0) from the Normal CDF |
| F5 / first half | Same distribution, tighter sigma | Less variance in partial game — scale sigma down |
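For totals, the curve extension above can be sketched as: solve for the Poisson mean that reproduces the de-vigged anchor probability, then evaluate that same distribution at every alt-line threshold. Function names and the bisection bounds are illustrative assumptions, not the production calibration:

```typescript
// P(X <= k) for Poisson(lambda), by summing the pmf terms
function poissonCdf(k: number, lambda: number): number {
  let term = Math.exp(-lambda); // P(X = 0)
  let sum = term;
  for (let i = 1; i <= k; i++) {
    term *= lambda / i;
    sum += term;
  }
  return sum;
}

// P(total > threshold) — thresholds are half-integers, so floor() gives the cutoff
function probOver(threshold: number, lambda: number): number {
  return 1 - poissonCdf(Math.floor(threshold), lambda);
}

// Bisect for the lambda where the curve passes through the de-vigged anchor point.
// probOver is monotone increasing in lambda, so bisection converges.
function calibrateLambda(anchorLine: number, anchorProb: number): number {
  let lo = 0.1, hi = 30; // assumed bounds, wide enough for run/goal/point totals
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (probOver(anchorLine, mid) < anchorProb) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}
```

Once lambda is calibrated, `probOver(t, lambda)` read at each threshold fills the `book_yes` column; `1 - probOver` fills `book_no`.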
The model curve uses whatever the sport's desk model produces. For MLB this is quality composites, SP matchups, park factors. For NBA it might be pace-adjusted efficiency. The template can't prescribe this — it depends on the sport. See "Questions to Ask" below.
What: The final output — mispricings found by comparing Kalshi prices against the book curve and model curve.
Database: {sport}-edges.db (same DB as curves)
Table name: sports_edges (filtered by sport column)
Standard columns:
| Column | Type | Description |
|---|---|---|
| sport | TEXT | Sport identifier for filtering |
| scan_type | TEXT | Collection window |
| game_date | TEXT | YYYY-MM-DD |
| game_time | TEXT | HH:MM ET |
| market_type | TEXT | spread, total, moneyline, etc. |
| sub_market | TEXT | f5, team_total, etc. (nullable) |
| game_id | TEXT | |
| away_team | TEXT | |
| home_team | TEXT | |
| side | TEXT | yes / no / home / away / over / under |
| threshold | REAL | The line |
| anchor_prob | REAL | Book curve fair probability |
| model_prob | REAL | Model curve fair probability |
| kalshi_price | REAL | What Kalshi is offering |
| execution_price | REAL | What you'd actually pay (including spread) |
| raw_edge_book | REAL | anchor_prob - kalshi_price |
| raw_edge_model | REAL | model_prob - kalshi_price |
| net_edge | REAL | Edge after fee (7% Kalshi fee) |
| confidence | TEXT | HIGH / MEDIUM / LOW |
| executable | BOOLEAN | Is the spread tight enough to actually trade? |
| detected_at | TEXT | ISO timestamp |
| actual_outcome | TEXT | win / loss / push (filled after settlement) |
| settled_at | TEXT | When settled |
| closing_price | REAL | Kalshi price at close (for CLV tracking) |
| clv | REAL | Closing line value — did the line move our way? |
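The per-row edge test can be sketched as below, treating the 7% fee as a flat haircut off the raw edge — one plausible reading of `net_edge`; the production fee treatment may differ, and field names are illustrative:

```typescript
// Sketch: core edge test per (threshold, side).
const KALSHI_FEE = 0.07; // template default fee haircut

interface EdgeCheck {
  fairProb: number;       // book or model curve probability for this side
  executionPrice: number; // what you'd actually pay, as a probability (0-1)
}

// raw edge minus the fee — only a positive net edge is worth trading
function netEdge({ fairProb, executionPrice }: EdgeCheck): number {
  return fairProb - executionPrice - KALSHI_FEE;
}

function isEdge(check: EdgeCheck): boolean {
  return netEdge(check) > 0;
}
```

So a contract the curve marks 60% fair but priced at 48 cents clears the fee (net +5), while a 2-point raw edge does not.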
Edge summary table: sports_edge_summary
Everything runs on Eastern Time (ET). The schedule has 3 layers:
| Minute | What fires | Cron pattern |
|---|---|---|
| :00 | Kalshi game prices + any other soft book pulls | 0 6,8,10,14,18 * * * |
| :02 | Anchor book pull (Pinnacle) | 2 6,8,10,14,18 * * * |
Game-day windows: 6 AM, 8 AM, 10 AM, 2 PM, 6 PM ET
Closing snapshot: 1 minute before each game's start time (staggered per game, not all at once)
Three staggered groups so raw data finishes before computation starts:
| Group | Time | What runs |
|---|---|---|
| Group 1 — Raw Data | 9:00 AM | External API pulls: stats API, game logs, web scrapes, weather |
| Group 2 — Baselines | 9:05 AM | Light computation on Group 1 data: rolling averages, baselines, splits |
| Group 3 — Derived Metrics | 9:10 AM | Heavy math needing Groups 1+2: model composites, variance, fatigue, matchup adjustments |
| Minute | What fires | Cron pattern |
|---|---|---|
| :10 | Probability curves rebuild (reads fresh Kalshi + anchor data) | 10 6,8,10,14,18 * * * |
| :12 | Edge scanner (reads fresh curves, finds mispricings) | 12 6,8,10,14,18 * * * |
Critical: Edge scanner must run AFTER curves are built, which must run AFTER data collection. The 10-minute offset ensures fresh data is available. Never run the edge scanner at :00 — the data won't be there yet.
For long-term markets (futures, win totals, division winners):
| Phase | Frequency | When to transition |
|---|---|---|
| Phase 1 | Weekly (Monday) | Start of season → ~2 months before end |
| Phase 2 | Every 3 days | ~2 months before end → ~1 month before end |
| Phase 3 | Daily | ~1 month before end → season end |
| Expired | Stop scanning | After season ends |
Set sport-specific transition dates. Include an expiry_date column in futures databases — scanning stops after expiry.
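The phase table collapses to a single cadence function keyed off days until season end. The 60/30-day boundaries below approximate the "~2 months / ~1 month" transitions and would be tuned per sport:

```typescript
// Sketch: adaptive futures-scan cadence. Boundaries are illustrative defaults.
type Cadence = 'weekly' | 'every-3-days' | 'daily' | 'expired';

function futuresCadence(daysToSeasonEnd: number): Cadence {
  if (daysToSeasonEnd < 0) return 'expired';        // past expiry_date: stop scanning
  if (daysToSeasonEnd <= 30) return 'daily';        // Phase 3
  if (daysToSeasonEnd <= 60) return 'every-3-days'; // Phase 2
  return 'weekly';                                  // Phase 1
}
```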
Every desk needs an entry in src/pipeline/data-status/desk-config.ts:
{
name: '{Sport Name}',
slug: '{sport-slug}',
kalshi: [{
category: 'Sports / {Category} / {Sport}',
hasScraper: true,
scraperName: 'collector.ts (Kalshi sports cron)',
freshnessKey: 'kalshi-{sport}',
series: [
{ ticker: 'KXSERIES1', label: 'Human Label', dataViewKey: 'kalshi-{sport}-type1' },
// one per Kalshi series for this sport
],
}],
sources: [
// Sources organized by group — see Standard Groups below
],
}
Every game desk should have these groups on the dashboard. Order matters — this is how they appear on the page:
Every freshnessKey in desk-config needs a mapping in src/pipeline/data-status/source-tables.ts:
'freshnessKey': {
  db: 'database-name', // SQLite DB file (without .db extension)
  tables: ['table_name'], // Which table(s) to query
  filter: { // Optional: filter to specific rows
    column: 'column_name',
    value: 'filter_value',
  },
  parentKey: 'parent-key', // Optional: inherit freshness from parent
}
Naming convention for freshnessKeys:
- `kalshi-{sport}` — top-level Kalshi data
- `kalshi-{sport}-{market}` — filtered by market type (spreads, totals, etc.)
- `pinnacle-{sport}` — anchor book data
- `edge-scanner-{sport}` — all edges
- `edge-scanner-{sport}-{market}` — edges filtered by market type
- `{sport}-{metric}-curves` — probability curves
- `{sport}-team-variance`, `{sport}-fatigue-metrics` — computed analytics
- `scrape-{source}-{sport}` — external data scrapes

Register cron jobs in src/cron/scheduler.ts. Pattern:
const task = cron.schedule('CRON_PATTERN', async () => {
  // Import up front so `uf` is in scope for both the success and error paths
  const { updateFreshnessAutoHash: uf } = await import('../pipeline/data-freshness.js');
  try {
    // 1. Import the scraper/calculator
    // 2. Run it
    // 3. Update freshness
    uf(getPipelineDb(), 'freshnessKey', 'fresh', rowCount);
  } catch (e) {
    const msg = e instanceof Error ? e.message : String(e);
    logger.error(`Cron error: ${msg}`);
    uf(getPipelineDb(), 'freshnessKey', 'error', undefined, msg);
  }
}, { timezone: 'America/New_York' });
this.tasks.set('task-name', task);
Freshness is automatic — the dashboard discovers sources from desk-config and checks data_freshness table. Each cron job calls updateFreshnessAutoHash() after running. No separate registration needed.
Thresholds:
| Source type | Yellow (stale) | Red (alert) |
|---|---|---|
| Game-day odds (6/8/10 AM, 2/6 PM windows) | 30 min after window | 90 min after window |
| Daily stats (9 AM) | 60 min after expected | 180 min after expected |
| Futures (weekly/adaptive) | 2 days after expected | 7 days after expected |
| Edge scanner | 30 min after window | 90 min after window |
Telegram alert fires after 2 consecutive misses.
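The threshold table maps to a status check like the sketch below, with minutes-late measured from the expected window. The source-type keys and fallback pair are illustrative:

```typescript
// Sketch: dashboard staleness classification. (yellow, red) thresholds in minutes.
const THRESHOLDS: Record<string, [number, number]> = {
  'game-day-odds': [30, 90],
  'daily-stats': [60, 180],
  'futures': [2 * 24 * 60, 7 * 24 * 60], // 2 days / 7 days
  'edge-scanner': [30, 90],
};

type Status = 'green' | 'yellow' | 'red';

function freshnessStatus(sourceType: string, minutesLate: number): Status {
  const [yellow, red] = THRESHOLDS[sourceType] ?? [30, 90]; // assumed default
  if (minutesLate >= red) return 'red';
  if (minutesLate >= yellow) return 'yellow';
  return 'green';
}
```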
Before a desk is considered done, every item must pass:
| # | Item | What to check |
|---|---|---|
| 1 | Databases isolated | Every table in its own sport-specific .db file |
| 2 | Scraper queue independent | This desk's scrapers don't depend on other desks |
| 3 | Recovery queue | Scrapers are in the recovery queue (re-run on crash) |
| 4 | Freshness tracking | Every source has a freshnessKey and shows on dashboard |
| 5 | Dashboard views | Every freshnessKey resolves to a working data view |
| 6 | Edge scanner | Reads from isolated DBs, writes to sport-specific edges DB |
| 7 | Scan windows | Every row tagged with scan_type |
| 8 | Data cleanliness | No live, no settled, same-day only for game-day data |
| 9 | Column formatting | Human-readable tickers, dates, prices on dashboard views |
| 10 | Filters | Column filters on all dashboard views |
| 11 | No cross-desk dependencies | This desk doesn't break if another desk's scraper fails |
| 12 | Sport-specific edge cases | Doubleheaders, postponements, overtime, whatever applies |
| 13 | Alerts | Freshness alerts configured for this desk's sources |
Hard-won lessons from building the MLB desk. Read these before starting any new desk.
Player/team name mismatch across sources. Kalshi, Pinnacle, FanDuel, stats APIs — they all spell names differently. Kalshi uses tickers like "KXMLBHR-26APR07-A.BREGMAN-O1", FanDuel uses "Alex Bregman", stats APIs use "Bregman, Alex". You NEED a crosswalk table that maps names across sources. Build it early. Strip Jr./Sr. suffixes. Handle misspellings (Kalshi had "SUREZ" for Suarez).
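A minimal sketch of such a crosswalk normalizer — canonical lowercase "first last" keys, suffix stripping, and a manual alias map for known misspellings. The alias entry, regex, and function name are illustrative, not the MLB desk's actual code:

```typescript
// Sketch: reduce every source's spelling to one canonical lookup key.
const ALIASES: Record<string, string> = {
  surez: 'suarez', // known Kalshi misspelling
};

function normalizeName(raw: string): string {
  let name = raw.trim();
  // "Bregman, Alex" -> "Alex Bregman"
  if (name.includes(',')) {
    const [last, first] = name.split(',').map((s) => s.trim());
    name = `${first} ${last}`;
  }
  name = name
    .toLowerCase()
    .replace(/\b(jr|sr|ii|iii|iv)\.?$/i, '') // strip generational suffixes
    .replace(/[^a-z ]/g, '')                 // drop punctuation leftovers
    .trim()
    .replace(/\s+/g, ' ');
  // apply per-word manual overrides
  return name
    .split(' ')
    .map((part) => ALIASES[part] ?? part)
    .join(' ');
}
```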
Wrong scanner wired to cron. When the MLB prop edge scanner was first deployed, the cron job was calling the generic scanner instead of the MLB-specific one. The edges table was empty for days before anyone noticed. Always verify the cron is calling the right function — test it manually after wiring.
Filters too aggressive. The initial minimum price filter (5 cents) was dropping valid alt lines. Had to lower to 1 cent. Don't over-filter — you can always filter on the dashboard side. Be conservative with data-side filters.
Kalshi API pagination and rate limits. The initial Kalshi pull only fetched 1 page of results and missed most of the data. Some sports have 5+ pages of contracts. Always check if there are more pages. Add rate limit delays (500ms+) between page requests. Kalshi returns 429 (too many requests) if you hit their API too fast — when scanning multiple series in a loop, use at least 2-second delays between calls. Don't retry 429s immediately, wait and try on the next cron cycle.
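The pagination loop can be sketched with an injected page-fetcher so it is testable offline; the real version would hit the Kalshi markets endpoint with a cursor query parameter and the longer delays described above:

```typescript
// Sketch: cursor pagination with a rate-limit delay between page requests.
interface Page<T> { items: T[]; cursor: string | null }

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function fetchAllPages<T>(
  getPage: (cursor: string | null) => Promise<Page<T>>,
  delayMs = 500, // template minimum between page requests
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page = await getPage(cursor);
    all.push(...page.items);
    cursor = page.cursor;
    if (cursor) await sleep(delayMs); // back off before asking for the next page
  } while (cursor);
  return all;
}
```

A 429 would be handled outside this loop, per the lesson above: don't retry immediately, just pick the series up on the next cron cycle.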
Sign convention bleeding across market types. Spreads use negative signs (home -1.5), totals don't. A bug caused the total's model anchor to get a negative sign from the spread logic. Keep spread/total/moneyline code paths completely separate.
Live and settled data leaking in. Without explicit checks, the scraper will collect prices for games that have already started (live) or already ended (settled). This corrupts your curves. Always check game start time and skip games that have started. Filter bid <= 5 / bid >= 95 for settled contracts.
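Both cleanliness checks can be sketched as one predicate — field names are illustrative:

```typescript
// Sketch: skip games already underway, and drop contracts whose quotes
// show they have effectively settled (bid pinned near 0 or 100).
interface ContractRow {
  gameStart: Date;
  yesBid: number; // cents
}

function isCollectable(row: ContractRow, now: Date): boolean {
  if (now >= row.gameStart) return false;                // live or finished: skip
  if (row.yesBid <= 5 || row.yesBid >= 95) return false; // settled-looking quote
  return true;
}
```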
Edge scanner running before data lands. If the edge scanner fires at :00 (same time as data collection), it reads stale data from the previous window. The scanner MUST run after data collection finishes — at least :10 past the hour.
Staggered scheduling is critical. Raw data (Group 1, 9:00 AM) must finish before baselines (Group 2, 9:05 AM), which must finish before derived metrics (Group 3, 9:10 AM). If you don't stagger, derived metrics compute on yesterday's baselines.
An AI starting a new game desk MUST ask and get answers for these before writing any code:
This template is updated as new desks are built and new lessons are learned. The "Things to Watch Out For" section grows with every desk.
Last updated: 2026-04-08
Running log of changes and decisions for the game-level desk pipeline. Each entry has two sections: Changes and Boss Notes.
If you're building this from scratch, read every Boss Notes section first — that's how the boss thinks about this system.
Changes:
No game desk bugs fixed this session. Changelog format established to match player props desk template.
Boss Notes:
- Checks the data views at `/data-status/view/`. If something looks off (all zeros, weird names, missing data), he flags it. The dashboard is his primary QA tool.

Changes:
- `research-pipeline.db` and `mlb-edges.db` recalculated. (edge-scanner.ts)
- `pinnacle_anchor - model_anchor`, positioned between Pin W/L and Model Anchor. (data-view.ts, table-config.ts)
- `/data-status/v/:source` renders only ~50 DOM rows regardless of dataset size. Server APIs: `/data` (paginated JSON, 44ms), `/summary` (SQL aggregates, 26ms), `/filter-values` (14ms). Scales to 100K+ rows. (virtual-view.ts, data-api.ts, routes.ts)
- (data-api.ts, virtual-view.ts)
- (virtual-view.ts)

Boss Notes:
Changes:
- (data-view.ts)
- `model_best_side` column — Each model now has its own best side column. `best_side` renamed to "Pin Best Side", new `model_best_side` = "Model Best Side" based on model edges. Backfilled 3291+2419 rows. (edge-scanner.ts, table-config.ts, data-view.ts)
- `final_margin` + `result` (team covered? YES/NO). ML gets `result` (team won? YES/NO). Totals already had this. (data-view.ts)
- `actual_outcome` and `settled_result` (replaced by computed Result). Other views keep them. (data-view.ts)
- `mlb_probability_curves`. New `is_tie` column flags games that tied after 5 innings. Settlement detects TIE ticker settling YES and flags all team contracts for that game. (edge-scanner.ts, scanner-bridge.ts)

Boss Notes:
- `best_side` was only for Pinnacle, not the edge model. Needed two separate columns.

Changes:
Added settlement and tracking columns to the probability curves standard schema (Table 3): is_main_line, actual_result, outcome, book_was_right, model_was_right, book_error, model_error, settled_at. These match the player props desk implementation. No game desk code changed — this is a template update so future game desks include these columns from the start.
Boss Notes:
- `is_main_line` flag (not a value on every row), settlement runs as games finish (not morning batch), both book and model accuracy tracked.