Merged by: Claude Code (Sonnet 4.6) Date: 2026-03-15 Source files: opus.md (base), gemini-3.1-pro.md, gpt-4.1.md, grok-4-fast.md, grok-4.1.md, deepseek-v3.1.md, sonar-reasoning-pro.md Part III panel: opus.md (base), gemini-3.1-pro.md, gpt-4.1.md, grok-4-fast.md (prediction market signals) Rule: Opus is the complete base — nothing removed. All unique panel additions tagged [PANEL: Model Name].
Track the best and worst traders on prediction market platforms (Polymarket, Kalshi) to generate:
This is a cross-desk tool — prediction markets cover sports, crypto, politics, weather, and more. Every desk in the pipeline benefits from wallet intelligence.
On-chain wallets feed signals into two venue types:
Build Priority (all panelists agree):
The core strategy is NOT "copy the whale's trade on the same platform" — that fails because slippage eats the edge (the whale already moved the price). Instead:
Detect sharp money on Polymarket → trade the same event on Kalshi before Kalshi catches up.
Polymarket is where the whales live (bigger liquidity, on-chain, fast). Kalshi is a separate pool of traders who react slower. When a whale pushes a Polymarket contract from 42 to 48 cents, the same event on Kalshi might still sit at 43. You're trading the information flow lag between platforms.
This sidesteps the slippage problem because you're buying on the platform that hasn't moved yet.
[PANEL: Sonar Reasoning Pro] — The cross-platform arbitrage thesis assumes significant overlap between Polymarket and Kalshi event catalogs. Empirical reality is ~30-40% overlap. The first task (Phase 0) must measure actual overlap before committing to cross-platform infrastructure. If overlap is <30%, the strategy must pivot to pure wallet intelligence (same-platform copy-trading with better entry timing).
Before building anything beyond the divergence detector, run exactly 4 weeks of data collection and evaluate against these thresholds:
PROCEED with full build if:
PIVOT to same-platform wallet intelligence only if: divergences exist but are <3/day or <90 seconds. The cross-platform arb is too thin, but wallet scoring still has value for same-platform copy-trading with better entry timing.
KILL cross-platform component if: <1 divergence >5 cents per day. The platforms are too tightly coupled.
Validation data stored in:
CREATE TABLE phase0_validation (
id INTEGER PRIMARY KEY AUTOINCREMENT,
date TEXT NOT NULL,
divergences_detected INTEGER,
divergences_above_5c INTEGER,
divergences_above_5c_after_fees INTEGER,
median_gap_duration_seconds REAL,
median_lagging_liquidity_usd REAL,
max_gap_cents REAL,
notes TEXT
);
Populate daily. After 28 rows, run the decision. This costs almost nothing and prevents building an elaborate system on a false premise.
All Polymarket trades happen on Polygon blockchain. Every wallet's full history is publicly readable:
Data Sources (all free, no keys):
clob.polymarket.com) — market data, order books, trades. Already integrated.wss://ws-subscriptions-clob.polymarket.com/ws/) — real-time trade events, order book updates, price changes. Critical for sub-5-second latency on follow-list alerts.gamma-api.polymarket.com) — historical activity, leaderboard, profiles, batch data. Public, no auth.Kalshi does NOT expose individual account data. Approach:
/markets/{ticker}/orderbook gives live order book. Depth changes over time are better signals than price alone. A 3-cent move through $50K of resting orders is real; a 10-cent move on thin liquidity is noise./markets/{ticker}/trades gives recent trades (price, count, timestamp) without identifying trader. Build trade flow imbalance metrics./series/{ticker} gives all markets in a series — track flow across related markets (e.g., all Fed rate contracts).[PANEL: DeepSeek V3.1] — Kalshi trader concentration metrics: Kalshi's lack of wallet data makes position limits critical. Derive estimated % of limit consumed from observed trade sizes vs published limits.
[PANEL: GPT-4.1] — Competitor platform expansion (nice-to-have, v2): PredictIt API (free/public), Betfair Exchange API (free read-only) for politics and sports overlap. Expands cross-platform arb surface area.
Auth flow (required — without this, all Kalshi API calls return 401):
https://trading-api.kalshi.com/trade-api/v2/login with {email, password} → receive {token, member_id}Rate limits: Kalshi allows ~10 requests/second for authenticated users. Token bucket rate limiter: bucket size 10, refill rate 10/sec. On 429 response: exponential backoff starting at 1s, max 30s, 5 retries.
[PANEL: DeepSeek V3.1] — Kalshi trade tape has 15-30s delay during volatility. Assume 45s worst-case latency, not 5s. Design alert urgency accordingly.
Request priority queue: Divergence checks (high) > flow signal polling (medium) > market discovery (low). When approaching rate limit, shed low-priority requests first.
Kalshi WebSocket: wss://trading-api.kalshi.com/trade-api/ws/v2 available for orderbook and trade streaming. Use for HOT markets (see tiered polling below) instead of REST, preserving REST budget for discovery and flow analysis.
Rate tracking table:
CREATE TABLE api_rate_tracking (
id INTEGER PRIMARY KEY AUTOINCREMENT,
platform TEXT NOT NULL,
endpoint TEXT NOT NULL,
request_count INTEGER DEFAULT 0,
error_count INTEGER DEFAULT 0,
last_429_at TEXT,
period_start TEXT NOT NULL,
period_minutes INTEGER DEFAULT 1
);
Primary: Alchemy free tier — 300M compute units/month, ~300K eth_getLogs calls. Endpoint: https://polygon-mainnet.g.alchemy.com/v2/{API_KEY}. Sign up at alchemy.com (free).
Fallback: polygon-bor-rpc.publicnode.com for low-priority calls (gas price estimation, basic block queries).
Usage allocation:
Minimize RPC dependency overall: Use CLOB API and WebSocket as primary trade data sources. Use RPC only for:
CREATE TABLE data_source_status (
source TEXT PRIMARY KEY,
status TEXT, -- 'live', 'degraded', 'down'
last_message_at TEXT,
reconnect_count INTEGER DEFAULT 0,
current_backoff_seconds REAL DEFAULT 0
);
-- ═══════════════════════════════════════════════════════════════
-- CANONICAL MARKET REFERENCE
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE markets (
market_id TEXT PRIMARY KEY,
platform TEXT NOT NULL, -- 'polymarket' or 'kalshi'
condition_id TEXT, -- Polymarket condition_id
token_id TEXT, -- Polymarket outcome token
kalshi_ticker TEXT, -- Kalshi ticker
title TEXT NOT NULL,
category TEXT,
created_at TEXT,
end_date TEXT,
settled INTEGER DEFAULT 0,
outcome TEXT, -- 'yes', 'no', null if unsettled
settlement_source TEXT, -- e.g., 'BLS', 'NOAA', 'AP' (Kalshi)
status TEXT DEFAULT 'active', -- 'active', 'halted', 'suspended', 'cancelled' [PANEL: GPT-4.1]
resolution_disputed INTEGER DEFAULT 0, -- flag for oracle disputes [PANEL: Opus review]
fetched_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_markets_platform ON markets(platform);
CREATE INDEX idx_markets_category ON markets(category);
CREATE INDEX idx_markets_unsettled ON markets(settled) WHERE settled = 0;
CREATE INDEX idx_markets_status ON markets(status) WHERE status != 'active';
-- ═══════════════════════════════════════════════════════════════
-- CROSS-PLATFORM EVENT MATCHING
-- ═══════════════════════════════════════════════════════════════
-- THE LOAD-BEARING TABLE. All cross-platform features depend on this.
-- No entry here = no divergence signal for that market pair.
CREATE TABLE event_pairs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
poly_market_id TEXT REFERENCES markets(market_id),
kalshi_market_id TEXT REFERENCES markets(market_id),
event_slug TEXT NOT NULL,
match_method TEXT NOT NULL, -- 'manual', 'fuzzy_title', 'embedding', 'series_rule'
match_confidence REAL NOT NULL, -- 0-1
verified INTEGER DEFAULT 0, -- 1 = human-confirmed
title_similarity REAL,
date_overlap_days REAL,
category_match INTEGER,
resolution_rules_match INTEGER DEFAULT 0, -- 1 if oracle/rules verified same [PANEL: Gemini 3.1 Pro]
polling_tier TEXT DEFAULT 'cold', -- 'hot', 'warm', 'cold' (drives polling frequency)
created_at TEXT NOT NULL DEFAULT (datetime('now')),
verified_at TEXT,
verified_by TEXT,
UNIQUE(poly_market_id, kalshi_market_id)
);
CREATE INDEX idx_ep_verified ON event_pairs(verified, match_confidence);
CREATE INDEX idx_ep_tier ON event_pairs(polling_tier);
-- ═══════════════════════════════════════════════════════════════
-- POLYMARKET PROXY WALLET MAPPING
-- ═══════════════════════════════════════════════════════════════
-- Critical for deduplication. A user's CLOB address (proxy) may differ
-- from their on-chain EOA. Without this, same trader appears twice.
CREATE TABLE wallet_address_map (
id INTEGER PRIMARY KEY AUTOINCREMENT,
eoa_address TEXT NOT NULL,
proxy_address TEXT NOT NULL,
platform TEXT NOT NULL DEFAULT 'polymarket',
discovered_via TEXT, -- 'proxy_factory_event', 'gamma_api', 'manual'
discovered_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(eoa_address, proxy_address)
);
CREATE INDEX idx_wam_proxy ON wallet_address_map(proxy_address);
CREATE INDEX idx_wam_eoa ON wallet_address_map(eoa_address);
-- ═══════════════════════════════════════════════════════════════
-- POLYMARKET WALLET PROFILES
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_profiles (
id INTEGER PRIMARY KEY AUTOINCREMENT,
address TEXT NOT NULL UNIQUE, -- canonical proxy_address
proxy_address TEXT, -- explicit proxy (same as address for Polymarket)
eoa_address TEXT, -- underlying EOA if known
label TEXT, -- 'sharp', 'dumb', 'neutral', 'bot', 'mm', 'whale'
entity_name TEXT, -- from Arkham/manual: 'Wintermute', 'Jump Trading', etc.
likely_hedge INTEGER DEFAULT 0, -- 1 if likely hedging portfolio, not directional [PANEL: Sonar Reasoning Pro]
first_seen TEXT NOT NULL,
last_active TEXT NOT NULL,
total_trades INTEGER DEFAULT 0,
total_settled INTEGER DEFAULT 0,
total_volume_usd REAL DEFAULT 0,
win_count INTEGER DEFAULT 0,
loss_count INTEGER DEFAULT 0,
win_rate REAL DEFAULT 0,
total_pnl_usd REAL DEFAULT 0,
roi REAL DEFAULT 0,
max_drawdown_usd REAL DEFAULT 0,
avg_entry_timing REAL, -- avg % of time-to-settlement when entering (0=early, 1=late)
brier_score REAL, -- calibration accuracy (lower = better)
brier_score_30d REAL, -- rolling 30-trade Brier for regime detection
brier_n INTEGER DEFAULT 0, -- number of markets contributing to Brier
news_adjusted_timing_score REAL, -- timing score discounting news-proximate trades
composite_score REAL DEFAULT 0, -- 0-100 ranking
confidence_tier TEXT DEFAULT 'low', -- 'low' (<50 trades), 'medium' (50-199), 'high' (200+)
-- Market maker specific
avg_spread_offered REAL,
liquidity_side_bias REAL,
-- Per-oracle specialization [PANEL: Sonar Reasoning Pro]
oracle_specialization TEXT, -- JSON: {oracle: brier_score} e.g. {"BLS": 0.12, "AP": 0.21}
-- Metadata
funding_source TEXT,
social_link TEXT,
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- ═══════════════════════════════════════════════════════════════
-- WALLET CATEGORY STATS
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_category_stats (
address TEXT NOT NULL,
category TEXT NOT NULL,
trades INTEGER DEFAULT 0,
settled INTEGER DEFAULT 0,
win_rate REAL,
brier_score REAL,
roi REAL,
pnl_usd REAL DEFAULT 0,
avg_entry_timing REAL,
PRIMARY KEY (address, category)
);
CREATE INDEX idx_wcs_category ON wallet_category_stats(category, roi DESC);
-- ═══════════════════════════════════════════════════════════════
-- INDIVIDUAL TRADES
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_trades (
id INTEGER PRIMARY KEY AUTOINCREMENT,
address TEXT NOT NULL,
market_id TEXT NOT NULL REFERENCES markets(market_id),
position_id TEXT, -- groups related buys/sells (lifecycle managed by code)
side TEXT NOT NULL, -- 'yes' or 'no'
action TEXT NOT NULL, -- 'buy' or 'sell'
price REAL NOT NULL, -- 0-1
size REAL NOT NULL, -- number of shares
cost_usd REAL,
tx_hash TEXT,
block_number INTEGER,
traded_at TEXT NOT NULL,
settled INTEGER DEFAULT 0, -- 0=open, 1=settled
outcome TEXT, -- 'win', 'loss', null if unsettled
pnl_usd REAL,
market_settled_at TEXT,
days_before_settlement REAL,
timing_bucket TEXT, -- 'early' (<20%), 'mid' (20-70%), 'late' (>70%)
vwap_at_settlement REAL, -- VWAP of shares still held at settlement, for Brier calc
-- News proximity enrichment
news_proximate INTEGER DEFAULT 0, -- 1 if relevant headline within 30min before trade
news_headline TEXT,
news_source TEXT,
news_detected_at TEXT,
fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(address, tx_hash) -- [PANEL: GPT-4.1] prevent duplication from multiple sources
);
CREATE INDEX idx_wt_addr_time ON wallet_trades(address, traded_at DESC);
CREATE INDEX idx_wt_market ON wallet_trades(market_id, address);
CREATE INDEX idx_wt_unsettled ON wallet_trades(settled) WHERE settled = 0;
CREATE INDEX idx_wt_outcome ON wallet_trades(address, outcome) WHERE settled = 1;
CREATE INDEX idx_wt_position ON wallet_trades(position_id) WHERE position_id IS NOT NULL;
-- ═══════════════════════════════════════════════════════════════
-- NET POSITIONS
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_positions (
address TEXT NOT NULL,
market_id TEXT NOT NULL REFERENCES markets(market_id),
side TEXT NOT NULL,
net_shares REAL NOT NULL,
avg_entry_price REAL NOT NULL,
cost_basis_usd REAL NOT NULL,
current_value_usd REAL,
unrealized_pnl REAL,
size_vs_typical REAL, -- this position size / wallet avg (conviction signal)
position_state TEXT DEFAULT 'open', -- 'open', 'closed', 'settled'
opened_at TEXT,
closed_at TEXT,
updated_at TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (address, market_id)
);
CREATE INDEX idx_wp_addr ON wallet_positions(address);
CREATE INDEX idx_wp_open ON wallet_positions(position_state) WHERE position_state = 'open';
-- ═══════════════════════════════════════════════════════════════
-- FOLLOW / FADE LISTS
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_lists (
id INTEGER PRIMARY KEY AUTOINCREMENT,
address TEXT NOT NULL,
list_type TEXT NOT NULL, -- 'follow' or 'fade'
fade_action TEXT, -- null for follow; 'avoid' or 'invert' for fade
rank INTEGER,
score REAL NOT NULL,
category_score_used TEXT,
reason TEXT,
added_at TEXT NOT NULL DEFAULT (datetime('now')),
expires_at TEXT,
active INTEGER DEFAULT 1,
user_id TEXT
);
CREATE INDEX idx_wl_active ON wallet_lists(active, list_type);
-- ═══════════════════════════════════════════════════════════════
-- ALERTS
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_alerts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
address TEXT NOT NULL,
wallet_label TEXT,
list_type TEXT NOT NULL, -- 'follow' or 'fade'
market_id TEXT NOT NULL REFERENCES markets(market_id),
category TEXT,
side TEXT NOT NULL,
price REAL NOT NULL,
size REAL NOT NULL,
signal TEXT NOT NULL, -- 'copy', 'fade', 'reduce', 'exit' [PANEL: Opus review]
alert_strength REAL,
-- Execution awareness
estimated_slippage REAL,
edge_after_slippage REAL,
available_liquidity REAL,
-- Fee-aware net edge [PANEL: Opus aplus]
kalshi_fee_estimate REAL,
poly_fee_estimate REAL,
net_edge_after_fees REAL,
-- Outcome tracking
acted_on INTEGER DEFAULT 0,
market_settled INTEGER DEFAULT 0,
outcome TEXT,
alerted_at TEXT NOT NULL DEFAULT (datetime('now')),
user_id TEXT
);
CREATE INDEX idx_wa_time ON wallet_alerts(alerted_at DESC);
CREATE INDEX idx_wa_addr ON wallet_alerts(address);
-- ═══════════════════════════════════════════════════════════════
-- KALSHI FLOW SIGNALS
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE kalshi_flow_signals (
id INTEGER PRIMARY KEY AUTOINCREMENT,
market_id TEXT REFERENCES markets(market_id),
ticker TEXT NOT NULL,
signal_type TEXT NOT NULL,
-- 'volume_spike', 'price_divergence', 'spread_collapse',
-- 'momentum_shift', 'open_gap', 'correlated_series',
-- 'iceberg', 'event_window', 'size_clustering', 'book_imbalance'
direction TEXT, -- 'yes' or 'no'
magnitude REAL,
baseline_used TEXT, -- 'market' or 'category' (cold-start fallback)
context TEXT, -- JSON with supporting data
resolved INTEGER DEFAULT 0,
outcome TEXT,
detected_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_kfs_time ON kalshi_flow_signals(detected_at DESC);
CREATE INDEX idx_kfs_ticker ON kalshi_flow_signals(ticker, detected_at DESC);
-- ═══════════════════════════════════════════════════════════════
-- KALSHI MARKET BASELINES (cold-start protocol for flow signals)
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE kalshi_market_baselines (
ticker TEXT NOT NULL,
hour_of_day INTEGER, -- 0-23 (time-of-day normalization)
day_of_week INTEGER, -- 0-6 (volume differs Mon vs Sat)
avg_hourly_volume REAL,
stddev_hourly_volume REAL,
avg_spread_cents REAL,
avg_book_depth_usd REAL,
sample_days INTEGER DEFAULT 0,
baseline_reliable INTEGER DEFAULT 0, -- 1 when sample_days >= 7
updated_at TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (ticker, hour_of_day, day_of_week)
);
-- ═══════════════════════════════════════════════════════════════
-- CROSS-PLATFORM DIVERGENCE
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE platform_divergence (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_pair_id INTEGER REFERENCES event_pairs(id),
market_id_poly TEXT REFERENCES markets(market_id),
market_id_kalshi TEXT REFERENCES markets(market_id),
event_description TEXT,
kalshi_yes_price REAL,
poly_yes_price REAL,
divergence_cents REAL, -- SIGNED: poly - kalshi
abs_divergence REAL,
leader TEXT, -- 'kalshi' or 'polymarket'
triggered_by TEXT, -- 'whale_move', 'organic', 'news', 'unknown'
-- Execution info
kalshi_liquidity REAL,
poly_liquidity REAL,
-- Fee-aware net edge (kills false positives from fee blindness)
kalshi_fee_estimate REAL,
poly_fee_estimate REAL,
estimated_slippage REAL,
net_edge_after_fees REAL, -- divergence_cents - kalshi_fee - poly_fee - slippage
-- Outcome
detected_at TEXT NOT NULL DEFAULT (datetime('now')),
converged_at TEXT,
convergence_time_seconds REAL, -- seconds (not minutes — arb windows often <90s)
resolved INTEGER DEFAULT 0,
winner TEXT -- which platform's price was closer to outcome
);
CREATE INDEX idx_pd_unresolved ON platform_divergence(resolved, detected_at) WHERE resolved = 0;
CREATE INDEX idx_pd_time ON platform_divergence(detected_at DESC);
-- ═══════════════════════════════════════════════════════════════
-- FEE SCHEDULE (lookup table for net edge calculation)
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE fee_schedule (
platform TEXT NOT NULL,
market_category TEXT,
fee_type TEXT NOT NULL, -- 'taker', 'maker', 'settlement', 'gas'
fee_cents REAL NOT NULL,
fee_percent REAL,
effective_date TEXT NOT NULL,
PRIMARY KEY (platform, market_category, fee_type, effective_date)
);
-- Seed data (update as platforms change):
-- Kalshi taker fee: ~7 cents/contract (varies by market, range 5-10c)
-- Kalshi maker fee: 0 cents
-- Kalshi settlement fee: 0 cents
-- Polymarket CLOB fee: varies by market (check via API)
-- Polymarket gas: ~1-3 cents per tx on Polygon (estimate from RPC eth_gasPrice)
-- ═══════════════════════════════════════════════════════════════
-- PHASE 0 VALIDATION
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE phase0_validation (
id INTEGER PRIMARY KEY AUTOINCREMENT,
date TEXT NOT NULL,
divergences_detected INTEGER,
divergences_above_5c INTEGER,
divergences_above_5c_after_fees INTEGER,
median_gap_duration_seconds REAL,
median_lagging_liquidity_usd REAL,
max_gap_cents REAL,
notes TEXT
);
-- ═══════════════════════════════════════════════════════════════
-- WALLET CONSENSUS (herding detection)
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE wallet_consensus (
id INTEGER PRIMARY KEY AUTOINCREMENT,
market_id TEXT NOT NULL REFERENCES markets(market_id),
category TEXT,
follow_wallets_in INTEGER NOT NULL,
fade_wallets_in INTEGER NOT NULL,
dominant_side TEXT,
signal_quality TEXT, -- 'strong', 'crowded', 'contrarian'
first_entry_at TEXT,
last_entry_at TEXT,
detected_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_wcon_market ON wallet_consensus(market_id);
-- ═══════════════════════════════════════════════════════════════
-- WALLET SIMILARITY (behavioral fingerprinting) — DEFERRED TO v2
-- ═══════════════════════════════════════════════════════════════
-- Table defined now for data integrity. Computation pipeline deferred.
-- Simple heuristic at launch: same funding wallet (Polygon RPC) = likely same entity.
CREATE TABLE wallet_similarity (
address_a TEXT NOT NULL,
address_b TEXT NOT NULL,
similarity_score REAL NOT NULL,
feature_vector TEXT, -- JSON: category_dist, avg_timing, avg_size, frequency
same_funder INTEGER DEFAULT 0, -- quick heuristic: shared Polygon funding source [PANEL: Sonar Reasoning Pro]
likely_same_entity INTEGER DEFAULT 0,
computed_at TEXT NOT NULL DEFAULT (datetime('now')),
PRIMARY KEY (address_a, address_b)
);
-- ═══════════════════════════════════════════════════════════════
-- SYSTEM META-SCORING (track our own accuracy)
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE system_predictions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
signal_source TEXT NOT NULL, -- 'follow_copy', 'fade_invert', 'divergence_arb', 'consensus', 'flow_signal'
market_id TEXT REFERENCES markets(market_id),
predicted_side TEXT NOT NULL,
confidence REAL NOT NULL,
entry_price_simulated REAL,
config_id INTEGER REFERENCES shadow_configurations(config_id), -- which weight config
settled INTEGER DEFAULT 0,
outcome TEXT, -- 'correct', 'incorrect'
pnl_simulated REAL,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
settled_at TEXT
);
CREATE INDEX idx_sp_source ON system_predictions(signal_source, settled);
CREATE INDEX idx_sp_unsettled ON system_predictions(settled) WHERE settled = 0;
-- ═══════════════════════════════════════════════════════════════
-- SHADOW MODE WEIGHT CONFIGURATIONS
-- ═══════════════════════════════════════════════════════════════
-- Run 4 configs in parallel during shadow mode.
-- After 200+ predictions per config, pick winner empirically.
CREATE TABLE shadow_configurations (
config_id INTEGER PRIMARY KEY,
config_name TEXT NOT NULL,
weight_roi REAL,
weight_brier REAL,
weight_timing REAL,
weight_recency REAL,
weight_winrate REAL,
active INTEGER DEFAULT 1
);
-- Seed 4 configs:
-- 1: Current consensus: ROI 30%, Brier 25%, Timing 20%, Recency 15%, Win Rate 10%
-- 2: ROI-heavy: ROI 45%, Brier 20%, Timing 15%, Recency 15%, Win Rate 5%
-- 3: Brier-heavy: ROI 20%, Brier 40%, Timing 20%, Recency 15%, Win Rate 5%
-- 4: Timing-heavy: ROI 20%, Brier 20%, Timing 35%, Recency 15%, Win Rate 10%
-- ═══════════════════════════════════════════════════════════════
-- NEWS PROXIMITY (enrichment for wallet trades)
-- ═══════════════════════════════════════════════════════════════
-- Tracks news headlines fetched per market for proximity tagging.
CREATE TABLE market_news (
id INTEGER PRIMARY KEY AUTOINCREMENT,
market_id TEXT REFERENCES markets(market_id),
headline TEXT NOT NULL,
source TEXT,
published_at TEXT NOT NULL,
fetched_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_mn_market_time ON market_news(market_id, published_at DESC);
-- ═══════════════════════════════════════════════════════════════
-- MACRO EVENT CALENDAR [PANEL: Sonar Reasoning Pro]
-- ═══════════════════════════════════════════════════════════════
-- Smart money acts pre-event. Without this, can't distinguish
-- early-alpha entries from "just read the calendar" entries.
CREATE TABLE macro_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_name TEXT NOT NULL, -- 'FOMC Rate Decision', 'CPI Release', 'NFL Kickoff'
event_type TEXT NOT NULL, -- 'economic', 'sports', 'political', 'crypto', 'other'
scheduled_at TEXT NOT NULL,
actual_at TEXT, -- filled in after event fires
source TEXT, -- 'FRED', 'investing.com', 'ESPN API'
related_market_ids TEXT, -- JSON array of market_ids this event touches
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_me_time ON macro_events(scheduled_at);
-- ═══════════════════════════════════════════════════════════════
-- MARKET CREATOR TRACKING [PANEL: Sonar Reasoning Pro]
-- ═══════════════════════════════════════════════════════════════
-- Smart money creates or seeds markets they have information about.
-- Market creators often have edge. Retail never sees this pattern.
CREATE TABLE market_creators (
market_id TEXT NOT NULL REFERENCES markets(market_id),
creator_address TEXT NOT NULL,
creator_entity TEXT, -- from Arkham if known
initial_seed_usd REAL,
platform TEXT NOT NULL,
created_at TEXT NOT NULL,
PRIMARY KEY (market_id, creator_address)
);
-- ═══════════════════════════════════════════════════════════════
-- ORDERBOOK SNAPSHOTS (for slippage validation and arb decay analysis)
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE orderbook_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
market_id TEXT NOT NULL REFERENCES markets(market_id),
platform TEXT NOT NULL,
bid1_price REAL, bid1_size REAL,
bid2_price REAL, bid2_size REAL,
bid3_price REAL, bid3_size REAL,
ask1_price REAL, ask1_size REAL,
ask2_price REAL, ask2_size REAL,
ask3_price REAL, ask3_size REAL,
snapped_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_obs_market_time ON orderbook_snapshots(market_id, snapped_at DESC);
-- ═══════════════════════════════════════════════════════════════
-- SURVIVORSHIP BIAS TRACKER [PANEL: DeepSeek V3.1]
-- ═══════════════════════════════════════════════════════════════
-- Track ALL wallets that ever crossed 60%+ win rate.
-- Monitor how many regressed. Report survival rate as system calibration.
CREATE TABLE wallet_peak_tracking (
address TEXT PRIMARY KEY,
peak_win_rate REAL,
peak_at TEXT,
peak_trade_count INTEGER,
current_win_rate REAL,
regressed INTEGER DEFAULT 0, -- 1 if win_rate dropped below 50% after peak >60%
updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
-- ═══════════════════════════════════════════════════════════════
-- SQLITE WRITE BUFFER TRACKING
-- ═══════════════════════════════════════════════════════════════
-- Tracks failed writes from the in-memory buffer for recovery.
CREATE TABLE failed_writes_log (
id INTEGER PRIMARY KEY AUTOINCREMENT,
table_name TEXT NOT NULL,
payload TEXT NOT NULL, -- JSON of the failed row
error TEXT,
failed_at TEXT NOT NULL DEFAULT (datetime('now')),
recovered INTEGER DEFAULT 0
);
The event_pairs table is the load-bearing wall of all cross-platform features. Without it, the divergence detector compares random unrelated markets.
Matching pipeline:
FED-{date}-T{rate} maps to Polymarket titles containing "Federal Reserve" + same dateresolution_rules_match field.All divergence detection queries JOIN through event_pairs — no verified entry, no divergence signal.
On startup and daily: query Polygon for ProxyCreated events from Polymarket's proxy factory contract (address: 0xaB45c5A4B0c941a2F231C04C3f49182e1A254052 on Polygon). Each event contains the EOA and proxy address.
Convention: use proxy_address as canonical (what the CLOB API uses). Before creating a new wallet_profile, check wallet_address_map to see if this address (EOA or proxy) is already tracked. Without this, the same trader appears twice — once from CLOB data and once from on-chain scanning — splitting their stats and corrupting scoring.
Wallet Ranking Composite Score (0-100):
[PANEL: Sonar Reasoning Pro] — Entry timing weight should be 35-40%, not 20%. A wallet that buys at 20c and gets paid at 95c (early + right) should rank far higher than one buying at 80c getting paid at 100c (late + right). Consider increasing timing weight to 35% and reducing Brier to 15% based on shadow mode results.
Weight configurations tested in parallel during shadow mode (see shadow_configurations table). Weights finalized empirically after 200+ settled predictions per config.
Confidence Tiers:
Minimum thresholds:
List thresholds:
Brier is 25% of composite. Must be calculated correctly:
brier = (predicted_probability - outcome)^2 where outcome = 1.0 (YES wins) or 0.0 (NO wins)weighted_brier = brier × log2(1 + cost_basis_usd / 100). A $10,000 position contributes ~6.6x more than a $100 position.position_id assignment rules (required for P&L, Brier, exit detection):
position_id = hash(address + market_id + side + open_timestamp) where open_timestamp = first trade creating non-zero positionrealized_pnl = sum(sell_proceeds) - sum(buy_costs) (closed). settlement_pnl = (outcome × net_shares) - cost_basis (settled)[PANEL: Sonar Reasoning Pro] — Wallets that enter a market on day 1 (before retail knows it exists) have different edge than those entering on day 30. Track: (first_entry_at - market_created_at) / (market_end_date - market_created_at). Entries in the <20% bucket should be weighted 3x in timing score. Already derivable from existing data — just needs explicit computation.
Formal component: WebSocketManager
Polymarket connections: wss://ws-subscriptions-clob.polymarket.com/ws/market/{market_id}. Subscribe to trades and book channels.
Protocol:
GET /trades?market={id}&after={last_timestamp} immediately to backfill missed tradesREST-first principle: Build REST polling fallback FIRST. Add WebSocket as enhancement. If WebSocket is down, system degrades gracefully, not breaks.
All WebSocket-sourced writes go through an in-memory queue:
BEGIN/COMMIT transaction wrapping all queued writes as batch INSERTSQLITE_BUSY: retry after 100ms, up to 5 timesfailed_writes.jsonl (and failed_writes_log table) for manual recoverylocks table). Write buffer pauses flushing. Queued writes accumulate in memory, flush after batch completes (~60s max window)HOT markets (daily volume >$50K on both platforms, OR follow-list wallet has open position):
WARM markets (daily volume $5K-$50K):
COLD markets (daily volume <$5K):
Tier assignment: recalculate daily from trailing 7-day average volume. Promote to HOT immediately when follow-list wallet enters a position. Rate limit budget: HOT polling ≈ 2 req/sec (20% of 10/sec Kalshi limit), WARM ≈ 0.8 req/sec (8%), leaving 72% for flow analysis and discovery.
Kill threshold: net_edge_after_fees > 2 cents minimum. This kills false positives from the Kalshi 7-cent taker fee eating apparent divergences under ~12 cents.
kalshi_market_baselines)macro_events table)Cold-start protocol: For the first 7 days after a market is added, collect hourly snapshots but do NOT generate flow signals (baseline_reliable=0). After 7 days, compute baselines per hour-of-day and day-of-week (baseline_reliable=1). Use category-level baselines as fallback for new markets. Baseline decay: recompute weekly using 30-day rolling window. Markets inactive >14 days reset to baseline_reliable=0.
For each active market, extract 2-3 keywords from title. Every 30 minutes, check Google News RSS for those keywords (free, unlimited, no API key required). When a follow-list wallet trades, look back 30 minutes for matching headlines. Tag trade as news_proximate=1 if found.
Scoring impact: Trades where news_proximate=1 get 0.5x weight for the Timing component. Add news_adjusted_timing_score to wallet_profiles that discounts news-proximate trades. A wallet consistently early BEFORE public news has exponentially more value.
[PANEL: DeepSeek V3.1 / GPT-4.1 / Grok 4.1 / Grok 4 Fast] — Social sentiment (X/Twitter, Reddit) around market events often precedes price moves, especially in politics and crypto. Free options: Twitter API v2 basic tier for low-volume polling, Google News RSS. Tag whale trades with social_proximate indicator. Classify as:
news_proximate: Reuters/AP/Bloomberg headline within 30 minutessocial_proximate: X trending hashtag or influencer post within 30 minutesretail_concentration: >80% of market volume in <$500 trades on one side (fade signal)When multiple follow-list wallets enter the same market:
Before firing any copy/fade alert:
fee_schedule table)Fees: Kalshi 7c taker + Poly ~2c gas = 9c | Net edge: {net}cMarket makers profit from spread, not prediction. Track separately:
UNIQUE(address, tx_hash) prevents duplication from multiple data sources hitting the same trade.Track wallet accuracy per category:
Before any live trades:
Every alert generates a prediction: "this signal will be profitable." After settlement:
PRAGMA journal_mode=WAL for concurrent reads during writeswallet_trades_archive. Keep main table lean.markets table is canonical reference. No redundant title/category storage.# Market data
GET https://clob.polymarket.com/markets — list active markets
GET https://clob.polymarket.com/markets/{id} — single market
GET https://clob.polymarket.com/books?token_id={id} — order book
GET https://clob.polymarket.com/trades?market={id}&after={ts} — trade tape (with timestamp backfill)
# Wallet data (Gamma API)
GET https://gamma-api.polymarket.com/profiles/{address} — wallet profile
GET https://gamma-api.polymarket.com/leaderboard — top wallets
GET https://gamma-api.polymarket.com/positions/{address} — current positions
GET https://gamma-api.polymarket.com/activity/{address} — trade history
# WebSocket
wss://ws-subscriptions-clob.polymarket.com/ws/market/{asset_ids_csv}
Channels: trades, book
# Auth (required)
POST https://trading-api.kalshi.com/trade-api/v2/login
Body: {email, password} → {token, member_id}
# Markets
GET https://trading-api.kalshi.com/trade-api/v2/markets — all markets
GET https://trading-api.kalshi.com/trade-api/v2/markets/{ticker} — single market
GET https://trading-api.kalshi.com/trade-api/v2/markets/{ticker}/orderbook — live order book
GET https://trading-api.kalshi.com/trade-api/v2/markets/{ticker}/trades — trade tape
GET https://trading-api.kalshi.com/trade-api/v2/series/{ticker} — series markets
# WebSocket
wss://trading-api.kalshi.com/trade-api/ws/v2
Auth: Bearer token in Upgrade headers
Use for HOT markets only (preserves REST rate limit budget)
# Primary: Alchemy free tier
https://polygon-mainnet.g.alchemy.com/v2/{API_KEY}
eth_getLogs — CTFE contract events (wallet trades, settlements)
eth_getLogs — proxy factory events (ProxyCreated)
eth_gasPrice — gas estimation for Poly fees
# Polygon contracts
CTFE (standard trades): 0x4bFb41d5B3570DeFd03C39a9A4D8dE6Bd8B8982E
Neg Risk Adapter (multi-outcome markets): 0xd91E80cF2E7be2e162c6513ceD06f1dD0dA35296
Proxy Factory: 0xaB45c5A4B0c941a2F231C04C3f49182e1A254052
# News proximity (free)
Google News RSS: https://news.google.com/rss/search?q={keywords}&hl=en
No API key. No rate limit. Extract title + published date.
# Macro event calendar (free)
FRED: https://api.stlouisfed.org/fred/releases?api_key={free_key}
Investing.com: scrape economic calendar page
# Wallet labels (free tier)
Arkham Intelligence: https://api.arkhamintelligence.com — ~100 queries/day free
Use only for new wallets, cache aggressively. Do NOT poll daily.
Dune Analytics: https://dune.com/queries — community Polymarket dashboards
| Component | Frequency | Method | Rate budget |
|---|---|---|---|
| Polymarket HOT markets | Real-time | WebSocket | ~10 connections for 200 markets |
| Kalshi HOT markets | 15s | Kalshi WebSocket | ~2 req/sec |
| Polymarket WARM markets | 2min REST | CLOB API | ~0.8 req/sec |
| Kalshi WARM markets | 2min REST | REST API | ~0.8 req/sec |
| COLD markets | 15min REST | REST APIs | ~0.1 req/sec |
| Divergence detection | Tier-driven | Derived from above | n/a |
| Kalshi flow signals | 15min | REST API | included above |
| Daily wallet scan | 6AM ET | Gamma API + CLOB | 1 batch |
| Settlement scan | Hourly + event | Polygon WS + REST | low |
| News proximity | 30min | Google News RSS | unlimited |
| Orderbook snapshots | 5min (active) | CLOB + Kalshi | ~0.07 req/sec |
| Proxy wallet mapping | Daily + startup | Polygon RPC | Alchemy |
| Baseline updates | Weekly | Computed | n/a |
| Wallet similarity | Weekly | Computed | n/a |
| Macro event calendar | Daily | FRED / scrape | low |
| Table | Est. rows/year | Avg row size | GB/year |
|---|---|---|---|
| wallet_trades | 900K | 200 bytes | ~0.18 GB |
| markets | 50K | 300 bytes | ~0.015 GB |
| platform_divergence | 50K | 200 bytes | ~0.01 GB |
| kalshi_flow_signals | 100K | 200 bytes | ~0.02 GB |
| orderbook_snapshots | 3M | 100 bytes | ~0.30 GB |
| system_predictions | 200K | 150 bytes | ~0.03 GB |
| market_news | 500K | 300 bytes | ~0.15 GB |
| wallet_profiles | 5K | 500 bytes | ~0.003 GB |
| kalshi_market_baselines | 100K | 100 bytes | ~0.01 GB |
| Total (main DB) | ~0.72 GB/year |
Well within SQLite capacity. Archive wallet_trades at 180 days to keep main table under ~450K rows.
WALLET ALERT — COPY SIGNAL
Wallet: 0x1a2b...3c4d (Follow #3, ROI: +34%, Brier: 0.18)
Entity: Wintermute (Arkham)
Market: "BTC above $100K by June 2026"
Action: Bought YES @ $0.42, $15,000 size
Conviction: 3.2x their typical size
Category: Crypto (cat. accuracy: 78%)
Timing: Early (12% of time to settlement) [NEWS: none — genuine alpha]
Strength: HIGH (0.87)
Slippage est: 2.1c | Kalshi fee: 7c | Net edge: +4.9c
Kalshi same event: YES @ $0.39 (3c gap — ARB OPPORTUNITY if net >2c)
---
DIVERGENCE ALERT — CROSS-PLATFORM
Event: "Fed cuts rates in March" [event_pair verified, rules match confirmed]
Polymarket YES: $0.62 (moved +8c in 2h, whale-driven)
Kalshi YES: $0.55 (hasn't moved)
Gross gap: 7 cents
Fees: Kalshi 7c taker + Poly ~2c gas = 9c total
NET EDGE: -2 cents — DO NOT TRADE (below 2c kill threshold)
[For reference only — watch for convergence]
---
CONSENSUS ALERT — SHARP MONEY CONVERGENCE
Market: "ETH above $5K by Q3"
Follow-list wallets in: 4/50 (top 8%)
Sides: 3 YES, 1 NO
Avg entry: YES @ $0.31
Price move since first entry: +3c (still early)
Category: Crypto
Escalated to: Crypto Desk research queue
Note: 1 follow-list wallet on opposite side — possible contrarian signal, review manually
Step 1 — Event matching table (2-3 days) Manual curation of top 50 overlapping markets (elections, Fed rates, BTC price, major sports). Build Levenshtein + deterministic rules. Unblocks everything cross-platform. Validate actual overlap % — if <30%, pivot strategy.
Step 2 — Phase 0 validation: Cross-platform price polling + divergence detector (3-5 days) No wallet tracking needed. Just: fetch Polymarket price, fetch Kalshi price, compare on verified event_pairs, log divergences. Run for 4 weeks. Answer: how many >5c-after-fees divergences per day? How long do they last? This validates or kills the core thesis.
Step 3 — Polymarket CLOB + Gamma API integration (3-5 days)
Basic market data flowing: active markets, prices, volumes, leaderboard wallets. Populates markets table and seeds wallet_profiles with top ~200 leaderboard wallets. REST polling only — no WebSocket yet.
Step 4 — Kalshi API integration (authenticated) (3-5 days) Auth handling, rate limiting, market discovery, order book snapshots, trade tape collection. Combined with step 3, both platforms flow.
Step 5 — Wallet trade history backfill for top 100 leaderboard wallets (5-7 days) Pull full trade history from Gamma API. Populate wallet_trades. Calculate initial P&L, win rate, Brier. First follow/fade candidates. Design as re-runnable incremental batch.
Step 6 — Scoring engine + follow/fade list generation (3-5 days) Implement composite scoring (all 4 shadow configs in parallel). Generate initial lists. Start shadow mode. The 6-month clock starts here.
Step 7 — Polymarket WebSocket integration (5-7 days) Upgrade from 15-minute REST to real-time. WebSocketManager with reconnection, gap backfill, and graceful degradation. Only after REST system is stable.
Step 8 — Alert system (Telegram) (2-3 days) Format and send wallet alerts + divergence alerts. Include fee-aware net edge in every alert. Test that alerts are actionable (not too frequent, easy to parse).
Step 9 — Settlement tracking + system self-scoring (3-5 days) When markets resolve: update wallet P&L, system_predictions outcomes, system-level Brier. Weekly accuracy report.
Step 10 — Kalshi flow signal detection (5-7 days) After 4-6 weeks of Kalshi data (step 4), baselines are established. Enable flow signals starting with volume_spike and price_divergence. Validate Brier before adding more signal types. Defer iceberg detection.
Step 11 — On-chain wallet discovery scanning (5-7 days) Scan CTFE + Neg Risk Adapter for large trades from unknown addresses. Auto-add to candidate pool. Start using Alchemy for reliable eth_getLogs.
Step 12 — v2 features (weeks each, 3-6 months of data required) Wallet similarity clustering, sophisticated MM tracking, consensus detection enhancements, funding chain analysis, full social sentiment pipeline.
| Component | Reusable for |
|---|---|
| Event matching infrastructure (entity resolution) | Sports, crypto, politics, economics, macro desks |
| Kalshi API client (auth, rate limiting, market data) | Economics, weather, politics desks |
| Polymarket API client (CLOB + Gamma + WebSocket) | Crypto, politics, sports desks |
| Brier score calculation + settlement tracking | All desks (standardize as shared module) |
| Telegram alert bot with structured templates | All desks |
| Divergence detection pattern (price gap across sources) | Crypto (CEX vs DEX), sports (sportsbook lines), forex (broker spreads) |
| WebSocket manager (reconnection + gap detection) | Crypto exchanges, forex, sports real-time feeds |
| Sharp money scoring methodology | Sports betting (sharp vs square), crypto (whale wallet tracking) |
| Slippage modeling function (order book depth → fill price) | All desks with execution |
| Macro event calendar | Macro, crypto, sports, options desks |
Realistic if: Build order is followed strictly and v2 features are deferred. First divergence signals in ~2 weeks. First wallet signals in ~5-6 weeks.
Biggest risk: The cross-platform arbitrage edge may not exist at the scale needed. Empirical overlap is ~30-40%. Divergence windows may close in seconds, not minutes. The Phase 0 validation gate (4 weeks, 3 threshold tests) prevents the most expensive failure: spending months building an elaborate system on a false premise.
First to break:
Second to break:
Budget note: Polymarket, Kalshi, Polygon RPC (Alchemy free tier), Telegram bot, Google News RSS — all free. No additional API costs required for this spec. This is within the $115-145/month total budget with room to spare.
Top change: Validate market overlap empirically BEFORE building anything beyond the divergence detector. If <30% overlap, the entire cross-platform strategy pivots to pure wallet intelligence — still highly valuable but different architecture priority.
This section addresses a broader question than wallet intelligence alone: can prediction markets (Kalshi, Polymarket) serve as leading indicators for other trading venues, and can informed/insider trading on prediction markets be detected and exploited?
Core thesis: The informed-trading detection pipeline (on-chain wallet intelligence, already covered in Parts I-II) is more robust and immediately actionable than the cross-venue signal exploitation thesis. The cross-venue thesis is partially valid but far more nuanced and less consistently profitable than it appears.
Bottom line: The on-chain intelligence pipeline should be built first. Cross-venue signal exploitation should be built selectively, starting with the political/macro scenario only.
[PANEL: Gemini 3.1 Pro] The Signal Intelligence Desk (SigInt) should function as an internal API/data provider only — it does not execute trades. It generates standardized JSON alerts consumed by other desks:
{
"signal_type": "whale_activity | pre_news_move | volume_spike | divergence",
"confidence": 0.82,
"asset": "FXI",
"direction": "short",
"source_contract": "US tariff >25% on China by April",
"source_platform": "polymarket",
"predicted_lag_time": "12-48h",
"downstream_venues": ["IBKR", "OANDA"],
"expires_at": "2026-04-01T00:00:00Z"
}
Feasibility: 3/10
The premise is largely backwards for sports. The information flow in sports betting runs from sharp sportsbooks (Pinnacle, Circa, CRIS) to prediction markets, not the other way around. Here is why:
The one exception: When Kalshi lists a contract that sportsbooks don't cover (e.g., a niche stat, a quarter-level prop), information CAN flow from Kalshi outward. But these contracts have even thinner liquidity.
Time lag reality: When a significant injury is announced (say, a QB is ruled out), the repricing sequence typically goes:
The window between Kalshi repricing and DraftKings repricing is inconsistent and often nonexistent because DraftKings also monitors the same news feeds.
Estimated edge: Near zero for automated systems. Occasionally 1-3 cents on a prop that Kalshi reprices first, but the frequency is too low (maybe 2-5 tradeable signals per week across all sports) and the window too short (<60 seconds) to build a reliable desk around.
Example workflow -- showing why it's difficult:
The Kalshi contract moved 10 seconds before DraftKings but DraftKings already had the line moving. You might catch 5 cents on the game total, but only if your order hits before DraftKings' automated line movement. In practice, your fill rate on stale lines is <20%.
[PANEL: GPT-4.1] The real bottleneck for any PM-to-sportsbook edge is soft book account lifespan. Even if the signal is real, soft books (DraftKings, FanDuel) limit and ban winning accounts quickly. The edge is increasingly well-known among arbers, and account longevity -- not signal quality -- is the binding constraint.
Infrastructure needed (if pursued anyway):
wss://trading-api.kalshi.com/trade-api/ws/v2)Feasibility: 5/10
This is more interesting than Scenario 1 because the information asymmetry is real but the edge is indirect and hard to quantify.
Why it has some merit:
The edge mechanism:
Real example:
Estimated edge: 2-5% ROI improvement in GPPs on days with significant prediction market signal divergence. This is meaningful in DFS where long-run ROI of 5-10% is elite. But the signal only fires maybe 3-8 times per week across all major slates.
Infrastructure needed:
Feasibility: 7/10 -- THE BEST SCENARIO IN TOPIC 1
This is where prediction markets genuinely lead. Here's why:
Prediction markets aggregate political intelligence that financial markets price poorly. A tariff announcement's probability might move from 30% to 70% on Polymarket over 48 hours based on DC insider chatter, congressional staff leaks, and lobbyist intelligence. The stock market won't price this in until a news article appears or the announcement happens.
The participants are different. Polymarket political bettors include DC insiders, political operatives, and people with genuine information advantages. Equity/forex traders are mostly quants, institutions, and retail -- they're good at pricing earnings and economic data but bad at pricing political risk.
Liquidity exists on both sides. Polymarket political contracts regularly have $1M-$50M+ in volume. And the downstream assets (SPY, QQQ, sector ETFs, forex majors) have functionally unlimited liquidity.
Specific sub-scenarios:
Feasibility: 6/10. This is the most competitive sub-scenario. CME FedWatch (fed funds futures) already prices this efficiently and has been doing so for 30 years. Polymarket's Fed rate contracts are largely derivative of CME FedWatch. However:
Feasibility: 8/10. This is the strongest sub-scenario. Here is a concrete case study:
Example workflow with real numbers:
Expected edge: 2-5% per trade, 8-15 signals per year that meet the quality threshold. Annual expected return from this sub-desk alone: 15-40% on allocated capital, with significant variance.
Feasibility: 7/10. Well-documented and studied.
Feasibility: 7/10. This is real but legally sensitive.
Feasibility: 4/10. Economic data surprises are extremely hard to predict and prediction markets don't have a clear advantage over the options market (which already prices expected moves via implied volatility).
Infrastructure for Scenario 3 (Political/Macro):
Feasibility: 4/10
Challenge to the premise: Weather prediction markets on Kalshi have extremely thin liquidity ($1K-$50K per contract typically). Energy traders at commodity desks use the same NWS/ECMWF/GFS data that weather prediction market participants use, and they have PhD meteorologists on staff. The information flow is: weather models --> energy desks AND weather markets simultaneously.
The theoretical window exists only when:
Time window: Only realistic during off-hours. Natural gas futures trade nearly 24 hours but are thin overnight. If a weather model update drops at 4 AM ET and Kalshi weather contracts move, the NG open could gap in the expected direction.
Problem: The edge is tiny (maybe 1-3 cents on NG per event) and the frequency is low (a few times per heating/cooling season). Energy markets are sophisticated; this isn't a consistent edge.
Recommendation: Do not prioritize this.
Feasibility: 2/10
This doesn't work. Here's why:
[PANEL: Grok 4 Fast] Dissent on weekends (rated 7/10): During weekend low-liquidity windows, Polymarket BTC/crypto contracts can lead spot by 10-30 minutes. Example: Dec 31, 2023 (weekend), Polymarket "BTC >$45K by Jan 1" spiked to 75c at 11 PM UTC on whale bet. BTC spot lagged from $42.5K to $43.2K by 11:15 PM. This is a niche edge -- viable for weekends only, with Polymarket liquidity too low for reliable signals, but worth monitoring as a sentiment indicator for weekend crypto positioning.
Recommendation: Do not build this as a standalone desk. Monitor as part of the broader wallet intelligence system.
Scenario 1 (sports) is not worth building. The information flow runs the wrong direction. Use Pinnacle as the oracle (which we already do) and forget about Kalshi-to-sportsbook signals.
Scenario 5 (crypto) is a dead end. Crypto prediction markets are derivatives of spot, not information sources. The answer to "do they ever lead spot?" is functionally never in a tradeable way (weekend dissent noted above).
The overall thesis of Topic 1 is weaker than it appears. Prediction markets lead other venues ONLY for political/regulatory events, where the prediction market participant base has genuinely different information than financial market participants. For sports, weather, and crypto, the participant bases overlap too much.
[PANEL: GPT-4.1] If the cross-venue edge proves too fleeting (especially on sports/crypto), use PM signals for model validation and risk management rather than direct trading. A prediction market probability shift can serve as a sanity check on internal models, even if the trading window is too short to exploit directly.
This is the more promising direction. The on-chain transparency of Polymarket creates a genuine, unique data advantage. This topic is undersold relative to Topic 1. Most quant funds and retail traders are not doing this yet. The data is free and public. This should be the primary focus.
Note: Much of the wallet profiling infrastructure is already specified in Parts I-II of this document. This section focuses on the broader detection methods and the tradeable edge they create.
Approach: Build a system that:
Defining "significant movement":
Defining "news timestamp" (recommended hierarchy):
Use the wire service timestamp as ground truth. If the prediction market moved >8c before the wire service timestamp, flag it.
False positive rate (based on studies of Polymarket data from 2024-2025):
To reduce false positives:
Real examples:
2024 Biden withdrawal (July 21, 2024): Polymarket "Biden Democratic nominee" dropped from ~85c to ~70c over the 48 hours before the official announcement. Multiple large sells appeared from previously inactive wallets. The NYT didn't report imminent withdrawal until ~12 hours before the announcement. Signal preceded mainstream news by 36+ hours.
2024 Trump VP pick: Polymarket "JD Vance VP nominee" spiked from ~30c to ~55c approximately 6 hours before the announcement. Whale wallets placed $100K+ in coordinated buys.
Supreme Court decisions: Polymarket contracts on major SCOTUS rulings have shown pre-announcement movement in the correct direction in roughly 60% of cases, versus 50% expected by chance. The window is typically 1-4 hours before the opinion is released.
This method is extensively covered in Parts I-II of this document. The following adds the scoring system and data access specifics relevant to the prediction market signal exploitation context.
This is the highest-value intelligence product in the entire system.
Data access:
0x4D97DCd97eC945f40cF65F87097ACe5EA0476045 (CTF Exchange)Wallet scoring system (composite, 0-100):
| Metric | Weight | Threshold for "suspicious" |
|---|---|---|
| Win rate | 25% | >65% over 20+ resolved bets |
| Pre-news timing | 25% | >30% of bets placed <6h before relevant news |
| Category concentration | 15% | >80% of bets in one category |
| Wallet age at first large bet | 10% | <7 days old with first bet >$5K |
| Cluster membership | 15% | Linked to 3+ wallets via funding source or timing patterns |
| Profit/loss ratio | 10% | >3:1 P/L on resolved positions |
Thresholds:
[PANEL: Grok 4 Fast] Bayesian scoring model formula: Score = (win_rate * volume) + (pre_news_bets * 2) - (diversification_penalty). Threshold >0.7 for "informed." Build via Python with NetworkX for cluster analysis.
Cluster detection (critical because informed traders create fresh wallets):
[PANEL: Gemini 3.1 Pro] Market maker wallet exclusion: Identify and filter out known market makers (Wintermute, etc.) to improve signal quality. Market makers show symmetric trading (buy and sell), tight spread placement, high volume but low directional PnL. An A+ system would monitor Wintermute's wallets across all chains to detect liquidity pulls before PM prices move.
Implementation tables:
- wallets (address, first_seen, last_active, total_volume, win_rate, score)
- trades (tx_hash, wallet, contract_id, side, amount, price, timestamp)
- wallet_clusters (cluster_id, wallet_address, link_type, confidence)
- contract_events (contract_id, event_type, event_time, news_time, pre_news_flag)
- alerts (wallet_or_cluster, contract_id, signal_type, confidence, timestamp)
Polygon RPC calls needed:
eth_getLogs with CTF Exchange contract address, filtered by OrderFilled events/markets endpoint)Polling frequency: Every block (~2 seconds on Polygon). In practice, batch every 30 seconds to stay within free RPC tier limits.
[PANEL: Gemini 3.1 Pro] Mempool sniping: To truly win on Polymarket copy-trading, you cannot wait for the transaction to be confirmed on the Polygon blockchain. Reading the mempool (pending transactions) lets you see a whale's trade before it confirms, and pay a higher gas fee to get your transaction mined first, effectively front-running the insider. This is technically possible but ethically and legally aggressive -- treat as v2/v3 feature if ever pursued.
Feasibility: 5/10
What Kalshi's API exposes:
/markets/{ticker}/orderbook -- current bids and asks with depth/markets/{ticker}/trades -- recent trades (but anonymized, no trader IDs)What you CAN detect:
What you CANNOT detect:
This makes Kalshi a weaker signal source than Polymarket. You can detect THAT informed activity is happening but not WHO is doing it, so you can't build persistent profiles.
Still useful as a confirmation signal: If Polymarket whale wallet + Kalshi volume spike align on the same event, confidence in the signal doubles.
Decision tree when informed activity is detected:
Is the downstream asset liquid enough to trade?
How confident is the signal?
How fast do we need to act?
Which venue to trade?
Realistic win rate and edge (based on analysis of Polymarket whale wallet behavior in 2024-2025):
Annualized expectation: With 50-100 tradeable signals per year and 2% average bankroll per trade at 63% win rate with 2:1 avg win/loss, this is a ~25-50% annual return on allocated capital. High variance. Expect drawdowns of 10-15% within any given quarter.
[PANEL: Gemini 3.1 Pro] Wallet poisoning risk: Whales could fake positions to trigger copy-traders, then dump. A known whale might buy a small position to pump the price via copy-traders following them, then exit their real position against the inflated price. Mitigation: only trigger on large size relative to the whale's history, and track the whale's exits in real-time. If a followed whale exits within 2 hours of entry, flag as potential poison.
[PANEL: Gemini 3.1 Pro] MEV (Maximal Extractable Value) bot risk: On-chain copy-trades can be front-run by MEV bots that see your pending transaction in the mempool. Mitigation: use private RPC endpoints (e.g., Flashbots for Polygon) to submit transactions that bypass the public mempool.
Legal considerations:
Extracted detection rules from case studies:
| Source | Method | Frequency | Cost |
|---|---|---|---|
| Polymarket trades | Polygon RPC eth_getLogs |
Every 30 seconds | $0 (Alchemy free tier) |
| Polymarket markets | REST API /markets |
Every 5 minutes | $0 |
| Polymarket prices | REST API /prices |
Every 60 seconds | $0 |
| Kalshi order book | WebSocket feed | Real-time | $0 (with account) |
| Kalshi trades | REST API /markets/{ticker}/trades |
Every 60 seconds | $0 |
| News timestamps | NewsAPI.org or Mediastack | Every 60 seconds | $0-50/mo |
| Twitter/X first mentions | X API v2 filtered stream | Real-time | $100/mo (Basic tier) |
-- Prediction Market Signal Intelligence tables
-- (extends the wallet intelligence schema from Parts I-II)
CREATE TABLE pm_contracts (
id TEXT PRIMARY KEY, -- Polymarket condition_id or Kalshi ticker
platform TEXT, -- 'polymarket' or 'kalshi'
title TEXT,
category TEXT, -- 'politics', 'sports', 'crypto', 'weather', 'finance', 'fda'
resolution_date TEXT,
resolved_outcome TEXT,
created_at TEXT
);
CREATE TABLE pm_prices (
contract_id TEXT,
timestamp INTEGER, -- unix epoch
yes_price REAL,
no_price REAL,
volume_24h REAL,
PRIMARY KEY (contract_id, timestamp)
);
CREATE TABLE pm_trades_onchain (
tx_hash TEXT PRIMARY KEY,
block_number INTEGER,
timestamp INTEGER,
wallet TEXT,
contract_id TEXT,
side TEXT, -- 'YES' or 'NO'
amount REAL, -- in USDC
price REAL
);
CREATE TABLE pm_wallets (
address TEXT PRIMARY KEY,
first_seen INTEGER,
last_active INTEGER,
total_volume REAL,
total_trades INTEGER,
win_rate REAL,
score REAL, -- 0-100 composite
cluster_id TEXT,
categories TEXT, -- JSON array of categories traded
updated_at INTEGER
);
CREATE TABLE pm_wallet_clusters (
cluster_id TEXT,
wallet TEXT,
link_type TEXT, -- 'funding_source', 'timing_correlation', 'gas_pattern'
confidence REAL,
PRIMARY KEY (cluster_id, wallet)
);
CREATE TABLE pm_news_events (
id INTEGER PRIMARY KEY,
contract_id TEXT,
event_description TEXT,
wire_timestamp INTEGER, -- first wire service mention
social_timestamp INTEGER, -- first tweet
article_timestamp INTEGER, -- first article
pre_news_move_detected INTEGER -- boolean
);
CREATE TABLE pm_signals (
id INTEGER PRIMARY KEY,
signal_type TEXT, -- 'whale_activity', 'cluster_buy', 'pre_news_move', 'volume_spike'
contract_id TEXT,
wallet_or_cluster TEXT,
confidence REAL,
detected_at INTEGER,
downstream_assets TEXT, -- JSON array of tickers
action_taken TEXT,
outcome TEXT,
pnl REAL
);
CREATE TABLE pm_downstream_map (
contract_pattern TEXT, -- regex or keyword matching PM contract titles
category TEXT,
downstream_ticker TEXT, -- e.g., 'FXI', 'XLE', 'USD/CNH'
downstream_venue TEXT, -- 'IBKR', 'OANDA', 'Kalshi', 'Polymarket'
direction TEXT, -- 'long' or 'short' relative to YES outcome
notes TEXT,
PRIMARY KEY (contract_pattern, downstream_ticker)
);
| Detection | Mode | Latency Target | Reason |
|---|---|---|---|
| Whale trade alert | Real-time | <60 seconds | Need to trade before market reprices |
| Volume spike | Real-time | <60 seconds | Same |
| Pre-news correlation | Batch (hourly) | ~1 hour | Retrospective analysis |
| Wallet scoring update | Batch (daily) | ~24 hours | Doesn't change rapidly |
| Cluster detection | Batch (daily) | ~24 hours | Computationally intensive |
| Win rate calculation | Batch (on resolution) | N/A | Only possible after contract resolves |
Alerts go to the boss's Telegram via the existing bridge infrastructure. Alert format:
SIGNAL: Whale Activity Detected
Contract: "Trump wins 2028 election" (Polymarket)
Wallet: 0xabc...def (Score: 78, Win Rate: 71%)
Action: Bought $45K YES at 42c (5 min ago)
Cluster: Part of 3-wallet cluster (avg score: 72)
Volume: 8x normal hourly volume
Downstream: Consider long DJT, short renewable ETFs
Confidence: HIGH
Priority levels:
| Scenario | Feasibility (1-10) | Edge Size | Frequency | Priority |
|---|---|---|---|---|
| T1-S1: Prediction markets --> Sportsbooks | 3 | Near zero | Low | Skip |
| T1-S2: Prediction markets --> DFS | 5 | 2-5% ROI lift | 3-8/week | Medium |
| T1-S3: Political/Regulatory --> Stocks/Forex | 7 | 2-5% per trade | 8-15/year | HIGH |
| T1-S4: Weather --> Energy | 4 | Tiny | Rare | Skip |
| T1-S5: Crypto PM --> Spot | 2 | None reliable | N/A | Skip |
| T2-M1: Pre-news detection | 7 | Diagnostic (not direct) | Ongoing | HIGH |
| T2-M2: Whale wallet profiling | 9 | 8-15c per signal | 50-100/year | HIGHEST |
| T2-M3: Kalshi order book | 5 | Confirmation signal | Ongoing | Medium |
| T2-M4: Following informed money | 8 | 25-50% annual | Ongoing | HIGH |
Polymarket Whale Wallet Profiling + Cluster Detection (T2-M2). This is the foundation. Without wallet intelligence, you can't follow informed money. Build the on-chain data pipeline, wallet scoring, and cluster detection first. Estimated build time: 2-3 weeks. Cost: $0-50/mo (free RPC tier + Dune free).
Political/Regulatory Signal Pipeline (T1-S3). Map prediction market contracts to downstream financial instruments. Build the signal detector for tariff, election, and FDA contracts. Wire it to IBKR for equity trades and OANDA for forex. Estimated build time: 1-2 weeks after the wallet system. Cost: $0-200/mo.
Informed Money Following System (T2-M4). This is the trading layer on top of #1 and #2. When whale wallet intelligence identifies a high-confidence signal, generate trade alerts and (optionally) auto-execute. Estimated build time: 1 week after #1 and #2. Cost: brokerage minimums only.
[PANEL: Grok 4 Fast] Cost-benefit ROI framework: Infrastructure runs approximately $3K/month all-in (APIs, compute, data feeds). Expected annualized profit from the top 3 priorities: ~$50K/year on moderate capital allocation. The system pays for itself within 1 month if signal quality matches backtested expectations. Paper-trade for 3 months to validate before committing real capital.
| Item | Monthly Cost |
|---|---|
| Polygon RPC (Alchemy free tier) | $0 |
| Dune Analytics (free or Pro) | $0-349 |
| News API | $0-50 |
| X API (Basic) | $100 |
| IBKR account | $0 (no minimums) |
| OANDA (already have) | $0 |
| Kalshi account (already have) | $0 |
| Polymarket monitoring (no account needed for on-chain data) | $0 |
| Total | $100-499/mo |
One-time: $2K for securities attorney consultation on downstream trading legality.
| Risk | Severity | Mitigation |
|---|---|---|
| False signals (noise trades flagged as informed) | HIGH | Require multiple confirmation signals; backtest against historical data before live trading |
| Legal/regulatory (SEC scrutiny on informed-following) | MEDIUM | Attorney opinion letter; document that all data sources are public |
| Platform bans (Kalshi/Polymarket ban scraping) | LOW | Use official APIs only; stay within rate limits |
| Edge decay (others build similar systems) | MEDIUM | First-mover advantage is real; Polymarket on-chain data is underutilized today |
| Model risk (wallet scores are wrong) | MEDIUM | Paper-trade for 3 months before allocating real capital |
| Execution risk (can't get fills fast enough) | LOW-MEDIUM | Trade downstream assets (infinite liquidity) rather than prediction market contracts |
| [PANEL: Gemini 3.1 Pro] Wallet poisoning (whales fake positions to bait copy-traders) | MEDIUM | Only trigger on large size; track whale exits; if exit <2h after entry, flag as poison |
| [PANEL: Gemini 3.1 Pro] MEV front-running on copy-trades | MEDIUM | Use private RPC endpoints (Flashbots for Polygon); avoid public mempool |
New desk: "Signal Intelligence Desk" (SigInt). Here's why:
Structure:
Signal Intelligence Desk (SigInt)
|-- On-Chain Module (Polymarket wallet tracking, cluster detection)
|-- Order Book Module (Kalshi depth, volume, spread analysis)
|-- News Correlation Module (pre-news detection, timestamp matching)
|-- Signal Generator (combines modules, scores confidence, generates alerts)
|-- Downstream Mapper (maps signals to tradeable instruments)
Consumer desks:
| Desk | Signal Received | Action |
|---|---|---|
| Kalshi Edge Bot | Informed wallet bought YES on sports contract at 55c | Increase confidence in that side; adjust position sizing up |
| Phoenix Forex Bot | Tariff probability jumped 20 points on Polymarket | Trigger USD/CNH review; potentially enter short CNH |
| Sports (manual/DFS) | Player prop volume spike on Kalshi + whale sell on Polymarket | Fade player in DFS; adjust prop model |
| Equity (manual initially) | FDA approval whale cluster detected | Alert boss; consider biotech stock position |
| Crypto Desk | Weekend whale activity on BTC threshold contract | Sentiment indicator for spot/futures positioning |
Polymarket market maker profiling. Not all active wallets are directional bettors. Some are market makers. Identifying and EXCLUDING market makers from whale detection improves signal quality dramatically. Market makers show: symmetric trading (buy and sell), tight spread placement, high volume but low directional PnL.
Reverse signal: when informed traders are WRONG. If a high-score wallet loses a bet, that's also information. It means their source was wrong, or the situation changed. Tracking informed-trader losses helps calibrate confidence weights.
Polymarket liquidity as a sentiment indicator. When total liquidity on a contract increases sharply (both sides), it often precedes a major event. This is distinct from directional flow -- it's a volatility signal. You could trade options straddles on downstream assets when prediction market liquidity surges.
Cross-platform arbitrage between Polymarket and Kalshi. When the same event is priced differently on both platforms, the platform with the informed flow (usually Polymarket for political events) is more likely correct. This is both a direct arbitrage (buy cheap side on one, sell expensive side on other) and a signal for downstream trading.
International prediction markets. Metaculus, Manifold, and international betting exchanges (Betfair, Smarkets) also have informed flow. Betfair in particular has enormous liquidity on UK/European political events and publishes full order book data via API. This extends the signal surface beyond just Kalshi/Polymarket.
Prediction Market Signal Intelligence section reviewed by 4-model panel on Mar 15, 2026:
Sonar Reasoning Pro — Skipped (model could not answer the prompt).
Base spec reviewed by 4-model panel on Mar 14, 2026:
Merged final produced by: Claude Code (Sonnet 4.6), 2026-03-15