Panel Ruling: Analyst Briefing Design — Blind Analysis Protocol

Date: 2026-03-30 Panel: Opus 4.6 (native, max reasoning) · Sonnet 4.6 · Gemini 3.1 Pro Preview · Grok 4.2 Reasoning Version: 2 (updated with no-pass constraint + boss clarifications) Verdict: UNANIMOUS on core questions, boss override on thresholds + steam


Context

The desk analyst pipeline uses two independent AI analysts (Sonnet 4.6 + Gemini 3.1 Pro) to produce independent lines, probabilities, and analysis for sports betting markets on Kalshi. The edge scanner (pure math engine) finds mathematical edges by de-vigging Pinnacle lines and comparing against Kalshi prices. The analysts act as independent oddsmakers — they set their own lines from scratch using fundamental data only. Opus sees everything at the end and makes the final call.

We are in model-building mode. No real money is at risk. Every market gets a pick from every layer. We are tracking performance to measure accuracy, calibration, and model improvement over time.


Pipeline Architecture

1. DATA COLLECTION — free scrapers collect hard facts daily
2. RESEARCH — Grok + Sonar Pro (parallel, web search + deep research)
3. VALIDATION — 5 rule-based checks + Flash Lite contradiction detection
4. BLIND ANALYSIS — Sonnet 4.6 + Gemini 3.1 Pro (independent, cold, no memory)
   - Each sets own lines (spread, total, ML, player props) from scratch
   - Each provides implied probabilities + evidence + conviction
   - Neither sees market prices, each other, or prior sessions
   - System builds probability curves from each analyst output
5. COMPARISON LAYER — analyst curves vs edge scanner math vs Kalshi vs Pinnacle
6. OPUS VERDICT — sees EVERYTHING, makes final pick + position sizing on every market
7. POST-VERDICT — store predictions, track curves, settlement, calibration

Ruling 1: Complete Price Blackout for Analysts

UNANIMOUS + Boss confirmed: Hide ALL prices and lines. No exceptions.

Analysts see NONE of the following:

The analysts do not see what the contract thresholds are. They are not told "price BOS -7.5" — they are told "set your own spread for Celtics vs 76ers." They build the line from scratch.

Rationale: Anchoring is not a risk — it is a certainty. AI models anchor on any number shown to them. The spread IS a price. Even showing "-6.5" without odds tells the analyst the market thinks Boston wins by ~7. The entire value of the analyst layer is independence. If they adjust from market prices, we are paying for a redundant signal.


Ruling 2: Sharp Money / Steam — EXCLUDED ENTIRELY

Boss override: No sharp money, no steam, no line movement at the analyst level.

Sharp money, line movement direction, cash/ticket divergence, reverse line movement — all excluded from analyst briefings entirely. This data goes directly to the Opus layer where it serves as a confirming/disconfirming signal alongside the analyst reports.


Ruling 3: Analyst Briefing Contents — Fundamentals Only

The analyst briefing contains ONLY raw performance data and situational facts. Nothing market-derived.

A. Game Context

B. Team Performance Metrics

C. Situational Factors

D. Personnel

E. Referee Data

F. Market Assignment

EXPLICITLY EXCLUDED:

Guiding principle: The analyst sees what an oddsmaker would have on day one — raw data, no market reference.


Ruling 4: Third-Party Model Predictions — HIDDEN

Unanimous: Do not show Sagarin, DRatings, Dimers, ESPN BPI, Massey, GameSim, or Prediction Tracker to analysts.

These are other models' opinions. Showing them creates anchoring and turns analysts into aggregators rather than independent oddsmakers.

Valid use: Collect and track against outcomes over 2-3 months. If a model consistently beats Pinnacle implied probabilities, integrate that signal at the Opus layer — never at the analyst layer.


Ruling 5: Analyst Output — What They Must Produce

Every analyst produces the following for EVERY game assigned:

{
  "game_id": "2026-03-30_BOS_PHI",
  "analyst": "Sonnet-4.6",
  "timestamp": "2026-03-30T14:23:00Z",

  "game_summary": "2-3 sentence matchup assessment from fundamentals",

  "spread_analysis": {
    "predicted_winner": "BOS",
    "predicted_margin": 7.0,
    "confidence_band": [4.5, 9.5],
    "implied_probability_curve": {
      "win_by_1_plus": 0.74,
      "win_by_4_plus": 0.62,
      "win_by_7_plus": 0.48,
      "win_by_10_plus": 0.33,
      "win_by_14_plus": 0.18,
      "win_by_20_plus": 0.06
    },
    "conviction": 4,
    "conviction_reasoning": "Boston 5.2 net rating advantage amplified at home. PHI missing Embiid — team is -4.3 net rating without him this season.",
    "evidence": [
      "BOS net rating +5.2 vs PHI -1.3 = 6.5 point fundamental gap",
      "BOS home court adds +3.1 to net rating historically",
      "PHI 12-18 without Embiid, -4.3 net rating differential",
      "Referee crew averages 2.1 more fouls/48 — deeper BOS roster benefits"
    ]
  },

  "total_analysis": {
    "predicted_total": 219.0,
    "confidence_band": [213.0, 225.0],
    "implied_probability_curve": {
      "over_205": 0.91,
      "over_210": 0.80,
      "over_215": 0.65,
      "over_220": 0.45,
      "over_225": 0.28,
      "over_230": 0.13
    },
    "conviction": 3,
    "conviction_reasoning": "Both teams near league average pace. No strong factors pushing over or under.",
    "evidence": [
      "BOS pace 99.2, PHI pace 98.8 — both middle of pack",
      "BOS scores 114.2 PPG at home, allows 107.1",
      "PHI scores 108.5 on road, allows 112.3"
    ]
  },

  "moneyline_analysis": {
    "predicted_winner": "BOS",
    "win_probability": 0.72,
    "opponent_win_probability": 0.28,
    "conviction": 4,
    "conviction_reasoning": "Clear talent + situational advantage.",
    "evidence": [
      "Net rating differential + home court + injury advantage",
      "BOS 28-8 at home this season"
    ]
  },

  "upset_scenario": "Philadelphia wins if Maxey scores 35+ and BOS shoots below 33% from three. PHI transition offense without Embiid is actually faster. BOS complacency in a seemingly easy matchup is the risk.",

  "key_uncertainties": [
    "If Embiid is upgraded to active, margin estimate drops to 3-4",
    "BOS has been coasting in late season — effort level uncertain"
  ],

  "data_gaps": [
    "No recent PHI practice reports available",
    "Unsure of BOS rotation plans with playoffs approaching"
  ]
}

Schema Rules:


Ruling 6: Conviction Calibration — 1-5 Scale

Scale:

Level Label Meaning
1 VERY LOW Genuine uncertainty. Making a pick but minimal fundamental basis. Near-random but tracked.
2 LOW Weak basis. One or two data points, no strong directional story.
3 MODERATE Reasonable basis. Several data points align. Standard operating pick.
4 HIGH Strong basis. Multiple independent factors converge. Clear matchup story.
5 VERY HIGH Exceptional. Rare clarity — overwhelming convergence of factors. Use sparingly (<10% of picks).

Rules:


Ruling 7: Probability Curves — Model Building

Each analyst produces an implied probability curve for spreads and totals. The system:

  1. Builds a full probability curve from the analyst's central estimate + confidence band
  2. Compares curves: Analyst A curve vs Analyst B curve vs Edge Scanner curve vs Pinnacle implied vs Kalshi prices
  3. Tracks curve accuracy over time using Brier scores at each threshold
  4. Identifies which curve is best per sport, per market type, per analyst
  5. Evolves weights — after 200+ markets, shift composite weights toward the best-performing curve

This is the core model-building mechanism. Over time, we learn: is Sonnet better at spreads? Is Gemini better at totals? Does the edge scanner beat both on ML? The clean independence makes this measurement meaningful.


Ruling 8: Opus Verdict Layer — Sees Everything

Opus receives for every market:

  1. Analyst A full report — lines, probabilities, curves, conviction, evidence, upset scenario
  2. Analyst B full report — independent, same format
  3. Edge scanner output — mathematical probability, raw/net edge, steam detection, tail confidence
  4. Kalshi prices — current contract prices, bid/ask, volume
  5. Pinnacle lines — the sharp benchmark
  6. Third-party model consensus — Sagarin, DRatings, etc. (analysts never saw this)
  7. Sharp money data — line movement, cash/ticket splits (analysts never saw this)

Opus Output (every market, no passing):

{
  "market": "BOS vs PHI — Spread",
  "final_line": "BOS -7.5",
  "final_probability": 0.51,
  "final_conviction": 4,
  "position_size": "FULL",
  "convergence_level": "FULL",
  "verdict_narrative": "Both analysts see BOS -7 to -8.5. Edge scanner math at -7.1. Pinnacle at -6.5. Analysts + math agree market is underpricing BOS margin. Sharp money confirms direction. Full position.",
  "analyst_agreement": true,
  "analyst_vs_market_divergence": 1.5,
  "tracking_flags": {
    "high_conviction_both": true,
    "contrarian_signal": false,
    "steam_aligned": true
  }
}

Position Sizing:

Signal Pattern Size
Both analysts + math agree, market diverges FULL
One analyst + math agree, other close THREE_QUARTER
Analysts agree, math differs HALF
Math only, analysts neutral QUARTER
Everyone disagrees (still pick, edge scanner tiebreaks) MINIMUM

MINIMUM is still a position. No passing. Every market gets a pick and a size. Minimum positions generate calibration data on low-confidence scenarios.


Ruling 9: Prompt Engineering Requirements

The analyst system prompt MUST include:

  1. Independence framing: "You are an independent oddsmaker. You have NO access to current market prices. Build your lines entirely from the fundamental data provided. It is valuable for your line to differ significantly from markets — that divergence is the entire point of your role."

  2. No-pass rule: "You MUST set a line and probability for every market type assigned. Low conviction is fine — it is still a pick with a number. There is no passing, no holding, no skipping."

  3. Temporal isolation: Never show previous session outputs. Each session is fully independent.

  4. Anti-inflation: "Most conviction ratings should be 2s and 3s. If you rate most picks 4+, you are miscalibrated. Be honest about uncertainty."

  5. Evidence requirement: "Every claim must be backed by specific data from the briefing. Do not make assertions without evidence."


Ruling 10: Performance Tracking

Track the following over time:

After 60-90 sessions, use this data to:


Summary

Decision Ruling
Price hiding Complete blackout — no lines, odds, probabilities, thresholds from any source
Sharp money / steam Excluded entirely from analyst layer
Briefing contents Raw fundamentals only — stats, rest, refs, injuries, matchup data
Third-party models Hidden from analysts — tracked for correlation research only
Assignment format Game + market types only — "set your own spread, total, ML" — no thresholds shown
Analyst output Own lines + implied probability curves + evidence + conviction + upset scenario
Conviction 1-5 scale, per-market-type, mandatory reasoning, base rate enforcement
Probability curves Built from analyst output, tracked against reality, compared across analysts
Opus verdict Sees everything — analyst reports + Kalshi + Pinnacle + sharp money + models. Picks every market.
Position sizing FULL to MINIMUM based on convergence — no passing
No-pass rule Every layer picks every market. Low conviction = still a pick.

The architecture's value is independence at the analyst layer. Protect it aggressively. The probability curves built from blind analysis are the product — everything else is infrastructure to make those curves better over time.


Ruling 11: Analyst Model Roster — LOCKED

Date added: 2026-03-30 Status: HARD RULE — no substitutions

The blind analyst panel consists of exactly these three models. No substitutions, no swaps, no additions without a new panel ruling.

Seat Model Provider API
Analyst 1 Claude Sonnet 4.6 (claude-sonnet-4-6) Anthropic Direct API
Analyst 2 Gemini 3.1 Pro Preview (google/gemini-3.1-pro-preview) Google OpenRouter
Analyst 3 gpt-oss-120b (openai/gpt-oss-120b) OpenAI (open-weight) OpenRouter

Verdict layer: Opus 4.6 via CLI bridge (free). NOT a blind analyst.

Why these three:

Rules:


Ruling 12: Opus Verdict Layer — HYBRID (Fresh + Calibration Digest)

Date added: 2026-03-31 Council: Full 5-member council, unanimous Status: LOCKED

Opus goes FRESH every session. No resume. No memory of past verdicts.

Before each session, Opus receives a Verdict Calibration Digest — aggregate pipeline statistics only.

Digest Contains:

Digest Never Contains:

Rules:

Rationale:

Resume introduces anchoring, path dependence, and context pollution — every advisor flagged these as fatal. Pure fresh wastes real signal about pipeline performance. Hybrid gives Opus calibration without contamination.

Source: ~/edgeclaw/results/panel-results/analyst-briefing-ruling.md