---
license: other
license_name: cc-by-4-0-with-sf-attribution
license_link: LICENSE
language:
- en
library_name: lightgbm
tags:
- prediction-markets
- forecasting
- calibration
- brier
- tabular-classification
- gradient-boosting
- lightgbm
- xgboost
- catboost
- time-series
- kalshi
- polymarket
- simplefunctions
pipeline_tag: tabular-classification
model-index:
- name: sf-ml-baseline-v0.1
  results:
  - task:
      type: tabular-classification
      name: 24h direction forecast (V1 x T1)
    metrics:
    - type: brier
      value: 0.2294
      name: Brier score
    - type: brier_improvement
      value: 0.0206
      name: Improvement vs coinflip baseline (0.2500)
  - task:
      type: tabular-classification
      name: 24h resolution forecast (V2 x T4)
    metrics:
    - type: brier
      value: 0.1681
      name: Brier score (XGBoost)
    - type: brier_improvement
      value: 0.0086
      name: Improvement vs market-price/100 baseline (0.1767)
---
# sf-ml-baseline v0.1
**What it is**: gradient-boosted tree ensembles that predict prediction-market outcomes from engineered microstructure features.

**Published**: 2026-04-19 (initial release; a time capsule, see "Retrain plan" below).

**Trained on**: 11 days of SimpleFunctions data (`market_indicator_history` + `marketwide_resolutions`), 2026-04-08 to 2026-04-19.

**License**: CC-BY-4.0 with SimpleFunctions attribution; see `LICENSE`.

**Author**: SimpleFunctions, https://simplefunctions.dev

**Model repo**: https://huggingface.co/SimpleFunctions/sf-ml-baseline *(pending upload)*
## Why release this
To our knowledge, no one has published a calibrated feature-based baseline for prediction-market forecasting. The prior work we know of (Halawi 2024, Schoenegger 2024, AIA 2025) uses LLMs with news retrieval. We release this as the **feature-based reference** that LLM systems should ensemble with.

**Brier scores with 95% CIs (lower is better; Δ = model Brier − baseline Brier):**

| Task | Model | Brier | 95% CI | Δ vs baseline |
|------|-------|------:|---|---:|
| V1 × T1: direction 24h | LGBM 3-seed | 0.2295 | [0.2290, 0.2299] | **−0.0205** (vs coinflip 0.2500) |
| V1 × T1: direction 24h | XGBoost 3-seed | 0.2296 | [0.2292, 0.2301] | −0.0204 |
| V1 × T1: direction 24h | CatBoost 3-seed | 0.2295 | [0.2290, 0.2299] | −0.0205 |
| **V1 × T1: direction 24h** | **Ensemble (3 models × 3 seeds = 9)** | **0.2294** | [0.2289, 0.2299] | **−0.0206** |
| V2 × T4: resolution 24h | XGBoost 3-seed | 0.1681 | [0.1605, 0.1759] | −0.0086 (vs price/100 = 0.1767) |

On V1 × T1 the improvement is statistically significant (246,862 test samples): the ensemble's 95% CI does not overlap the 0.2500 coinflip baseline.
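
The Brier score is the mean squared error between forecast probabilities and 0/1 outcomes, so lower is better, and a constant 0.5 forecast scores exactly 0.25 whatever the outcomes, which is where the coinflip reference comes from. The card does not say how its 95% CIs were computed; the sketch below assumes a percentile bootstrap over test rows, and `brier` / `bootstrap_brier_ci` are hypothetical helper names, not part of `predict.py`.

```python
import numpy as np


def brier(p, y) -> float:
    """Mean squared error between predicted probabilities p and 0/1 outcomes y."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


def bootstrap_brier_ci(p, y, n_boot: int = 1000, alpha: float = 0.05, seed: int = 0):
    """Percentile-bootstrap (1 - alpha) CI for the Brier score, resampling rows."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        scores[b] = np.mean((p[idx] - y[idx]) ** 2)
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Toy check on synthetic data (not the card's test set).
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=10_000)
p_model = np.clip(0.5 + 0.3 * (y - 0.5) + rng.normal(0, 0.15, size=y.size), 0.0, 1.0)
print(brier(np.full(y.size, 0.5), y))  # 0.25: the coinflip baseline
print(brier(p_model, y), bootstrap_brier_ci(p_model, y))
```

Resampling rows independently ignores correlation between repeated snapshots of the same market; a per-market (cluster) bootstrap would likely give wider intervals.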
## Install
```bash
pip install lightgbm xgboost catboost numpy pandas
```
## Use
```python
import pandas as pd
from sf_ml_baseline import SFBaseline

model = SFBaseline(weights_dir='sf-ml-baseline/weights')

# Direction forecast: probability that the 24h-forward price will be HIGHER than now.
# Features: current price (cents, 0-100), 24h price delta (cents),
# implicit yield (%), calibration ratio index (unitless), calibration variability ratio.
p_up = model.predict_direction(price_cents=55, delta_cents=3, iy=12.5, cri=0.6, cvr=0.8)
print(f'P(price rises in next 24h) = {p_up:.3f}')

# Or batch-predict from a DataFrame with the same five feature columns:
df = pd.DataFrame([
    {'price_cents': 55, 'delta_cents': 3, 'iy': 12.5, 'cri': 0.6, 'cvr': 0.8},
    {'price_cents': 82, 'delta_cents': -1, 'iy': 4.5, 'cri': 0.3, 'cvr': 0.9},
])
probas = model.predict_direction_batch(df)
```
See `predict.py` for the full inference code.
## Architecture
- **Features (V1)**: `price_cents`, `delta_cents` (24h price change), `iy` (implicit yield), `cri` (calibration ratio index), `cvr` (calibration variability ratio). Spec: SimpleFunctions [indicator documentation](https://simplefunctions.dev/concepts).
- **Models**: 3 LightGBM + 3 XGBoost + 3 CatBoost models, one per seed within each library. The ensemble prediction is the simple mean of all nine predicted probabilities.
- **Split**: temporal, 80% train / 24h embargo / 20% test. The training portion is split again 90/10 into train/val for early stopping.
- **Target T1**: binary `sign(price(t+24h) - price(t))`, excludes no-move rows (delta==0 at t+24h).
- **Target T4**: `resolved_outcome` ∈ {0, 1}, for markets that resolved 22-26h after the feature capture time.
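
A minimal pandas sketch of the two targets and the temporal split listed above. The column names `captured_at`, `price_cents_fwd_24h`, and `resolved_at` are hypothetical (the card only names `price_cents` and `resolved_outcome`). It also assumes the embargo is a 24h gap between the train cutoff and the start of test, that the 22-26h window is inclusive, and that the 90/10 inner split is temporal; the card states none of these explicitly.

```python
import pandas as pd


def t1_labels(df: pd.DataFrame) -> pd.DataFrame:
    """T1: 1 if the price rises over the next 24h, 0 if it falls; no-move rows dropped."""
    out = df[df["price_cents_fwd_24h"] != df["price_cents"]].copy()
    out["t1"] = (out["price_cents_fwd_24h"] > out["price_cents"]).astype(int)
    return out


def t4_rows(df: pd.DataFrame) -> pd.DataFrame:
    """T4 sample: markets resolving 22-26h (inclusive) after feature capture."""
    lag = df["resolved_at"] - df["captured_at"]
    keep = (lag >= pd.Timedelta(hours=22)) & (lag <= pd.Timedelta(hours=26))
    return df[keep]


def temporal_split(df, time_col="captured_at", train_frac=0.8, val_frac=0.1,
                   embargo=pd.Timedelta(hours=24)):
    """Time-ordered 80% train / 24h embargo / 20% test, then a 90/10 train/val split."""
    df = df.sort_values(time_col).reset_index(drop=True)
    cutoff = df[time_col].iloc[int(len(df) * train_frac)]
    train_all = df[df[time_col] < cutoff]
    test = df[df[time_col] >= cutoff + embargo]  # rows inside the gap are discarded
    split = len(train_all) - int(len(train_all) * val_frac)
    train, val = train_all.iloc[:split], train_all.iloc[split:]  # val = latest 10%
    return train, val, test
```

The embargo matters because each T1 label looks 24h ahead: without the gap, the last training rows would carry labels computed from prices inside the test window.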
## Known limitations
1. **Only 11 days of training data.** The feature history table (`market_indicator_history`) was added to SF's data pipeline on 2026-04-08. Retrains on 30+ and 180+ days of history are scheduled; see "Retrain plan" below.
2. **5 base features only.** `market_indicator_history` holds a compact subset of the full indicator stack. The live `market_indicators` table has ~20 features (`iyYes`, `iyNo`, `ee`, `las`, `vr`, `iar`, `rv`, `adjIy`, `cvrDelta`, `overround`, etc.) but only for the current snapshot. Future versions will store history for the full feature set.
3. **`cvr` has 0% feature importance in the direction model.** Its window or computation may need tuning; this is still under investigation.
4. **V2 × T4 (rolling features + resolution label) misses the 0.01 Brier-improvement gate globally (0.0086).** It works well on Crypto (Δ = −0.041) and Commodities (Δ = −0.036) but not on Sports (Δ = −0.006) or Financials (Δ = +0.004, model worse than price/100).
5. **Do not use for live trading without backtesting against your own execution model.** This is a calibration baseline, not a PnL strategy.
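
The per-category numbers in limitation 4 compare the model's Brier score with that of the market price read as a probability (`price_cents / 100`). A sketch of that breakdown, assuming a hypothetical test-set DataFrame with columns `category`, `p_model`, `price_cents`, and `resolved_outcome`; `brier_delta_by_category` is an illustrative name, not a function from the repo. Negative Δ means the model beats the market.

```python
import pandas as pd


def brier_delta_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Per-category Brier of the model, of the price/100 baseline, and their difference."""
    y = df["resolved_outcome"].astype(float)
    sq = pd.DataFrame({
        "category": df["category"],
        "brier_model": (df["p_model"] - y) ** 2,
        "brier_market": (df["price_cents"] / 100.0 - y) ** 2,  # market-implied baseline
    })
    g = sq.groupby("category")
    out = g[["brier_model", "brier_market"]].mean()
    out["delta"] = out["brier_model"] - out["brier_market"]  # < 0: model beats market
    out["n"] = g.size()
    return out.sort_values("delta")  # best categories for the model first
```

Reporting `n` next to each Δ helps flag categories whose Δ rests on only a handful of resolutions.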
## Retrain plan
**v0.1 is a time capsule.** Planned retrains:

| Version | Trigger | When | What changes |
|---------|---------|------|------|
| v0.2 | R2 dump archive holds ≥ 30d of indicator history | ~2026-05-20 | Same architecture, more data; per-category specialist models for Crypto/Commodities/Sports |
| v0.3 | Full indicator feature set stored in history (schema change) | TBD | V1 grows from 5 to 20 features; re-run the Phase A.3 full grid |
| v1.0 | ≥ 6 months of R2 data | ~2026-10 | FT-Transformer + TabPFN + ensemble; formal paper submission (ICLR 2027 FinAI Workshop) |
## Reproduce
```bash
git clone https://github.com/spfunctions/simplefunctions-landing  # (private; OSS mirror pending)
cd simplefunctions-landing
source scripts/ml/.venv/bin/activate
# Data pull (uses DIRECT_DATABASE_URL in .env.local)
python scripts/ml/phase-a/01-pull-training-data.py
# Train
python scripts/ml/phase-a/02-train-lgbm.py
python scripts/ml/phase-a/04-bakeoff.py
# Evaluate
python scripts/ml/phase-a/03-evaluate.py
```
All hyperparameters are documented inline. 3-seed ensembling uses {42, 137, 2026}.
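
The ensemble rule from "Architecture" (unweighted mean over 3 libraries × 3 seeds) can be sketched as below. This assumes the scikit-learn-style wrappers `lightgbm.LGBMClassifier`, `xgboost.XGBClassifier`, and `catboost.CatBoostClassifier`, which all provide `predict_proba`; the card does not say which interface the training scripts use, and `ensemble_proba` is a hypothetical name.

```python
from typing import Protocol, Sequence

import numpy as np

SEEDS = (42, 137, 2026)  # seeds listed in this card


class ProbClassifier(Protocol):
    def predict_proba(self, X) -> np.ndarray: ...


def ensemble_proba(models: Sequence[ProbClassifier], X) -> np.ndarray:
    """Unweighted mean of P(y=1) across all fitted models."""
    if len(models) == 0:
        raise ValueError("ensemble needs at least one model")
    # predict_proba returns shape (n_samples, 2); column 1 is P(y=1).
    probs = np.stack([np.asarray(m.predict_proba(X))[:, 1] for m in models])
    return probs.mean(axis=0)


# Hypothetical construction of the 9 members (needs all three libraries installed):
# from lightgbm import LGBMClassifier
# from xgboost import XGBClassifier
# from catboost import CatBoostClassifier
# models = [LGBMClassifier(random_state=s) for s in SEEDS]
# models += [XGBClassifier(random_state=s) for s in SEEDS]
# models += [CatBoostClassifier(random_seed=s, verbose=0) for s in SEEDS]
# Fit each on the train split, then: p = ensemble_proba(models, X_test)
```

A plain mean keeps every member on equal footing; with near-identical per-model Brier scores, as in the table above, learned weights would have little room to help.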
## Citation
```bibtex
@software{sf_ml_baseline_v0_1,
author = {SimpleFunctions},
title = {{sf-ml-baseline}: A feature-based prediction-market forecaster},
year = {2026},
version = {0.1},
publisher = {SimpleFunctions},
url = {https://simplefunctions.dev/opensource/sf-ml-baseline},
license = {CC-BY-4.0 with SimpleFunctions attribution}
}
```
## See also
- `docs/ml/phase-a-investigation.md`: investigation of 10 open questions (SPEC-19 §13)
- `docs/ml/phase-a-results.md`: gate decision and per-category breakdown
- `.claude/specs/SPEC-19-model-deep-investigation.md`: full 6-phase research plan