---
license: other
license_name: cc-by-4-0-with-sf-attribution
license_link: LICENSE
language:
- en
library_name: lightgbm
tags:
- prediction-markets
- forecasting
- calibration
- brier
- tabular-classification
- gradient-boosting
- lightgbm
- xgboost
- catboost
- time-series
- kalshi
- polymarket
- simplefunctions
pipeline_tag: tabular-classification
model-index:
- name: sf-ml-baseline-v0.1
  results:
  - task:
      type: tabular-classification
      name: 24h direction forecast (V1 x T1)
    metrics:
    - type: brier
      value: 0.2294
      name: Brier score
    - type: brier_improvement
      value: 0.0206
      name: Improvement vs coinflip baseline (0.2500)
  - task:
      type: tabular-classification
      name: 24h resolution forecast (V2 x T4)
    metrics:
    - type: brier
      value: 0.1681
      name: Brier score (XGBoost)
    - type: brier_improvement
      value: 0.0086
      name: Improvement vs market-price/100 baseline (0.1767)
---

# sf-ml-baseline v0.1

**What it is**: gradient-boosted tree ensembles that predict prediction-market outcomes from engineered microstructure features.

**Published**: 2026-04-19 (initial, time-capsule release; see "Retrain plan" below).
**Trained on**: 11 days of SimpleFunctions data (`market_indicator_history` + `marketwide_resolutions`), 2026-04-08 → 2026-04-19.
**License**: CC-BY-4.0 with SimpleFunctions attribution; see `LICENSE`.
**Author**: SimpleFunctions (https://simplefunctions.dev)
**Model repo**: https://huggingface.co/SimpleFunctions/sf-ml-baseline *(pending upload)*

## Why release this

To our knowledge, nobody has published a calibrated feature-based baseline for prediction-market forecasting. The prior art we are aware of (Halawi 2024, Schoenegger 2024, AIA 2025) uses LLMs with news retrieval. We release this as the **feature-based reference** that LLM systems should ensemble with.

**Brier scores with 95% CI (baseline noted per row: coinflip for V1 × T1, market price/100 for V2 × T4):**

| Task | Model | Brier | 95% CI | Δ vs baseline |
|------|-------|------:|---|---:|
| V1 × T1: direction 24h | LGBM 3-seed | 0.2295 | [0.2290, 0.2299] | **−0.0205** (vs coinflip 0.2500) |
| V1 × T1: direction 24h | XGBoost 3-seed | 0.2296 | [0.2292, 0.2301] | −0.0204 |
| V1 × T1: direction 24h | CatBoost 3-seed | 0.2295 | [0.2290, 0.2299] | −0.0205 |
| **V1 × T1: direction 24h** | **Ensemble (3-model × 3-seed = 9)** | **0.2294** | [0.2289, 0.2299] | **−0.0206** |
| V2 × T4: resolution 24h | XGBoost 3-seed | 0.1681 | [0.1605, 0.1759] | −0.0086 (vs price/100 = 0.1767) |

The V1 × T1 improvement is statistically significant: on 246,862 test samples, every model's 95% CI lies entirely below the 0.2500 coinflip baseline.
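
For readers who want to check numbers like these on their own predictions, here is a minimal sketch of the Brier score and a confidence interval around it. The percentile bootstrap is an assumption on our part (this card does not state how the intervals above were computed), and the data is synthetic, not the model's test set.

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

def bootstrap_ci(y_true, p_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the Brier score (assumed method)."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        scores[b] = np.mean((p_pred[idx] - y_true[idx]) ** 2)
    lo, hi = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic demo data (hypothetical, for illustration only).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=5000)
p_coinflip = np.full(y.shape, 0.5)
p_model = np.clip(0.5 + 0.2 * (y - 0.5) + rng.normal(0, 0.1, size=y.shape), 0, 1)

baseline = brier_score(y, p_coinflip)  # a constant 0.5 forecast always scores 0.25
model_brier = brier_score(y, p_model)
lo, hi = bootstrap_ci(y, p_model)
print(f"baseline={baseline:.4f} model={model_brier:.4f} "
      f"CI=[{lo:.4f}, {hi:.4f}] delta={model_brier - baseline:+.4f}")
```

A constant 0.5 forecast scores exactly 0.25 whatever the outcomes are, which is why 0.2500 serves as the coinflip baseline for the direction task.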

## Install

```bash
pip install lightgbm xgboost catboost numpy pandas
```

## Use

```python
import pandas as pd
from sf_ml_baseline import SFBaseline

model = SFBaseline(weights_dir='sf-ml-baseline/weights')

# Direction forecast: probability that the 24h-forward price will be HIGHER than now.
# Features: current price (cents, 0-100), 24h price delta (cents),
#           implicit yield (%), calibration ratio index (unitless), calibration variability ratio.
p_up = model.predict_direction(price_cents=55, delta_cents=3, iy=12.5, cri=0.6, cvr=0.8)
print(f'P(price rises in next 24h) = {p_up:.3f}')

# Or batch-predict from a DataFrame with one row per market snapshot:
df = pd.DataFrame([
    {'price_cents': 55, 'delta_cents': 3, 'iy': 12.5, 'cri': 0.6, 'cvr': 0.8},
    {'price_cents': 82, 'delta_cents': -1, 'iy': 4.5, 'cri': 0.3, 'cvr': 0.9},
])
probas = model.predict_direction_batch(df)
```

See `predict.py` for the full inference code.

## Architecture

- **Features (V1)**: `price_cents`, `delta_cents` (24h price change), `iy` (implicit yield), `cri` (calibration ratio index), `cvr` (calibration variability ratio). Spec: SimpleFunctions [indicator documentation](https://simplefunctions.dev/concepts).
- **Models**: 3 LightGBM + 3 XGBoost + 3 CatBoost models, one per seed. The ensemble prediction is the simple mean of all nine.
- **Split**: temporal, 80% train / 24h embargo / 20% test, with a 90/10 inner train/val split for early stopping.
- **Target T1**: binary `sign(price(t+24h) - price(t))`, excludes no-move rows (delta==0 at t+24h).
- **Target T4**: `resolved_outcome` ∈ {0, 1} for markets that resolved in 22-26h after the feature capture time.
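
The split and the T1 target described above can be sketched as follows. This is an illustrative reading, not the repository's code: the function names, the choice to apply the embargo after the last training timestamp, and the hourly demo timestamps are all assumptions.

```python
import numpy as np

HOUR = 3600  # seconds

def temporal_split(ts, train_frac=0.8, embargo_s=24 * HOUR):
    """Return (train_idx, test_idx) for a time-ordered split with an embargo.

    The earliest train_frac of rows go to train; of the remaining rows, only
    those at least embargo_s after the last training timestamp go to test.
    """
    ts = np.asarray(ts)
    order = np.argsort(ts, kind="stable")
    n_train = int(len(ts) * train_frac)
    train_idx = order[:n_train]
    cutoff = ts[train_idx].max()
    rest = order[n_train:]
    test_idx = rest[ts[rest] >= cutoff + embargo_s]
    return train_idx, test_idx

def direction_labels(price_now, price_24h):
    """T1 label: 1 if the price rose over 24h, 0 if it fell.

    Also returns the boolean keep-mask that drops no-move rows.
    """
    diff = np.asarray(price_24h, dtype=float) - np.asarray(price_now, dtype=float)
    keep = diff != 0
    return (diff[keep] > 0).astype(int), keep

# Hypothetical hourly snapshots over 10 days.
ts = np.arange(0, 240 * HOUR, HOUR)
train_idx, test_idx = temporal_split(ts)
gap_hours = (ts[test_idx].min() - ts[train_idx].max()) / HOUR

labels, keep = direction_labels([55, 82, 40], [58, 82, 37])
print(len(train_idx), len(test_idx), gap_hours, labels.tolist(), keep.tolist())
```

The embargo matters because the T1 label looks 24h ahead: without a gap, the labels of the last training rows would be computed from prices that fall inside the test period.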

## Known limitations

1. **Only 11 days of training data.** The full feature history table (`market_indicator_history`) was introduced to SF's data pipeline on 2026-04-08. A proper 30d+ / 180d+ re-train is scheduled; see "Retrain plan" below.
2. **5 base features only.** `market_indicator_history` holds a compact subset of the full indicator stack. The live `market_indicators` table has ~20 features (`iyYes`, `iyNo`, `ee`, `las`, `vr`, `iar`, `rv`, `adjIy`, `cvrDelta`, `overround`, etc.) but only for the current snapshot. Future versions will store history for the full feature set.
3. **`cvr` has 0% feature importance in the direction model.** It is still an open question whether its window or computation needs tuning.
4. **V2 × T4 (rolling features + resolution label) is below the 0.01 Brier gate globally.** It works well on Crypto (Δ=−0.041) and Commodities (Δ=−0.036) but not on Sports (Δ=−0.006) or Financials (Δ=+0.004, model worse than baseline).
5. **Do not use for live trading without backtesting against your own execution model.** This is a calibration baseline, not a PnL strategy.
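
The per-category deltas in item 4 compare the model's Brier score with the market price/100 baseline inside each category. A minimal sketch of that comparison follows; the row layout, the helper names, and the reading of the gate as "Δ ≤ −0.01" are assumptions, and the numbers are made up for illustration.

```python
from collections import defaultdict

GATE = 0.01  # assumed reading: the model must beat the baseline by at least this much

def brier(outcomes, probs):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)

def per_category_delta(rows):
    """rows: iterable of (category, outcome, model_prob, price_cents).

    Returns {category: model Brier - baseline Brier}, where the baseline
    forecast is price_cents / 100. Negative means the model beats the market price.
    """
    grouped = defaultdict(list)
    for category, outcome, model_prob, price_cents in rows:
        grouped[category].append((outcome, model_prob, price_cents / 100.0))
    deltas = {}
    for category, items in grouped.items():
        ys = [item[0] for item in items]
        model_brier = brier(ys, [item[1] for item in items])
        baseline_brier = brier(ys, [item[2] for item in items])
        deltas[category] = model_brier - baseline_brier
    return deltas

# Hypothetical rows, not real results.
rows = [
    ("Crypto", 1, 0.80, 60),
    ("Crypto", 0, 0.20, 40),
    ("Financials", 1, 0.55, 70),
    ("Financials", 0, 0.45, 30),
]
deltas = per_category_delta(rows)
for category, delta in deltas.items():
    verdict = "passes gate" if delta <= -GATE else "below gate"
    print(f"{category}: delta={delta:+.4f} ({verdict})")
```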

## Retrain plan

**v0.1 is a time-capsule.** Planned retrains:

| Version | Trigger | When | What changes |
|---------|---------|------|------|
| v0.2 | R2 dump archive ≥ 30d of indicator history | ~2026-05-20 | Same architecture, more data; per-category specialist models for Crypto/Commodities/Sports |
| v0.3 | Full indicator feature set stored in history (schema change) | TBD | V1 grows from 5 → 20 features; re-run Phase A.3 full grid |
| v1.0 | ≥ 6 months of R2 data | ~2026-10 | FT-Transformer + TabPFN + ensemble; formal paper submission (ICLR 2027 FinAI Workshop) |

## Reproduce

```bash
git clone https://github.com/spfunctions/simplefunctions-landing  # (private; OSS mirror pending)
cd simplefunctions-landing
source scripts/ml/.venv/bin/activate

# Data pull (uses DIRECT_DATABASE_URL in .env.local)
python scripts/ml/phase-a/01-pull-training-data.py

# Train
python scripts/ml/phase-a/02-train-lgbm.py
python scripts/ml/phase-a/04-bakeoff.py

# Evaluate
python scripts/ml/phase-a/03-evaluate.py
```

All hyperparameters are documented inline. 3-seed ensembling uses {42, 137, 2026}.
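
As a rough illustration of the seed ensembling, the sketch below averages nine predictors (three families times the three seeds). The stub models are hypothetical stand-ins so the example runs without any boosting library; they are not the trained LightGBM, XGBoost, or CatBoost models.

```python
import numpy as np

SEEDS = (42, 137, 2026)

def make_stub_model(family_bias, seed):
    """Hypothetical stand-in for a trained booster: returns predict(X) -> probabilities."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0, 0.01)  # small seed-dependent offset, mimicking seed variance

    def predict(X):
        X = np.asarray(X, dtype=float)
        z = 0.05 * X[:, 1] + family_bias + noise  # toy score from the delta_cents column
        return 1.0 / (1.0 + np.exp(-z))

    return predict

def ensemble_mean(models, X):
    """Simple mean of the per-model probabilities."""
    preds = np.stack([m(X) for m in models])  # shape: (n_models, n_rows)
    return preds.mean(axis=0)

# Toy per-family offsets, purely illustrative.
families = {"lgbm": 0.00, "xgb": 0.01, "catboost": -0.01}
models = [make_stub_model(bias, seed) for bias in families.values() for seed in SEEDS]

# Columns: price_cents, delta_cents, iy, cri, cvr
X = np.array([[55, 3, 12.5, 0.6, 0.8], [82, -1, 4.5, 0.3, 0.9]])
p = ensemble_mean(models, X)
print(len(models), p.round(3))
```

With the real libraries, each seed would presumably be passed to the estimator's seed parameter at training time, and `ensemble_mean` would average the resulting class-1 probabilities.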

## Citation

```bibtex
@software{sf_ml_baseline_v0_1,
  author       = {SimpleFunctions},
  title        = {{sf-ml-baseline}: A feature-based prediction-market forecaster},
  year         = {2026},
  version      = {0.1},
  publisher    = {SimpleFunctions},
  url          = {https://simplefunctions.dev/opensource/sf-ml-baseline},
  license      = {CC-BY-4.0 with SimpleFunctions attribution}
}
```

## See also

- `docs/ml/phase-a-investigation.md`: investigation of 10 open questions (SPEC-19 §13)
- `docs/ml/phase-a-results.md`: gate decision + per-category breakdown
- `.claude/specs/SPEC-19-model-deep-investigation.md`: full 6-phase research plan