Model performance
Walk-forward out-of-sample backtest: each match is predicted before its result is known, and ratings are updated only afterward. The target is binary: whether the home side wins the head-to-head matchup.
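The predict-then-update order described above can be sketched as follows. This is a minimal illustration only: the rating model behind these numbers is not specified here, so the Elo-style logistic formula, the function names `expected_home_win` and `walk_forward`, and the parameters `k`, `start`, `scale`, and `home_adv` are all assumptions made for the example.

```python
from collections import defaultdict


def expected_home_win(r_home, r_away, home_adv=0.0, scale=400.0):
    """Elo-style logistic win probability (assumed model, not the source's)."""
    return 1.0 / (1.0 + 10 ** (-(r_home + home_adv - r_away) / scale))


def walk_forward(matches, k=20.0, start=1500.0):
    """Score matches in chronological order.

    matches: iterable of (home, away, home_won) tuples, oldest first.
    Returns a list of (prediction, outcome) pairs, where every prediction
    was made from ratings that had not yet seen that match's result.
    """
    ratings = defaultdict(lambda: start)
    scored = []
    for home, away, home_won in matches:
        outcome = int(home_won)
        # 1. Predict using only information available before the match.
        p = expected_home_win(ratings[home], ratings[away])
        scored.append((p, outcome))
        # 2. Only now let the result move the ratings.
        delta = k * (outcome - p)
        ratings[home] += delta
        ratings[away] -= delta
    return scored


# Hypothetical two-match history: A beats B at home, then B hosts A and loses.
scored = walk_forward([("A", "B", True), ("B", "A", False)])
```

Because the prediction is appended before the ratings change, no match result can leak into its own forecast, which is what makes the scores out-of-sample.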
| Metric | Value |
|---|---|
| Matches scored | 5095 |
| Accuracy | 69.8% |
| Brier | 0.205857 |
| Baseline Brier | 0.25 |

The Brier score of 0.205857 compares with 0.25 for a baseline that always predicts 0.5 (a coin flip), a 17.7% relative reduction. Log loss is 0.600864, against ln 2 ≈ 0.693 for the same baseline.
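For reference, the two scores and the relative reduction can be computed as in the sketch below. It uses the standard definitions of the Brier score and binary log loss. The helper names and the `eps` clipping value are illustrative assumptions, not taken from the backtest code.

```python
import math


def brier_score(preds, outcomes):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


def log_loss(preds, outcomes, eps=1e-15):
    """Mean negative log-likelihood; eps clipping avoids log(0) (assumed value)."""
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(preds)


def relative_reduction(model, baseline):
    """Fraction by which the model score improves on the baseline score."""
    return (baseline - model) / baseline


# A constant 0.5 prediction scores 0.25 (Brier) and ln 2 (log loss)
# regardless of how the matches turn out.
coin_brier = brier_score([0.5, 0.5], [1, 0])
coin_log_loss = log_loss([0.5, 0.5], [1, 0])

reduction = relative_reduction(0.205857, 0.25)
print(f"{reduction:.1%}")  # prints 17.7%
```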
Reliability buckets
| Predicted P(home win) | Matches | Avg prediction | Actual home-win rate |
|---|---|---|---|
| 0.2-0.3 | 48 | 26.6% | 0.0% |
| 0.3-0.4 | 278 | 36.2% | 14.0% |
| 0.4-0.5 | 949 | 45.6% | 30.9% |
| 0.5-0.6 | 1850 | 55.2% | 53.2% |
| 0.6-0.7 | 1440 | 64.5% | 79.2% |
| 0.7-0.8 | 492 | 73.7% | 91.9% |
| 0.8-0.9 | 38 | 82.0% | 97.4% |
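
The table can be cross-checked against the headline figures. The sketch below copies the rows as published and recomputes the match total, an implied accuracy, and a count-weighted calibration gap. It assumes the pick is "home wins" whenever the predicted probability is at least 0.5, which the source does not state. The names `ROWS`, `implied_accuracy`, and `calibration_gap` are illustrative.

```python
# (bucket, matches, average prediction, actual home-win rate), from the table.
ROWS = [
    ("0.2-0.3", 48, 0.266, 0.000),
    ("0.3-0.4", 278, 0.362, 0.140),
    ("0.4-0.5", 949, 0.456, 0.309),
    ("0.5-0.6", 1850, 0.552, 0.532),
    ("0.6-0.7", 1440, 0.645, 0.792),
    ("0.7-0.8", 492, 0.737, 0.919),
    ("0.8-0.9", 38, 0.820, 0.974),
]

total = sum(n for _, n, _, _ in ROWS)

# Assumed decision rule: pick the home side at or above 0.5, otherwise pick
# against it. Below 0.5 a pick is correct when the home side does not win.
correct = sum(
    n * (actual if avg >= 0.5 else 1.0 - actual) for _, n, avg, actual in ROWS
)
implied_accuracy = correct / total

# Count-weighted mean absolute gap between predicted and observed rates.
calibration_gap = sum(n * abs(avg - actual) for _, n, avg, actual in ROWS) / total

print(total)                       # prints 5095
print(f"{implied_accuracy:.1%}")   # prints 69.8%
```

Under that assumption the bucket counts sum to the 5095 scored matches and reproduce the 69.8% accuracy. The rows also show observed rates falling below the average prediction in every bucket under 0.5 and above it in the 0.6 and higher buckets, which suggests the probabilities are less extreme than the outcomes.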