baseline/best.pt: Multi-Modal CNN-Transformer (Part 1 baseline)

Headline: 5.24 % test MAPE on the last 2 days of 2022 (8 ISO New England load zones, 24-hour day-ahead horizon).

Model summary

A hybrid CNN-Transformer that fuses HRRR-style weather rasters with per-zone demand history and 44-d calendar features into a unified token sequence, then decodes 24 hourly per-zone demand values for all 8 ISO-NE zones.
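As a rough illustration of how the unified sequence is sized, the sketch below recomputes the token count from S = 24 history hours, a 24-hour horizon, and an 8 × 8 spatial grid. It assumes each timestep contributes P spatial tokens plus one additional token (the source gives only the P+1 factor, so what that extra token carries is our reading). The function name `sequence_length` is hypothetical.

```python
def sequence_length(history_len: int = 24, horizon: int = 24, grid_size: int = 8) -> int:
    """Token count of the unified sequence: (S + horizon) * (P + 1)."""
    n_spatial = grid_size * grid_size      # P = 64 spatial tokens per timestep
    tokens_per_step = n_spatial + 1        # assumed: P spatial tokens + 1 extra token
    return (history_len + horizon) * tokens_per_step


n_tokens = sequence_length()
# Full self-attention compares every token with every other token,
# so the attention matrix per head has n_tokens ** 2 entries.
attention_entries = n_tokens ** 2
```

The quadratic growth of `attention_entries` is one reason the spatial grid is kept coarse relative to the 450 × 449 input rasters.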

Field                       Value
Architecture                CNN-Transformer (joint encoder over unified sequence)
Parameters                  1,753,200 (1.75 M)
Spatial token grid          8 × 8 (P = 64 spatial tokens per timestep)
Sequence length             (S+24) · (P+1) = 48 · 65 = 3,120 tokens
Embedding dim D             128
Transformer layers          4 encoder, 4 heads, MLP ratio 4, pre-norm
Total epoch at best         13 (continuous training, no chained resume)
Best val MAPE               6.92 % on val 2021
Test MAPE (2022-12-30/31)   5.24 %
File size                   21 MB
SHA256                      91069db5bc8f93f832aa0a4e4fb600f075ef382617049225d828003c99ae05c0
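
Because the checkpoint is a pickled file, it may be worth checking the download against the SHA256 listed above before loading it. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify_checkpoint` are hypothetical and not part of the repository.

```python
import hashlib

# Digest copied from the summary table above.
EXPECTED_SHA256 = "91069db5bc8f93f832aa0a4e4fb600f075ef382617049225d828003c99ae05c0"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so a 21 MB checkpoint is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


# Example usage (path taken from the Loading section):
# if not verify_checkpoint("pretrained_models/baseline/best.pt"):
#     raise RuntimeError("Checkpoint digest does not match the model card")
```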

Per-zone test MAPE (last 2 days of 2022)

Zone        MAPE
ME          2.31 % ⭐
NH          3.69 %
VT          5.95 %
CT          7.28 %
RI          5.27 %
SEMA        5.44 %
WCMA        5.87 %
NEMA_BOST   6.09 %
Overall     5.24 %
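
For reference, MAPE is the mean of |actual − forecast| / |actual|, expressed as a percentage. The sketch below shows that definition and checks that the unweighted mean of the per-zone values in the table reproduces the overall figure. That equality is expected only if every zone contributes the same number of hourly samples, which is an assumption here; the function name `mape` is hypothetical rather than taken from the repository.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent, over paired sequences."""
    pairs = list(zip(y_true, y_pred))
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in pairs) / len(pairs)


# Per-zone test MAPE values copied from the table above (percent).
PER_ZONE_MAPE = {
    "ME": 2.31, "NH": 3.69, "VT": 5.95, "CT": 7.28,
    "RI": 5.27, "SEMA": 5.44, "WCMA": 5.87, "NEMA_BOST": 6.09,
}

# Unweighted mean across zones; matches the overall MAPE when all zones
# have equal sample counts (assumed).
overall = sum(PER_ZONE_MAPE.values()) / len(PER_ZONE_MAPE)
```

Rounded to two decimals, `overall` agrees with the 5.24 % headline, which is consistent with equal weighting across zones.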

Inputs

  • Weather rasters X ∈ ℝ^{(S+24) × 7 × 450 × 449}: HRRR-style 7-channel hourly snapshots (S = 24 history hours, 24 future hours)
  • Per-zone demand Y ∈ ℝ^{S × 8}: historical MWh demand for the 8 ISO-NE zones
  • Calendar features C ∈ ℝ^{(S+24) × 44}: one-hot hour (24) + day-of-week (7) + month (12) + US-holiday flag (1)
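
To make the 44-dimensional calendar vector concrete, here is a minimal sketch of one way to build it for a single timestamp. The concatenation order (hour, day-of-week, month, holiday) and Monday-first weekday indexing are assumptions, and the holiday set is supplied by the caller because the source does not say which holiday calendar is used. `calendar_features` is a hypothetical name.

```python
from datetime import date, datetime


def calendar_features(ts: datetime, holidays=frozenset()) -> list:
    """Return a 44-d vector: one-hot hour (24) + weekday (7) + month (12) + holiday (1)."""
    hour = [0.0] * 24
    hour[ts.hour] = 1.0

    weekday = [0.0] * 7
    weekday[ts.weekday()] = 1.0        # Monday = 0 (assumed indexing)

    month = [0.0] * 12
    month[ts.month - 1] = 1.0

    holiday = [1.0 if ts.date() in holidays else 0.0]
    return hour + weekday + month + holiday


# Example: 17:00 on 2022-12-30, with a caller-provided holiday set.
example = calendar_features(datetime(2022, 12, 30, 17), holidays={date(2022, 12, 26)})
```

Stacking one such vector per hour over the S + 24 = 48 timesteps gives a 48 × 44 array matching the shape of C.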

Outputs

  • 24-hour day-ahead per-zone demand forecast Ŷ ∈ ℝ^{24 × 8} in MWh

Loading

import torch
from models.cnn_transformer_baseline import CNNTransformerBaselineForecaster

ckpt = torch.load("pretrained_models/baseline/best.pt",
                  map_location="cpu", weights_only=False)
args = ckpt["args"]
model = CNNTransformerBaselineForecaster(
    n_weather_channels=7, n_zones=8, cal_dim=44,
    history_len=args["history_len"],     # 24
    embed_dim=args["embed_dim"],         # 128
    grid_size=args["grid_size"],         # 8
    n_layers=args["n_layers"],           # 4
    n_heads=args["n_heads"],             # 4
    dropout=args["dropout"],             # 0.1
)
model.load_state_dict(ckpt["model"])
model.eval()
norm_stats = ckpt["norm_stats"]   # {weather_mean, weather_std, energy_mean, energy_std}
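
The `norm_stats` entries can be used to move between physical units and z-score space. The helpers below are a minimal sketch, assuming the model consumes and emits z-scored values and that the stored mean and std broadcast against the tensors they are applied to. The source does not show the model's forward signature, so the forward call itself is left out. `to_zscore` and `from_zscore` are hypothetical names.

```python
def to_zscore(x, mean, std):
    """Map physical values to z-scores; works on floats, NumPy arrays, or tensors."""
    return (x - mean) / std


def from_zscore(z, mean, std):
    """Inverse of to_zscore: map z-scores back to physical units."""
    return z * std + mean


# Example usage with the checkpoint loaded above (assumed key semantics):
# history_z = to_zscore(history_mwh, norm_stats["energy_mean"], norm_stats["energy_std"])
# weather_z = to_zscore(weather, norm_stats["weather_mean"], norm_stats["weather_std"])
# forecast_mwh = from_zscore(forecast_z, norm_stats["energy_mean"], norm_stats["energy_std"])
```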

Training

  • Optimizer: AdamW, base LR 1e-3, weight decay 1e-4
  • LR schedule: CosineAnnealingLR (T_max = 14 epochs, no chained resume)
  • Loss: MSE in z-score space (per the four-step normalization chain)
  • Validation: MAPE in physical MWh space, per-zone + overall
  • Batch size: 4 per A100 GPU
  • Hardware: A100 40 GB
  • Wall time: ~22 hours
  • Train years: 2019–2020
  • Validation year: 2021
  • Self-eval test slice: 2022-12-30 to 2022-12-31

Limitations

  1. Test numbers come from a 2-window slice, so small-sample variance is non-negligible.
  2. The CNN trunk is a fixed 5-stage residual stack; spatial-encoder design space not explored.
  3. Random seeds (torch.manual_seed / np.random.seed) are NOT set in the training pipeline; headline MAPE is not bit-reproducible across re-training runs. Empirical claims are pinned to this specific checkpoint.

Citation

Liu, Pang. "Multi-Modal Deep Learning for Energy Demand Forecasting"
(CS-137 Assignment 3, Tufts University, Spring 2026).
GitHub: https://github.com/jeffliulab/real-time-power-predict

License

MIT (see top-level LICENSE file in the repo).
