---
license: mit
library_name: transformers
pipeline_tag: time-series-forecasting
tags:
- time-series
- mixture-of-experts
- forecasting
- pytorch
- fft
model-index:
- name: SuperLinear
  results: []
---
# Super-Linear: A Mixture of Experts Time Series Forecasting Model
Super-Linear is a lightweight and scalable mixture-of-experts (MoE) model for general time series forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting.
The model was introduced in the paper [Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting](https://huggingface.co/papers/2509.15105).
## Model Architecture
The Super-Linear model consists of:
- **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts.
- **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing.
- **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns.
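
To make the routing idea concrete, here is a minimal NumPy sketch of spectral gating with top-k selection over linear experts. It is an illustration only, not the Super-Linear implementation: the names `spectral_gate`, `moe_forecast`, and `gate_weights`, the single linear gate over the magnitude spectrum, and the random weights are all assumptions made for this example.

```python
import numpy as np

def spectral_gate(x, gate_weights, top_k):
    """Score experts from the magnitude spectrum of x and keep the top-k.

    x: (L,) input window.
    gate_weights: (L // 2 + 1, n_experts) hypothetical gate parameters.
    Returns the selected expert indices and their renormalized weights.
    """
    spectrum = np.abs(np.fft.rfft(x))              # magnitude per frequency bin
    spectrum = spectrum / (spectrum.sum() + 1e-8)  # make routing scale-invariant
    logits = spectrum @ gate_weights               # (n_experts,)
    top_idx = np.argsort(logits)[-top_k:]          # indices of the k largest logits
    top_logits = logits[top_idx]
    weights = np.exp(top_logits - top_logits.max())
    weights = weights / weights.sum()              # softmax over selected experts only
    return top_idx, weights

def moe_forecast(x, experts, gate_weights, top_k=2):
    """Blend the forecasts of the selected linear experts."""
    top_idx, weights = spectral_gate(x, gate_weights, top_k)
    # Each expert is a plain linear map from the lookback window to the horizon.
    forecasts = np.stack([experts[i] @ x for i in top_idx])  # (top_k, pred_len)
    return weights @ forecasts                                # (pred_len,)

# Toy setup with random (untrained) experts and gate, for shape checking only.
rng = np.random.default_rng(0)
seq_len, pred_len, n_experts = 64, 8, 4
x = np.sin(2 * np.pi * np.arange(seq_len) / 16)
experts = rng.normal(size=(n_experts, pred_len, seq_len)) / seq_len
gate_weights = rng.normal(size=(seq_len // 2 + 1, n_experts))
y = moe_forecast(x, experts, gate_weights, top_k=2)
```

In the actual model the experts are pre-trained and frequency-specialized; here they are random matrices, so only the data flow is meaningful.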
## Key Features
- **Adaptive Expert Selection**: Dynamic routing based on input characteristics.
- **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection.
- **Auto-regressive Capabilities**: Supports long-horizon forecasting.
- **Multi-scale Processing**: Handles various sequence lengths through resampling.
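
As a rough illustration of how auto-regressive decoding extends a fixed-horizon forecaster to longer horizons, the sketch below feeds each forecast back into the input window. The helper `autoregressive_forecast` and the toy `seasonal_naive` forecaster are hypothetical names introduced for this example; the model's own rollout logic may differ.

```python
import numpy as np

def autoregressive_forecast(forecast_fn, x, seq_len, step_len, horizon):
    """Roll a one-shot forecaster forward until `horizon` steps are produced.

    forecast_fn: maps a (seq_len,) window to a (step_len,) forecast (hypothetical).
    """
    window = np.asarray(x, dtype=float)[-seq_len:]
    chunks = []
    produced = 0
    while produced < horizon:
        step = forecast_fn(window)                          # (step_len,)
        chunks.append(step)
        produced += step_len
        # Slide the window: drop the oldest values, append the new forecast.
        window = np.concatenate([window, step])[-seq_len:]
    return np.concatenate(chunks)[:horizon]

def seasonal_naive(window, step_len=24):
    """Toy stand-in forecaster: repeat the last `step_len` observations."""
    return window[-step_len:]

t = np.arange(96)
x = np.sin(2 * np.pi * t / 24)  # sine wave with a period of 24 steps
y = autoregressive_forecast(seasonal_naive, x, seq_len=96, step_len=24, horizon=60)
```

Because the toy series has a period equal to `step_len`, the rollout simply continues the sine wave; with a learned forecaster, errors can accumulate across steps.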
## Updates
- 26/01/2026: A small implementation change speeds up inference by more than 10x, making Super-Linear the fastest pre-trained forecaster!
## Usage
You can load the model with the `transformers` library. Pass `trust_remote_code=True` to `from_pretrained`, since the model code is loaded from the model repository.
```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM

model_path = "SequentialLearning/SuperLinear"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

seq_len = 512
pred_len = 96

# Create sample data: a sine wave with a period of 24 steps
freq = 1 / 24
amp = 1
ph = 0
t = torch.arange(0, seq_len + pred_len)
s = amp * torch.sin(2 * np.pi * freq * t + ph)
x = s[:seq_len].unsqueeze(0)  # Add batch dim -> shape (1, seq_len)

with torch.no_grad():
    # Input takes shape (B, V, L) or (B, L)
    output = model(x, pred_len=pred_len, get_prob=True)
    preds = output.logits  # Predicted values
    probs = output.attentions  # Expert probabilities stored here

expert_names = model.backbone.experts.keys()
```
## Configuration
Key parameters in `config.json`:
- `train_seq_len`: Training sequence length (default: 512)
- `train_pred_len`: Training prediction length (default: 96)
- `top_k_experts`: Number of top-scoring experts selected per input (default: 12)
- `use_fft`: Whether to use FFT-based gating (default: True)
- `freq_experts`: Frequency-specific expert configuration
- `moe_temp`: Temperature for expert selection during inference (default: 1)
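
The effect of a selection temperature such as `moe_temp` can be illustrated with a temperature-scaled softmax. This sketch assumes the temperature divides the gate logits before the softmax, which is the common convention; the card does not spell out the exact formula, and `softmax_with_temperature` and the example logits are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]  # subtract the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]

gate_logits = [2.0, 1.0, 0.0]  # made-up gate scores for three experts
sharp = softmax_with_temperature(gate_logits, temperature=0.5)
base = softmax_with_temperature(gate_logits, temperature=1.0)
flat = softmax_with_temperature(gate_logits, temperature=2.0)
```

Under this assumption, temperatures below 1 concentrate probability on the highest-scoring expert, while temperatures above 1 spread it more evenly across experts.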
## Links
- **GitHub Repository**: [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear)
- **Paper**: [https://arxiv.org/abs/2509.15105](https://arxiv.org/abs/2509.15105)
## Citation
If you use Super-Linear in your research, please cite:
```bibtex
@article{nochumsohn2025super,
title={Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting},
author={Nochumsohn, Liran and Marshanski, Raz and Zisling, Hedi and Azencot, Omri},
journal={arXiv preprint arXiv:2509.15105},
year={2025}
}
```
## License
This model is released under the MIT License.