---
license: mit
library_name: transformers
pipeline_tag: time-series-forecasting
tags:
- time-series
- mixture-of-experts
- forecasting
- pytorch
- fft
model-index:
- name: SuperLinear
  results: []
---

# Super-Linear: A Mixture of Experts Time Series Forecasting Model

Super-Linear is a lightweight and scalable mixture-of-experts (MoE) model for general time series forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting.

The model was introduced in the paper [Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting](https://huggingface.co/papers/2509.15105).

## Model Architecture

The Super-Linear model consists of:

- **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts.
- **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing.
- **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns.
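The interplay of the spectral gate and top-k routing can be sketched as follows. This is a minimal illustration, not the released implementation: the function name `spectral_topk_gate`, the random `gate_weight` matrix, the spectrum normalization, and the choice of 8 experts are all assumptions made for the example.

```python
import math
import torch

def spectral_topk_gate(x, gate_weight, top_k):
    """Toy FFT-based gate (hypothetical helper).

    x: (B, L) input series; gate_weight: (L // 2 + 1, n_experts).
    Returns dense expert weights (B, n_experts) and the selected indices (B, top_k).
    """
    # Magnitude spectrum; rfft yields L // 2 + 1 frequency bins per series.
    spectrum = torch.fft.rfft(x, dim=-1).abs()
    # Normalize so the gate sees the spectral shape rather than the scale.
    spectrum = spectrum / (spectrum.sum(dim=-1, keepdim=True) + 1e-8)
    # Linear map from frequency bins to one logit per expert.
    logits = spectrum @ gate_weight
    # Keep the k largest logits and renormalize them with a softmax.
    top_vals, top_idx = torch.topk(logits, k=top_k, dim=-1)
    top_probs = torch.softmax(top_vals, dim=-1)
    # Scatter into a dense tensor; unselected experts keep weight 0.
    probs = torch.zeros_like(logits).scatter(-1, top_idx, top_probs)
    return probs, top_idx

torch.manual_seed(0)
seq_len, n_experts, top_k = 512, 8, 3
t = torch.arange(seq_len, dtype=torch.float32)
x = torch.sin(2 * math.pi * t / 24).unsqueeze(0)  # (1, 512), period of 24 steps
gate_weight = torch.randn(seq_len // 2 + 1, n_experts)  # stand-in for learned weights
probs, top_idx = spectral_topk_gate(x, gate_weight, top_k)
print(probs.shape, top_idx)
```

In a sparse MoE of this kind, the final forecast would be the weighted sum of the selected experts' outputs, so only `top_k` experts need to be evaluated per input.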

## Key Features

- **Adaptive Expert Selection**: Dynamic routing based on input characteristics.
- **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection.
- **Auto-regressive Capabilities**: Supports long-horizon forecasting.
- **Multi-scale Processing**: Handles various sequence lengths through resampling.
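One common way to reach horizons longer than the training prediction length is an auto-regressive rollout, in which each forecast is appended to the context and fed back in. The sketch below shows the general idea under stated assumptions: `autoregressive_forecast` and `seasonal_naive` are hypothetical names, and the seasonal-naive rule merely stands in for a real forecaster. Whether Super-Linear performs the rollout in exactly this way is not specified here.

```python
import math
import torch

def autoregressive_forecast(step_fn, x, horizon, seq_len, pred_len):
    """Roll a fixed-horizon forecaster forward until `horizon` steps exist.

    step_fn maps a (B, seq_len) context to a (B, pred_len) forecast.
    Assumes x has at least seq_len time steps.
    """
    context = x[:, -seq_len:]
    chunks, produced = [], 0
    while produced < horizon:
        step = step_fn(context)  # (B, pred_len)
        chunks.append(step)
        produced += pred_len
        # Slide the window: append the forecast, keep the latest seq_len values.
        context = torch.cat([context, step], dim=-1)[:, -seq_len:]
    # Trim the last chunk so exactly `horizon` steps are returned.
    return torch.cat(chunks, dim=-1)[:, :horizon]

def seasonal_naive(context, pred_len=96, period=24):
    """Stand-in forecaster (hypothetical): repeat the last full period."""
    reps = -(-pred_len // period)  # ceiling division
    return context[:, -period:].repeat(1, reps)[:, :pred_len]

seq_len, pred_len, horizon = 512, 96, 300
t = torch.arange(seq_len + horizon, dtype=torch.float32)
series = torch.sin(2 * math.pi * t / 24).unsqueeze(0)
x, target = series[:, :seq_len], series[:, seq_len:]

forecast = autoregressive_forecast(seasonal_naive, x, horizon, seq_len, pred_len)
max_err = (forecast - target).abs().max().item()
print(forecast.shape, max_err)
```

Errors can accumulate across rollout steps with a real model, since later steps condition on earlier predictions rather than on observed values.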

## Updates
- 26/01/2026: A small implementation change made inference more than 10x faster, making Super-Linear the fastest pre-trained forecaster.
  
## Usage

You can use the model via the `transformers` library. Pass `trust_remote_code=True` when loading, since the model relies on custom code shipped with the repository.

```python
import torch
import numpy as np
from transformers import AutoModelForCausalLM

model_path = "SequentialLearning/SuperLinear" 
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

seq_len = 512
pred_len = 96

# Create sample data: a sine wave with a period of 24 steps
freq = 1/24
amp = 1
ph = 0

t = torch.arange(0, seq_len + pred_len)
s = amp * torch.sin(2 * np.pi * freq * t + ph)
x = s[:seq_len].unsqueeze(0)  # Context window with a batch dim: (1, seq_len)

with torch.no_grad():
    # Input may have shape (B, V, L) or (B, L)
    output = model(x, pred_len=pred_len, get_prob=True)
    preds = output.logits  # Predicted values
    probs = output.attentions  # Expert probabilities stored here

# Names of the experts
expert_names = list(model.backbone.experts.keys())
```

## Configuration

Key parameters in `config.json`:

- `train_seq_len`: Training sequence length (default: 512)
- `train_pred_len`: Training prediction length (default: 96)
- `top_k_experts`: Number of experts to use (default: 12)
- `use_fft`: Whether to use FFT-based gating (default: True)
- `freq_experts`: Frequency-specific expert configuration
- `moe_temp`: Temperature for expert selection during inference (default: 1)
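To give a feel for what a gating temperature does, the sketch below applies a softmax to made-up gate scores at several temperatures. It assumes the common convention of dividing the logits by the temperature before the softmax. Whether `moe_temp` is applied in exactly this way inside Super-Linear is an assumption, and `gate_probs` is a hypothetical helper.

```python
import torch

# Made-up gate scores for four experts (illustrative values only).
logits = torch.tensor([2.0, 1.0, 0.5, -1.0])

def gate_probs(logits, temp):
    """Softmax over logits scaled by 1 / temp (assumed convention)."""
    return torch.softmax(logits / temp, dim=-1)

# Lower temperatures concentrate weight on the top expert;
# higher temperatures spread it more evenly.
sharp = gate_probs(logits, 0.5)
default = gate_probs(logits, 1.0)
flat = gate_probs(logits, 2.0)

for name, p in (("temp=0.5", sharp), ("temp=1.0", default), ("temp=2.0", flat)):
    print(name, [round(v, 3) for v in p.tolist()])
```

Under this convention, a temperature of 1 leaves the gate's weights unchanged.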

## Links

- **GitHub Repository**: [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear)
- **Paper**: [https://arxiv.org/abs/2509.15105](https://arxiv.org/abs/2509.15105)

## Citation

If you use Super-Linear in your research, please cite:

```bibtex
@article{nochumsohn2025super,
  title={Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting},
  author={Nochumsohn, Liran and Marshanski, Raz and Zisling, Hedi and Azencot, Omri},
  journal={arXiv preprint arXiv:2509.15105},
  year={2025}
}
```

## License

This model is released under the MIT License.