| --- |
| license: mit |
| library_name: transformers |
| pipeline_tag: time-series-forecasting |
| tags: |
| - time-series |
| - mixture-of-experts |
| - forecasting |
| - pytorch |
| - fft |
| model-index: |
| - name: SuperLinear |
| results: [] |
| --- |
| |
| # Super-Linear: A Mixture of Experts Time Series Forecasting Model |
|
|
| Super-Linear is a lightweight and scalable mixture-of-experts (MoE) model for general time series forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting. |
|
|
| The model was introduced in the paper [Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting](https://huggingface.co/papers/2509.15105). |
|
|
| ## Model Architecture |
|
|
| The Super-Linear model consists of: |
|
|
| - **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts. |
| - **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing. |
| - **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns. |
|
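To make the three components above concrete, here is a minimal NumPy sketch of FFT-based top-k gating over linear experts. It is not the released implementation. The single linear gating layer `gate_w`, the random expert weights `expert_ws`, the helper names, and the toy sizes are all assumptions chosen for illustration.

```python
import numpy as np

def fft_gate_features(x):
    """Normalised magnitude spectrum of a 1-D lookback window."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    total = spectrum.sum()
    return spectrum / total if total > 0 else spectrum

def top_k_softmax(scores, k, temperature=1.0):
    """Softmax over the k largest scores; every other expert gets weight 0."""
    scores = np.asarray(scores, dtype=float)
    top = np.argsort(scores)[-k:]
    z = scores[top] / temperature
    w = np.exp(z - z.max())  # subtract the max for numerical stability
    weights = np.zeros_like(scores)
    weights[top] = w / w.sum()
    return weights

def moe_forecast(x, gate_w, expert_ws, k):
    """Sparse MoE forecast: weighted sum of the selected linear experts."""
    scores = gate_w @ fft_gate_features(x)  # one score per expert
    weights = top_k_softmax(scores, k)
    selected = np.flatnonzero(weights)
    forecast = sum(weights[i] * (expert_ws[i] @ x) for i in selected)
    return forecast, weights

# Toy sizes (hypothetical, far smaller than the defaults listed under Configuration)
rng = np.random.default_rng(0)
seq_len, pred_len, n_experts, k = 64, 16, 6, 2
gate_w = rng.normal(size=(n_experts, seq_len // 2 + 1))  # assumed linear gate
expert_ws = rng.normal(size=(n_experts, pred_len, seq_len)) / seq_len
x = np.sin(2 * np.pi * np.arange(seq_len) / 16)

forecast, weights = moe_forecast(x, gate_w, expert_ws, k)
print(forecast.shape, np.count_nonzero(weights))
```

Only the `k` selected experts are evaluated, which is what keeps a sparse mixture cheap at inference time.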
|
| ## Key Features |
|
|
| - **Adaptive Expert Selection**: Dynamic routing based on input characteristics. |
| - **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection. |
| - **Auto-regressive Capabilities**: Supports long-horizon forecasting. |
| - **Multi-scale Processing**: Handles various sequence lengths through resampling. |
|
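This card does not spell out how the auto-regressive mode works internally. A common pattern for extending a fixed-horizon forecaster is a sliding-window rollout, sketched below. The function names and the seasonal-naive stand-in forecaster are hypothetical and only serve to keep the example self-contained. They are not part of the Super-Linear API.

```python
import numpy as np

def autoregressive_forecast(forecast_fn, context, horizon, pred_len):
    """Roll a fixed-horizon forecaster forward until `horizon` steps exist."""
    context = np.asarray(context, dtype=float)
    seq_len = context.shape[-1]
    chunks, produced = [], 0
    while produced < horizon:
        step = np.asarray(forecast_fn(context), dtype=float)[..., :pred_len]
        chunks.append(step)
        produced += step.shape[-1]
        # Slide the window: drop the oldest values, append the new predictions.
        context = np.concatenate([context, step], axis=-1)[..., -seq_len:]
    return np.concatenate(chunks, axis=-1)[..., :horizon]

def seasonal_naive(context, period=24, pred_len=96):
    """Hypothetical stand-in model: repeat the last observed period."""
    reps = -(-pred_len // period)  # ceiling division
    return np.tile(context[..., -period:], reps)[..., :pred_len]

t = np.arange(512)
x = np.sin(2 * np.pi * t / 24)  # same period as the usage example
long_forecast = autoregressive_forecast(seasonal_naive, x, horizon=300, pred_len=96)
print(long_forecast.shape)
```

Because each chunk is predicted from earlier predictions, errors can accumulate over long horizons, so it is worth checking accuracy at the horizon you actually need.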
|
| ## Updates |
| - 26/01/2026: A small implementation change makes inference more than 10x faster, making Super-Linear the fastest pre-trained forecaster. |
| |
| ## Usage |
|
|
| You can use the model via the `transformers` library. Pass `trust_remote_code=True` to `from_pretrained`, because the model is implemented in custom code hosted in the model repository. |
|
|
| ```python |
| import torch |
| import numpy as np |
| from transformers import AutoModelForCausalLM |
| |
| model_path = "SequentialLearning/SuperLinear" |
| model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True) |
| |
| seq_len = 512 |
| pred_len = 96 |
| |
| # Create sample data |
| freq = 1/24 |
| amp = 1 |
| ph = 0 |
| |
| t = torch.arange(0, seq_len + pred_len) |
| s = amp * torch.sin(2 * np.pi * freq * t + ph) |
| x = s[:seq_len].unsqueeze(0) # Add batch dim |
| |
| with torch.no_grad(): |
|     # Input shape: (B, V, L) or (B, L) |
|     output = model(x, pred_len=pred_len, get_prob=True) |
|     preds = output.logits  # Predicted values |
|     probs = output.attentions  # Expert probabilities are stored here |
| |
| # Names of the available experts |
| expert_names = list(model.backbone.experts.keys()) |
| ``` |
|
|
| ## Configuration |
|
|
| Key parameters in `config.json`: |
|
|
| - `train_seq_len`: Training sequence length (default: 512) |
| - `train_pred_len`: Training prediction length (default: 96) |
| - `top_k_experts`: Number of top-scoring experts the gate routes each input to (default: 12) |
| - `use_fft`: Whether to use FFT-based gating (default: True) |
| - `freq_experts`: Frequency-specific expert configuration |
| - `moe_temp`: Temperature for expert selection during inference (default: 1) |
|
|
| ## Links |
|
|
| - **GitHub Repository**: [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear) |
| - **Paper**: [https://arxiv.org/abs/2509.15105](https://arxiv.org/abs/2509.15105) |
|
|
| ## Citation |
|
|
| If you use Super-Linear in your research, please cite: |
|
|
| ```bibtex |
| @article{nochumsohn2025super, |
| title={Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting}, |
| author={Nochumsohn, Liran and Marshanski, Raz and Zisling, Hedi and Azencot, Omri}, |
| journal={arXiv preprint arXiv:2509.15105}, |
| year={2025} |
| } |
| ``` |
|
|
| ## License |
|
|
| This model is released under the MIT License. |