---
license: mit
library_name: transformers
pipeline_tag: time-series-forecasting
tags:
- time-series
- mixture-of-experts
- forecasting
- pytorch
- fft
model-index:
- name: SuperLinear
  results: []
---

# Super-Linear: A Mixture of Experts Time Series Forecasting Model

Super-Linear is a lightweight and scalable mixture-of-experts (MoE) model for general time series forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects the relevant experts, enabling efficient and accurate forecasting.

The model was introduced in the paper [Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting](https://huggingface.co/papers/2509.15105).

## Model Architecture

The Super-Linear model consists of:

- **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts.
- **FFT-based Gating Network**: Uses frequency-domain analysis to determine expert routing.
- **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns.

## Key Features

- **Adaptive Expert Selection**: Dynamic routing based on input characteristics.
- **Frequency-aware Processing**: Uses FFT analysis of the input to select experts.
- **Auto-regressive Capabilities**: Supports long-horizon forecasting.
- **Multi-scale Processing**: Handles various sequence lengths through resampling.

## Updates

- On 26/01/2026, a small implementation change made inference more than 10x faster, making Super-Linear the fastest pre-trained forecaster.

## Usage

You can use the model via the `transformers` library. Pass `trust_remote_code=True` when loading, since the model class is defined by custom code in the model repository.

```python
import torch
import numpy as np
from transformers import AutoModelForCausalLM

model_path = "SequentialLearning/SuperLinear"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

seq_len = 512
pred_len = 96

# Create sample data
freq = 1/24
amp = 1
ph = 0
t = torch.arange(0, seq_len + pred_len)
s = amp * torch.sin(2 * np.pi * freq * t + ph)
x = s[:seq_len].unsqueeze(0)  # Add batch dim

with torch.no_grad():
    # takes shapes (B, V, L) or (B, L)
    output = model(x, pred_len=pred_len, get_prob=True)

preds = output.logits      # Predicted values
probs = output.attentions  # Expert probabilities stored here
expert_names = model.backbone.experts.keys()
```

## Configuration

Key parameters in `config.json`:

- `train_seq_len`: Training sequence length (default: 512)
- `train_pred_len`: Training prediction length (default: 96)
- `top_k_experts`: Number of experts to use (default: 12)
- `use_fft`: Whether to use FFT-based gating (default: True)
- `freq_experts`: Frequency-specific expert configuration
- `moe_temp`: Temperature for expert selection during inference (default: 1)

## Links

- **GitHub Repository**: [https://github.com/azencot-group/SuperLinear](https://github.com/azencot-group/SuperLinear)
- **Paper**: [https://arxiv.org/abs/2509.15105](https://arxiv.org/abs/2509.15105)

## Citation

If you use Super-Linear in your research, please cite:

```bibtex
@article{nochumsohn2025super,
  title={Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting},
  author={Nochumsohn, Liran and Marshanski, Raz and Zisling, Hedi and Azencot, Omri},
  journal={arXiv preprint arXiv:2509.15105},
  year={2025}
}
```

## License

This model is released under the MIT License.
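
## Appendix: Illustrative Gating Sketch

The Model Architecture section describes FFT-based gating that routes each input to the top-k linear experts. The NumPy sketch below illustrates that general idea only; it is not the implementation shipped in this repository. The function name `spectral_top_k_forecast`, the single-layer gate, the random weights, and the choice of four experts are all hypothetical stand-ins chosen for the example.

```python
import numpy as np


def spectral_top_k_forecast(x, gate_w, expert_ws, top_k=2, temp=1.0):
    """Illustrative sparse MoE forecast for one series x of shape (L,).

    Hypothetical sketch: gate_w has shape (E, L // 2 + 1) and
    expert_ws has shape (E, H, L), for E experts and horizon H.
    """
    # Magnitude spectrum of the input window, shape (L // 2 + 1,).
    spectrum = np.abs(np.fft.rfft(x))
    spectrum = spectrum / (spectrum.sum() + 1e-8)  # remove overall scale

    # Assumed gate: one linear layer mapping the spectrum to expert logits.
    logits = gate_w @ spectrum  # shape (E,)

    # Keep the top-k experts and softmax over their temperature-scaled logits.
    top_idx = np.argsort(logits)[-top_k:]
    scaled = logits[top_idx] / temp
    weights = np.exp(scaled - scaled.max())
    weights /= weights.sum()

    # Each expert is a single linear map from L inputs to H outputs.
    preds = np.stack([expert_ws[i] @ x for i in top_idx])  # shape (k, H)
    forecast = weights @ preds  # weighted sum, shape (H,)
    return forecast, top_idx, weights


rng = np.random.default_rng(0)
seq_len, pred_len, n_experts = 512, 96, 4

# Same kind of sample signal as in the Usage section: period-24 sine wave.
t = np.arange(seq_len)
x = np.sin(2 * np.pi * t / 24)

# Random (untrained) weights, used only to exercise the routing logic.
gate_w = rng.normal(size=(n_experts, seq_len // 2 + 1))
expert_ws = rng.normal(size=(n_experts, pred_len, seq_len)) / seq_len

forecast, top_idx, weights = spectral_top_k_forecast(
    x, gate_w, expert_ws, top_k=2, temp=1.0
)
print(forecast.shape, top_idx, weights)
```

In this sketch the `temp` argument plays a role analogous to `moe_temp`, and `top_k` to `top_k_experts`: lowering the temperature concentrates the weight on the highest-scoring expert, while a larger `top_k` blends more experts into the forecast. How the actual model uses those settings is defined by the repository code, not by this example.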