nielsr (HF Staff) committed
Commit 0d8000f · verified · 1 Parent(s): 75ed9fe

Improve model card metadata and content


Hi, I'm Niels from the Hugging Face community science team. I'm opening this PR to improve the metadata and documentation for the Super-Linear model.

This PR:
- Adds `library_name: transformers` to the YAML metadata.
- Adds `pipeline_tag: time-series-forecasting` to ensure the model is correctly categorized on the Hub.
- Updates the model card with a link to the official paper and the GitHub repository.
- Provides a sample usage snippet based on the code found in the GitHub README.
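
For reviewers: the sample data in that snippet spans `seq_len + pred_len` points, so the last `pred_len` values can serve as ground truth when checking a forecast. The following is a minimal, dependency-free sketch of such a check. It uses a seasonal-naive baseline in place of the model, and the helper names (`make_sine`, `seasonal_naive`, `mse`) are hypothetical and not part of the repository:

```python
import math

SEQ_LEN, PRED_LEN, PERIOD = 512, 96, 24  # values taken from the usage snippet


def make_sine(n, period):
    """Unit-amplitude sine wave with the given period, as in the sample data."""
    return [math.sin(2 * math.pi * t / period) for t in range(n)]


def seasonal_naive(history, horizon, period):
    """Hypothetical baseline: repeat the last full period of the history."""
    last_cycle = history[-period:]
    return [last_cycle[i % period] for i in range(horizon)]


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)


series = make_sine(SEQ_LEN + PRED_LEN, PERIOD)
history, target = series[:SEQ_LEN], series[SEQ_LEN:]  # context window, held-out tail
baseline_error = mse(seasonal_naive(history, PRED_LEN, PERIOD), target)
print(f"seasonal-naive MSE on the held-out tail: {baseline_error:.3e}")
```

The same `mse` call could in principle be applied to the model's `preds` after converting them to a flat list, assuming they cover exactly the `pred_len` held-out steps.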

Please let me know if you have any questions!

Files changed (1)
  1. README.md +46 -32
README.md CHANGED
@@ -1,64 +1,78 @@
-
  ---
  license: mit
+ library_name: transformers
+ pipeline_tag: time-series-forecasting
  tags:
- - time-series
- - mixture-of-experts
- - forecasting
- - pytorch
- - fft
+ - time-series
+ - mixture-of-experts
+ - forecasting
+ - pytorch
+ - fft
  model-index:
- - name: SuperLinear
-   results: []
+ - name: SuperLinear
+   results: []
  ---

-
  # Super-Linear: A Mixture of Experts Time Series Forecasting Model

- SuperLinear is a novel time series forecasting model that employs a Mixture of Experts (MoE) architecture to achieve superior performance across various forecasting tasks. The model routes inputs to the most relevant experts based on frequency-domain analysis using FFT-based gating networks.
+ Super-Linear is a lightweight and scalable mixture-of-experts (MoE) model for general time series forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting.
+
+ The model was introduced in the paper [Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting](https://huggingface.co/papers/2509.15105).

  ## Model Architecture

- The SuperLinear model consists of:
+ The Super-Linear model consists of:

- - **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts
- - **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing
- - **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns
+ - **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts.
+ - **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing.
+ - **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns.

  ## Key Features

- - **Adaptive Expert Selection**: Dynamic routing based on input characteristics
- - **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection
- - **Auto-regressive Capabilities**: Supports long-horizon forecasting
- - **Multi-scale Processing**: Handles various sequence lengths through resampling
+ - **Adaptive Expert Selection**: Dynamic routing based on input characteristics.
+ - **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection.
+ - **Auto-regressive Capabilities**: Supports long-horizon forecasting.
+ - **Multi-scale Processing**: Handles various sequence lengths through resampling.

  ## Updates
  - On 26/01/2026 a slight implementation modification was introduced, now inference is more than 10x faster! Making Super-Linear the fastest pre-trained forecaster!

  ## Usage

+ You can use the model via the `transformers` library. Ensure you have `trust_remote_code=True` set.
+
  ```python
- from transformers import AutoModelForCausalLM, AutoConfig
  import torch
+ import numpy as np
+ from transformers import AutoModelForCausalLM

- # Load the model
- model = AutoModelForCausalLM.from_pretrained("SequentialLearning/SuperLinear", trust_remote_code=True)
+ model_path = "SequentialLearning/SuperLinear"
+ model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

- # Prepare input time series data
- # Shape: [batch_size, channel, sequence_length] or [batch_size, sequence_length]
- input_data = torch.randn(1, 1, 512)
+ seq_len = 512
+ pred_len = 96
+
+ # Create sample data
+ freq = 1/24
+ amp = 1
+ ph = 0
+
+ t = torch.arange(0, seq_len + pred_len)
+ s = amp * torch.sin(2 * np.pi * freq * t + ph)
+ x = s[:seq_len].unsqueeze(0) # Add batch dim

- # Generate predictions
  with torch.no_grad():
-     outputs = model(inputs_embeds=input_data, pred_len=96, get_prob = True)
-     preds = outputs.logits # Predicted values
-     probs = outputs.attentions # Expert probabilities stored here
-
+     # takes shapes (B, V, L) or (B, L)
+     output = model(x, pred_len=pred_len, get_prob=True)
+     preds = output.logits # Predicted values
+     probs = output.attentions # Expert probabilities stored here
+
+ expert_names = model.backbone.experts.keys()
  ```

  ## Configuration

- Key parameters:
+ Key parameters in `config.json`:

  - `train_seq_len`: Training sequence length (default: 512)
  - `train_pred_len`: Training prediction length (default: 96)
@@ -74,7 +88,7 @@ Key parameters:

  ## Citation

- If you use SuperLinear in your research, please cite:
+ If you use Super-Linear in your research, please cite:

  ```bibtex
  @article{nochumsohn2025super,
@@ -87,4 +101,4 @@ If you use SuperLinear in your research, please cite:

  ## License

- This model is released under the MIT License.
+ This model is released under the MIT License.
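
The updated card describes an FFT-based gating network that routes each input to the top-k most relevant frequency-specific experts. To make that idea concrete, here is a small, dependency-free sketch. It is an illustration under stated assumptions and not the model's actual gating code: the expert names and periods are hypothetical, the spectrum comes from a naive DFT rather than an FFT, and routing weights are simply renormalized spectral magnitudes.

```python
import cmath
import math


def dft_magnitudes(series):
    """Magnitudes of DFT bins 1..N//2 via a naive O(N^2) transform (DC bin skipped)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series)))
        for k in range(1, n // 2 + 1)
    ]


def expert_scores(series, expert_periods):
    """Score each hypothetical expert by the magnitude of the bin nearest its period."""
    n = len(series)
    mags = dft_magnitudes(series)
    scores = {}
    for name, period in expert_periods.items():
        k = min(max(round(n / period), 1), n // 2)  # nearest valid bin index
        scores[name] = mags[k - 1]  # mags[0] holds bin 1
    return scores


def top_k_gate(scores, k):
    """Keep the k highest-scoring experts and renormalize their scores to sum to 1."""
    kept = sorted(scores, key=scores.get, reverse=True)[:k]
    total = sum(scores[name] for name in kept)
    return {name: scores[name] / total for name in kept}


# Hypothetical experts, each tied to one period measured in time steps.
experts = {"period_12": 12, "period_24": 24, "period_168": 168}

# Same kind of sample input as the usage snippet: a sine with period 24, length 512.
series = [math.sin(2 * math.pi * t / 24) for t in range(512)]
weights = top_k_gate(expert_scores(series, experts), k=2)
print(weights)
```

For this input, nearly all of the routing weight lands on the hypothetical `period_24` expert, which is the behavior sparse spectral gating is meant to produce. In the real model, the expert probabilities are exposed through `output.attentions` when `get_prob=True` is passed, as shown in the usage snippet.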