Improve model card metadata and content

Hi, I'm Niels from the Hugging Face community science team. I'm opening this PR to improve the metadata and documentation for the Super-Linear model.

This PR:
- Adds `library_name: transformers` to the YAML metadata.
- Adds `pipeline_tag: time-series-forecasting` to ensure the model is correctly categorized on the Hub.
- Updates the model card with a link to the official paper and the GitHub repository.
- Provides a sample usage snippet based on the code found in the GitHub README.

Please let me know if you have any questions!

Files changed (1)

README.md +46 -32

README.md CHANGED

@@ -1,64 +1,78 @@
 ---
 license: mit
 tags:
-  - time-series
-  - mixture-of-experts
-  - forecasting
-  - pytorch
-  - fft
 model-index:
-  - name: SuperLinear
-    results: []
 ---
 # Super-Linear: A Mixture of Experts Time Series Forecasting Model
-SuperLinear is a novel time series forecasting model that employs a Mixture of Experts (MoE) architecture to achieve superior performance across various forecasting tasks. The model routes inputs to the most relevant experts based on frequency-domain analysis using FFT-based gating networks.
 ## Model Architecture
-The SuperLinear model consists of:
-- **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts
-- **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing
-- **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns
 ## Key Features
-- **Adaptive Expert Selection**: Dynamic routing based on input characteristics
-- **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection
-- **Auto-regressive Capabilities**: Supports long-horizon forecasting
-- **Multi-scale Processing**: Handles various sequence lengths through resampling
 ## Updates
 - On 26/01/2026, a small implementation change made inference more than 10x faster, making Super-Linear the fastest pre-trained forecaster!
 ## Usage
 ```python
-from transformers import AutoModelForCausalLM, AutoConfig
 import torch
-# Load the model
-model = AutoModelForCausalLM.from_pretrained("SequentialLearning/SuperLinear", trust_remote_code=True)
-# Prepare input time series data
-# Shape: [batch_size, channel, sequence_length] or [batch_size, sequence_length]
-input_data = torch.randn(1, 1, 512)
-# Generate predictions
 with torch.no_grad():
-    outputs = model(inputs_embeds=input_data, pred_len=96, get_prob = True)
-    preds = outputs.logits # Predicted values
-    probs = outputs.attentions  # Expert probabilities stored here
 ```
 ## Configuration
-Key parameters:
 - `train_seq_len`: Training sequence length (default: 512)
 - `train_pred_len`: Training prediction length (default: 96)
@@ -74,7 +88,7 @@ Key parameters:
 ## Citation
-If you use SuperLinear in your research, please cite:
 ```bibtex
 @article{nochumsohn2025super,
@@ -87,4 +101,4 @@ If you use SuperLinear in your research, please cite:
 ## License
-This model is released under the MIT License.

 ---
 license: mit
+library_name: transformers
+pipeline_tag: time-series-forecasting
 tags:
+- time-series
+- mixture-of-experts
+- forecasting
+- pytorch
+- fft
 model-index:
+- name: SuperLinear
+  results: []
 ---
 # Super-Linear: A Mixture of Experts Time Series Forecasting Model
+Super-Linear is a lightweight and scalable mixture-of-experts (MoE) model for general time series forecasting. It replaces deep architectures with simple frequency-specialized linear experts, trained on resampled data across multiple frequency regimes. A lightweight spectral gating mechanism dynamically selects relevant experts, enabling efficient, accurate forecasting.
+The model was introduced in the paper [Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting](https://huggingface.co/papers/2509.15105).
 ## Model Architecture
+The Super-Linear model consists of:
+- **Sparse Mixture of Experts (MoE)**: Routes inputs to the top-k most relevant experts.
+- **FFT-based Gating Network**: Uses frequency domain analysis to determine expert routing.
+- **Frequency-specific Experts**: Pre-trained experts specialized for different temporal patterns.
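As a rough illustration of how these three pieces fit together, the NumPy sketch below routes a series through an FFT-based gate and mixes the forecasts of the top-k linear experts. It is not the Super-Linear implementation: the gate (a linear map over the normalized rFFT magnitude spectrum, followed by a softmax), the number of experts, the random weights, and the names `spectral_gate` and `moe_forecast` are all hypothetical. Only the 512-step input and 96-step output lengths come from the documented configuration defaults.

```python
import numpy as np

SEQ_LEN = 512    # input length (documented train_seq_len default)
PRED_LEN = 96    # forecast horizon (documented train_pred_len default)
NUM_EXPERTS = 8  # hypothetical number of frequency-specific experts
TOP_K = 2        # hypothetical number of experts activated per input

rng = np.random.default_rng(0)
# Hypothetical parameters: each expert is a linear map from SEQ_LEN to PRED_LEN.
expert_weights = rng.normal(scale=0.01, size=(NUM_EXPERTS, PRED_LEN, SEQ_LEN))
# Hypothetical gate: linear map from the rFFT magnitude spectrum to expert logits.
gate_weights = rng.normal(size=(NUM_EXPERTS, SEQ_LEN // 2 + 1))


def spectral_gate(x: np.ndarray) -> np.ndarray:
    """Return softmax routing probabilities over experts for a 1-D series x."""
    spectrum = np.abs(np.fft.rfft(x - x.mean()))  # magnitude spectrum, mean removed
    spectrum /= spectrum.sum() + 1e-8             # normalize so amplitude does not matter
    logits = gate_weights @ spectrum
    logits -= logits.max()                        # numerical stability for exp
    probs = np.exp(logits)
    return probs / probs.sum()


def moe_forecast(x: np.ndarray, k: int = TOP_K) -> tuple[np.ndarray, np.ndarray]:
    """Mix the forecasts of the top-k experts, weighted by renormalized gate scores."""
    probs = spectral_gate(x)
    top = np.argsort(probs)[-k:]                  # indices of the k largest scores
    weights = probs[top] / probs[top].sum()       # renormalize over selected experts
    # Only the selected experts are evaluated (sparse routing).
    preds = np.stack([expert_weights[i] @ x for i in top])  # shape (k, PRED_LEN)
    return weights @ preds, probs


t = np.arange(SEQ_LEN)
x = np.sin(2 * np.pi * t / 24)  # e.g. a daily cycle in hourly data
forecast, probs = moe_forecast(x)
print(forecast.shape)  # (96,)
```

Renormalizing the gate scores over the selected experts keeps the mixture weights summing to one, and only the k selected experts are evaluated, which is what makes the routing sparse.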
 ## Key Features
+- **Adaptive Expert Selection**: Dynamic routing based on input characteristics.
+- **Frequency-aware Processing**: Leverages FFT analysis for intelligent expert selection.
+- **Auto-regressive Capabilities**: Supports long-horizon forecasting.
+- **Multi-scale Processing**: Handles various sequence lengths through resampling.
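The last two features can also be sketched generically. Below, `resample_to_length` maps a context of any length onto the 512-step training length with linear interpolation, and `autoregressive_rollout` extends a fixed-horizon forecaster to a longer horizon by feeding its own predictions back as context. Both helpers, the choice of linear interpolation, and the seasonal-naive stand-in forecaster are hypothetical and chosen for illustration; the model's own resampling and auto-regressive logic may differ.

```python
import numpy as np

TRAIN_SEQ_LEN = 512  # documented default for train_seq_len
TRAIN_PRED_LEN = 96  # documented default for train_pred_len
PERIOD = 24          # hypothetical seasonal period of the toy data


def resample_to_length(x: np.ndarray, target_len: int) -> np.ndarray:
    """Linearly interpolate a 1-D series onto target_len evenly spaced points."""
    if x.ndim != 1 or len(x) < 2:
        raise ValueError("expected a 1-D series with at least two points")
    src = np.linspace(0.0, 1.0, num=len(x))      # original sample positions
    dst = np.linspace(0.0, 1.0, num=target_len)  # new sample positions
    return np.interp(dst, src, x)


def seasonal_naive(window: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in forecaster: repeat the last period for TRAIN_PRED_LEN steps."""
    return np.tile(window[-PERIOD:], TRAIN_PRED_LEN // PERIOD)


def autoregressive_rollout(forecast_fn, context: np.ndarray, horizon: int,
                           seq_len: int = TRAIN_SEQ_LEN) -> np.ndarray:
    """Call a fixed-horizon forecaster repeatedly, appending each prediction to the context."""
    history = np.asarray(context, dtype=float)
    chunks = []
    produced = 0
    while produced < horizon:
        step = forecast_fn(history[-seq_len:])  # predict from the latest window
        chunks.append(step)
        history = np.concatenate([history, step])
        produced += len(step)
    return np.concatenate(chunks)[:horizon]     # trim the final chunk to the horizon


short_context = np.sin(2 * np.pi * np.arange(200) / PERIOD)
print(resample_to_length(short_context, TRAIN_SEQ_LEN).shape)  # (512,)

context = np.sin(2 * np.pi * np.arange(TRAIN_SEQ_LEN) / PERIOD)
long_forecast = autoregressive_rollout(seasonal_naive, context, horizon=300)
print(long_forecast.shape)  # (300,)
```

Note that resampling rescales the apparent period of a series: the 24-step cycle in the 200-point context above spans about 62 samples after stretching to 512, which is one way a fixed set of frequency-specific experts could cover inputs of different lengths.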
 ## Updates
 - On 26/01/2026, a small implementation change made inference more than 10x faster, making Super-Linear the fastest pre-trained forecaster!
 ## Usage
+Load the model with the `transformers` library. Pass `trust_remote_code=True`, since the model relies on custom code hosted in the repository.
 ```python
 import torch
+import numpy as np
+from transformers import AutoModelForCausalLM
+model_path = "SequentialLearning/SuperLinear"
+model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
+seq_len = 512
+pred_len = 96
+# Create a sample series: a sine wave with a period of 24 steps
+freq = 1/24
+amp = 1
+ph = 0
+t = torch.arange(0, seq_len + pred_len)
+s = amp * torch.sin(2 * np.pi * freq * t + ph)
+x = s[:seq_len].unsqueeze(0)  # Add a batch dimension: shape (1, seq_len)
 with torch.no_grad():
+    # Accepts inputs of shape (B, V, L) or (B, L): batch, variables (channels), length
+    output = model(x, pred_len=pred_len, get_prob=True)
+    preds = output.logits  # Predicted values
+    probs = output.attentions  # Expert probabilities stored here
+# Keys (names) of the experts in the model backbone
+expert_names = model.backbone.experts.keys()
 ```
 ## Configuration
+Key parameters in `config.json`:
 - `train_seq_len`: Training sequence length (default: 512)
 - `train_pred_len`: Training prediction length (default: 96)
 ## Citation
+If you use Super-Linear in your research, please cite:
 ```bibtex
 @article{nochumsohn2025super,
 ## License
+This model is released under the MIT License.