+---
+license: bsd-3-clause
+library_name: braindecode
+pipeline_tag: feature-extraction
+tags:
+  - eeg
+  - biosignal
+  - pytorch
+  - neuroscience
+  - braindecode
+  - convolutional
+  - transformer
+---
+# SSTDPN
+SSTDPN from Can Han et al. (2025).
+> **Architecture-only repository.** This repo documents the
+> `braindecode.models.SSTDPN` class. **No pretrained weights are
+> distributed here** — instantiate the model and train it on your own
+> data, or fine-tune from a published foundation-model checkpoint
+> separately.
+## Quick start
+```bash
+pip install braindecode
+```
+```python
+from braindecode.models import SSTDPN
+model = SSTDPN(
+    n_chans=22,
+    sfreq=250,
+    input_window_seconds=4.0,
+    n_outputs=4,
+)
+```
+The signal-shape arguments above are example defaults — adjust them
+to match your recording.
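As a quick sanity check on those arguments, the number of time samples per trial follows from the sampling rate and the window length. The sketch below is plain Python and only illustrates the arithmetic. The helper name is hypothetical, and the assumption that braindecode derives `n_times` this way when it is not passed explicitly should be confirmed against the API reference.

```python
# Hypothetical helper (not part of braindecode): samples per trial.
def samples_per_window(sfreq: float, input_window_seconds: float) -> int:
    """Number of time samples in one window at the given sampling rate."""
    return int(round(sfreq * input_window_seconds))


# Values from the Quick start snippet: 250 Hz sampling, 4-second trials.
n_times = samples_per_window(sfreq=250, input_window_seconds=4.0)
print(n_times)
```

With the Quick start values this gives 1000 samples, which is the `n_times` used in the Hub example further down.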
+## Documentation
+- Full API reference (parameters, references, architecture figure):
+  <https://braindecode.org/stable/generated/braindecode.models.SSTDPN.html>
+- Interactive browser with live instantiation:
+  <https://huggingface.co/spaces/braindecode/model-explorer>
+- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/sstdpn.py#L17>
+## Architecture description
+The block below is the rendered class docstring (parameters,
+references, architecture figure where available).
+<div class='bd-doc'><main>
+<p>SSTDPN from Can Han et al. (2025) [Han2025]_.</p>
+<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#56B4E9;color:white;font-size:11px;font-weight:600;margin-right:4px;">Attention/Transformer</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span>
+ .. figure:: https://raw.githubusercontent.com/hancan16/SST-DPN/refs/heads/main/figs/framework.png
+     :align: center
+     :alt: SSTDPN Architecture
+     :width: 1000px
+ The **Spatial-Spectral** and **Temporal - Dual Prototype Network** (SST-DPN)
+ is an end-to-end 1D convolutional architecture designed for motor imagery (MI) EEG decoding,
+ aiming to address challenges related to discriminative feature extraction and
+ small-sample sizes [Han2025]_.
+ The framework addresses three key challenges: extracting multi-channel spatial–spectral
+ features, capturing long-term temporal features, and learning from small samples [Han2025]_.
+ .. rubric:: Architectural Overview
+ SST-DPN consists of a feature extractor (_SSTEncoder, comprising Adaptive Spatial-Spectral
+ Fusion and Multi-scale Variance Pooling) followed by Dual Prototype Learning classification [Han2025]_.
+ 1. **Adaptive Spatial-Spectral Fusion (ASSF)**: Uses :class:`_DepthwiseTemporalConv1d` to generate a
+     multi-channel spatial-spectral representation, followed by :class:`_SpatSpectralAttn`
+     (Spatial-Spectral Attention) to model relationships and highlight key spatial-spectral
+     channels [Han2025]_.
+ 2. **Multi-scale Variance Pooling (MVP)**: Applies :class:`_MultiScaleVarPooler` with variance pooling
+     at multiple temporal scales to capture long-range temporal dependencies, serving as an
+     efficient alternative to transformers [Han2025]_.
+ 3. **Dual Prototype Learning (DPL)**: A training strategy that employs two sets of
+     prototypes—Inter-class Separation Prototypes (proto_sep) and Intra-class Compact
+     Prototypes (proto_cpt)—to optimize the feature space, enhancing generalization ability and
+     preventing overfitting on small datasets [Han2025]_. During inference (forward pass),
+     classification decisions are based on the similarity (dot product) between the
+     feature vector and proto_sep for each class [Han2025]_.
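The flow through the first two stages can be followed as plain shape bookkeeping. This is a sketch, not the library code: it assumes the temporal length is preserved by both convolutions (the text gives the fused representation as `F_2 x T`) and uses the documented defaults `F_1 = 9` and `F_2 = 48` with a 22-channel, 1000-sample trial. The function name is hypothetical.

```python
# Shape bookkeeping only; no tensors are created.
# Assumption: both convolutions keep the temporal length T unchanged.
n_chans, n_times = 22, 1000          # C, T
n_spectral_filters_temporal = 9      # F1 (depth multiplier)
n_fused_filters = 48                 # F2


def encoder_shapes(c: int, t: int, f1: int, f2: int) -> dict:
    """Return the (channels, time) shape after each ASSF stage."""
    return {
        "input": (c, t),
        "temporal_conv": (c * f1, t),  # F1 spectral filters per electrode
        "spt_attn": (c * f1, t),       # gating reweights channels, shape unchanged
        "chan_conv": (f2, t),          # pointwise fusion across electrodes
    }


shapes = encoder_shapes(
    n_chans, n_times, n_spectral_filters_temporal, n_fused_filters
)
for stage, shape in shapes.items():
    print(f"{stage:>14}: {shape}")
```

Under these assumptions the attention module gates 198 spatial-spectral channels (22 x 9) before the pointwise convolution reduces them to 48.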
+ .. rubric:: Macro Components
+ - `SSTDPN.encoder` **(Feature Extractor)**
+     - *Operations.* Combines Adaptive Spatial-Spectral Fusion and Multi-scale Variance Pooling
+       via an internal :class:`_SSTEncoder`.
+     - *Role.* Maps the raw MI-EEG trial :math:`X_i \in \mathbb{R}^{C \times T}` to the
+       feature space :math:`z_i \in \mathbb{R}^d`.
+ - `_SSTEncoder.temporal_conv` **(Depthwise Temporal Convolution for Spectral Extraction)**
+     - *Operations.* Internal :class:`_DepthwiseTemporalConv1d` applying separate temporal
+       convolution filters to each channel with kernel size `temporal_conv_kernel_size` and
+       depth multiplier `n_spectral_filters_temporal` (equivalent to :math:`F_1` in the paper).
+     - *Role.* Extracts multiple distinct spectral bands from each EEG channel independently.
+ - `_SSTEncoder.spt_attn` **(Spatial-Spectral Attention for Channel Gating)**
+     - *Operations.* Internal :class:`_SpatSpectralAttn` module using Global Context Embedding
+       via variance-based pooling, followed by adaptive channel normalization and gating.
+     - *Role.* Reweights channels in the spatial-spectral dimension to extract efficient and
+       discriminative features by emphasizing task-relevant regions and frequency bands.
+ - `_SSTEncoder.chan_conv` **(Pointwise Fusion across Channels)**
+     - *Operations.* A 1D pointwise convolution with `n_fused_filters` output channels
+       (equivalent to :math:`F_2` in the paper), followed by BatchNorm and the specified
+       `activation` function (default: ELU).
+     - *Role.* Fuses the weighted spatial-spectral features across all electrodes to produce
+       a fused representation :math:`X_{fused} \in \mathbb{R}^{F_2 \times T}`.
+ - `_SSTEncoder.mvp` **(Multi-scale Variance Pooling for Temporal Extraction)**
+     - *Operations.* Internal :class:`_MultiScaleVarPooler` using :class:`_VariancePool1D`
+       layers at multiple scales (`mvp_kernel_sizes`), followed by concatenation.
+     - *Role.* Captures long-range temporal features at multiple time scales. The variance
+       operation leverages the prior that variance represents EEG spectral power.
+ - `SSTDPN.proto_sep` / `SSTDPN.proto_cpt` **(Dual Prototypes)**
+     - *Operations.* Learnable vectors optimized during training using prototype learning losses.
+       The `proto_sep` (Inter-class Separation Prototype) is constrained to a maximum L2 norm
+       (:math:`\lVert s_i \rVert_2 \leq` `proto_sep_maxnorm`) during the forward pass.
+     - *Role.* `proto_sep` achieves inter-class separation; `proto_cpt` enhances intra-class compactness.
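The prototype-based decision rule can be sketched in plain Python: each separation prototype is rescaled so its L2 norm does not exceed the maximum, and the class scores are the dot products with the feature vector. This is an illustration under stated assumptions, not the braindecode implementation. The function names and the toy numbers are hypothetical, and only prototypes whose norm exceeds the limit are rescaled.

```python
import math


def clip_to_max_norm(vec, max_norm):
    """Rescale vec so its L2 norm is at most max_norm (smaller norms are kept)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def prototype_logits(feature, prototypes, max_norm=1.0):
    """Dot product between the feature and each norm-constrained prototype."""
    clipped = [clip_to_max_norm(p, max_norm) for p in prototypes]
    return [sum(f * p for f, p in zip(feature, proto)) for proto in clipped]


# Toy example: a 3-dimensional feature and one prototype per class.
feature = [0.5, 2.0, -1.5]
proto_sep = [[3.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, -4.0]]

logits = prototype_logits(feature, proto_sep, max_norm=1.0)
predicted = max(range(len(logits)), key=logits.__getitem__)
print(logits, predicted)
```

Because the prototype norms are bounded, a larger logit has to come from a feature that is both long and well aligned with its class prototype, which matches the description of the constraint pushing features away from the origin.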
+ .. rubric:: How the information is encoded temporally, spatially, and spectrally
+ * **Temporal.**
+    The initial :class:`_DepthwiseTemporalConv1d` uses a large kernel (e.g., 75). The MVP module employs pooling
+    kernels that are much larger (e.g., 50, 100, 200 samples) to capture long-term temporal
+    features effectively. According to [Han2025]_, large-kernel pooling layers outperform
+    transformer modules on this EEG decoding task.
+ * **Spatial.**
+    The initial convolution in :class:`_DepthwiseTemporalConv1d` sets the groups parameter :math:`h=1`,
+    meaning the :math:`F_1` temporal filters are shared across channels. The Spatial-Spectral Attention
+    mechanism explicitly models the relationships among these channels in the spatial-spectral
+    dimension, allowing for finer-grained spatial feature modeling compared to conventional
+    GCNs according to the authors [Han2025]_.
+    In other words, the same :math:`F_1` temporal filters are applied to each electrode channel
+    independently to produce the spatial-spectral representation.
+ * **Spectral.**
+    Spectral information is implicitly extracted via the :math:`F_1` filters in :class:`_DepthwiseTemporalConv1d`.
+    Furthermore, the use of Variance Pooling (in MVP) explicitly leverages the neurophysiological
+    prior that the **variance of EEG signals represents their spectral power**, which is an
+    important feature for distinguishing different MI classes [Han2025]_.
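The idea behind variance pooling at several scales can be shown on a single toy channel. The sketch below uses only the standard library and makes simplifying assumptions that may differ from the library: windows do not overlap (stride equals kernel size), the population variance is used, and every scale is applied to the same channel. The function names are hypothetical.

```python
from statistics import pvariance


def variance_pool_1d(signal, kernel_size):
    """Variance of consecutive, non-overlapping windows of the signal."""
    n_windows = len(signal) // kernel_size
    return [
        pvariance(signal[i * kernel_size:(i + 1) * kernel_size])
        for i in range(n_windows)
    ]


def multi_scale_variance_pool(signal, kernel_sizes):
    """Concatenate the variance-pooled outputs of every scale."""
    pooled = []
    for kernel_size in kernel_sizes:
        pooled.extend(variance_pool_1d(signal, kernel_size))
    return pooled


# Toy channel: an oscillating first half and a flat second half.
signal = [0.0, 2.0, 0.0, 2.0, 1.0, 1.0, 1.0, 1.0]
features = multi_scale_variance_pool(signal, kernel_sizes=[2, 4])
print(features)
```

The oscillating half yields a variance of 1.0 at both scales while the flat half yields 0.0, so the pooled values act as a per-window power estimate. Real kernel sizes are far larger (for example 50, 100, and 200 samples).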
+ .. rubric:: Additional Mechanisms
+ - **Attention.** A lightweight Spatial-Spectral Attention mechanism models spatial-spectral relationships
+     at the channel level, distinct from applying attention to deep feature dimensions,
+     which is common in comparison methods like :class:`ATCNet`.
+ - **Regularization.** Dual Prototype Learning acts as a regularization technique
+     by optimizing the feature space to be compact within classes and separated between
+     classes. This enhances model generalization and classification performance, particularly
+     useful for limited data typical of MI-EEG tasks, without requiring external transfer
+     learning data, according to [Han2025]_.
+ Notes
+ -----
+ * The implementation of the DPL loss functions (:math:`\mathcal{L}_S`, :math:`\mathcal{L}_C`, :math:`\mathcal{L}_{EF}`)
+   and the optimization of the Intra-class Compact Prototypes (ICPs) are typically handled outside the primary ``forward`` method, within the training strategy
+   (see Ref. 52 in [Han2025]_).
+ * The default parameters are configured based on the BCI Competition IV 2a dataset.
+ * The use of Prototype Learning (PL) methods is novel in the field of EEG-MI decoding.
+ * **Lowest FLOPs:** Achieves the lowest Floating Point Operations (FLOPs) (9.65 M) among competitive
+   SOTA methods, including braindecode models like :class:`ATCNet` (29.81 M) and
+   :class:`EEGConformer` (63.86 M), demonstrating computational efficiency [Han2025]_.
+ * **Transformer Alternative:** Multi-scale Variance Pooling (MVP) provides an accuracy
+   improvement over temporal attention transformer modules in ablation studies, offering a more
+   efficient alternative to transformer-based approaches like :class:`EEGConformer` [Han2025]_.
+ .. warning::
+     **Important:** To utilize the full potential of SSTDPN with Dual Prototype Learning (DPL),
+     users must implement the DPL optimization strategy outside the model's forward method.
+     For implementation details and training strategies, please consult the official code at
+     [Han2025Code]_:
+     https://github.com/hancan16/SST-DPN/blob/main/train.py
+ Parameters
+ ----------
+ n_spectral_filters_temporal : int, optional
+     Number of spectral filters extracted per channel via temporal convolution.
+     These represent the temporal spectral bands (equivalent to :math:`F_1` in the paper).
+     Default is 9.
+ n_fused_filters : int, optional
+     Number of output filters after pointwise fusion convolution.
+     These fuse the spectral filters across all channels (equivalent to :math:`F_2` in the paper).
+     Default is 48.
+ temporal_conv_kernel_size : int, optional
+     Kernel size for the temporal convolution layer. Controls the receptive field for extracting
+     spectral information. Default is 75 samples.
+ mvp_kernel_sizes : list[int], optional
+     Kernel sizes for Multi-scale Variance Pooling (MVP) module.
+     Larger kernels capture long-term temporal dependencies.
+ return_features : bool, optional
+     If True, the forward pass returns (features, logits). If False, returns only logits.
+     Default is False.
+ proto_sep_maxnorm : float, optional
+     Maximum L2 norm constraint for Inter-class Separation Prototypes during forward pass.
+     This constraint acts as an implicit force to push features away from the origin. Default is 1.0.
+ proto_cpt_std : float, optional
+     Standard deviation for Intra-class Compactness Prototype initialization. Default is 0.01.
+ spt_attn_global_context_kernel : int, optional
+     Kernel size for global context embedding in Spatial-Spectral Attention module.
+     Default is 250 samples.
+ spt_attn_epsilon : float, optional
+     Small epsilon value for numerical stability in Spatial-Spectral Attention. Default is 1e-5.
+ spt_attn_mode : str, optional
+     Embedding computation mode for Spatial-Spectral Attention ('var', 'l2', or 'l1').
+     Default is 'var' (variance-based mean-var operation).
+ activation : nn.Module, optional
+     Activation function to apply after the pointwise fusion convolution in :class:`_SSTEncoder`.
+     Should be a PyTorch activation module class. Default is nn.ELU.
+ References
+ ----------
+ .. [Han2025] Han, C., Liu, C., Wang, J., Wang, Y., Cai, C.,
+     & Qian, D. (2025). A spatial–spectral and temporal dual
+     prototype network for motor imagery brain–computer
+     interface. Knowledge-Based Systems, 315, 113315.
+ .. [Han2025Code] Han, C., Liu, C., Wang, J., Wang, Y.,
+     Cai, C., & Qian, D. (2025). A spatial–spectral and
+     temporal dual prototype network for motor imagery
+     brain–computer interface. Knowledge-Based Systems,
+     315, 113315. GitHub repository.
+     https://github.com/hancan16/SST-DPN.
+ .. rubric:: Hugging Face Hub integration
+ When the optional ``huggingface_hub`` package is installed, all models
+ automatically gain the ability to be pushed to and loaded from the
+ Hugging Face Hub. Install with::
+     pip install braindecode[hub]
+ **Pushing a model to the Hub:**
+ .. code::
+     from braindecode.models import SSTDPN
+     # Train your model
+     model = SSTDPN(n_chans=22, n_outputs=4, n_times=1000)
+     # ... training code ...
+     # Push to the Hub
+     model.push_to_hub(
+         repo_id="username/my-sstdpn-model",
+         commit_message="Initial model upload",
+     )
+ **Loading a model from the Hub:**
+ .. code::
+     from braindecode.models import SSTDPN
+     # Load pretrained model
+     model = SSTDPN.from_pretrained("username/my-sstdpn-model")
+     # Load with a different number of outputs (head is rebuilt automatically)
+     model = SSTDPN.from_pretrained("username/my-sstdpn-model", n_outputs=4)
+ **Extracting features and replacing the head:**
+ .. code::
+     import torch
+     x = torch.randn(1, model.n_chans, model.n_times)
+     # Extract encoder features (consistent dict across all models)
+     out = model(x, return_features=True)
+     features = out["features"]
+     # Replace the classification head
+     model.reset_head(n_outputs=10)
+ **Saving and restoring full configuration:**
+ .. code::
+     import json
+     config = model.get_config()            # all __init__ params
+     with open("config.json", "w") as f:
+         json.dump(config, f)
+     model2 = SSTDPN.from_config(config)    # reconstruct (no weights)
+ All model parameters (both EEG-specific and model-specific such as
+ dropout rates, activation functions, number of filters) are automatically
+ saved to the Hub and restored when loading.
+ See :ref:`load-pretrained-models` for a complete tutorial.</main>
+</div>
+## Citation
+Please cite both the original paper for this architecture (see the
+*References* section above) and braindecode:
+```bibtex
+@article{aristimunha2025braindecode,
+  title   = {Braindecode: a deep learning library for raw electrophysiological data},
+  author  = {Aristimunha, Bruno and others},
+  journal = {Zenodo},
+  year    = {2025},
+  doi     = {10.5281/zenodo.17699192},
+}
+```
+## License
+BSD-3-Clause for the model code (matching braindecode).
+Pretraining-derived weights, if you fine-tune from a checkpoint,
+inherit the licence of that checkpoint and its training corpus.