+---
+license: bsd-3-clause
+library_name: braindecode
+pipeline_tag: feature-extraction
+tags:
+  - eeg
+  - biosignal
+  - pytorch
+  - neuroscience
+  - braindecode
+  - convolutional
+  - sleep-staging
+---
+# DeepSleepNet
+DeepSleepNet from Supratak et al. (2017).
+> **Architecture-only repository.** This repo documents the
+> `braindecode.models.DeepSleepNet` class. **No pretrained weights are
+> distributed here** — instantiate the model and train it on your own
+> data, or fine-tune from a published foundation-model checkpoint
+> separately.
+## Quick start
+```bash
+pip install braindecode
+```
+```python
+from braindecode.models import DeepSleepNet
+model = DeepSleepNet(
+    n_chans=2,
+    sfreq=100,
+    input_window_seconds=30.0,
+    n_outputs=5,
+)
+```
+The signal-shape arguments above are example defaults — adjust them
+to match your recording.
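As a rough sketch of how the signal-shape arguments relate, the snippet below derives the number of samples per window from the sampling rate and window length. It assumes the window length in samples is simply `sfreq * input_window_seconds`; `window_samples` is a hypothetical helper for illustration, not part of braindecode.

```python
def window_samples(sfreq: float, window_seconds: float) -> int:
    """Number of samples in one window at the given sampling rate."""
    return int(round(sfreq * window_seconds))


n_chans = 2
n_times = window_samples(sfreq=100, window_seconds=30.0)

# Input layout is (batch, channels, time); the batch size of 8 is arbitrary.
input_shape = (8, n_chans, n_times)
print(input_shape)  # (8, 2, 3000)
```

For a recording sampled at 128 Hz, the same 30-s window would span `window_samples(128, 30.0)` = 3840 samples.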
+## Documentation
+- Full API reference (parameters, references, architecture figure):
+  <https://braindecode.org/stable/generated/braindecode.models.DeepSleepNet.html>
+- Interactive browser with live instantiation:
+  <https://huggingface.co/spaces/braindecode/model-explorer>
+- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/deepsleepnet.py#L12>
+## Architecture description
+The block below is the rendered class docstring (parameters,
+references, architecture figure where available).
+<div class='bd-doc'><main>
+<p>DeepSleepNet from Supratak et al. (2017) [Supratak2017]_.</p>
+<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#6c757d;color:white;font-size:11px;font-weight:600;margin-right:4px;">Recurrent</span>
+ .. figure:: https://raw.githubusercontent.com/akaraspt/deepsleepnet/master/img/deepsleepnet.png
+     :align: center
+     :alt: DeepSleepNet Architecture
+     :width: 700px
+ DeepSleepNet is a deep learning model for automatic sleep stage scoring
+ based on raw single-channel EEG. It consists of two main parts:
+ 1. **Representation learning** — two CNNs with different filter sizes
+    extract time-invariant features from each 30-s EEG epoch.
+ 2. **Sequence residual learning** — bidirectional LSTMs learn temporal
+    information such as stage transition rules, combined with a residual
+    shortcut from the CNN features.
+ .. rubric:: Representation Learning
+ Two parallel CNN paths process the raw input simultaneously:
+ - **Small-filter path** — first conv uses filter length ≈ Fs/2 and
+   stride ≈ Fs/16, capturing *when* characteristic transients occur
+   (temporal precision).
+ - **Large-filter path** — first conv uses filter length ≈ 4·Fs and
+   stride ≈ Fs/2, capturing *which* frequency components dominate
+   (frequency precision).
+ Each path consists of four convolutional layers (1-D convolution →
+ :class:`~torch.nn.BatchNorm2d` → activation, set per path via
+ ``activation_small`` and ``activation_large``) and two
+ :class:`~torch.nn.MaxPool2d` layers, with :class:`~torch.nn.Dropout`
+ after the first pooling layer.
+ Outputs from both paths are **concatenated** to form the epoch
+ embedding.
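The filter-size rules for the two paths can be sketched as a small helper that turns a sampling rate into first-conv kernel and stride values. Rounding to the nearest integer is an assumption of this sketch, and `first_conv_sizes` is a hypothetical name, not a braindecode function.

```python
def first_conv_sizes(sfreq: float) -> dict:
    """Approximate first-conv kernel and stride for each CNN path.

    Small path: kernel ~ Fs/2, stride ~ Fs/16.
    Large path: kernel ~ 4*Fs, stride ~ Fs/2.
    Rounding to the nearest integer is an assumption of this sketch.
    """
    return {
        "small": {"kernel": round(sfreq / 2), "stride": round(sfreq / 16)},
        "large": {"kernel": round(4 * sfreq), "stride": round(sfreq / 2)},
    }


sizes = first_conv_sizes(100)
print(sizes["small"])  # {'kernel': 50, 'stride': 6}
print(sizes["large"])  # {'kernel': 400, 'stride': 50}
```

At 100 Hz these values coincide with the documented defaults for `small_first_kernel_size`, `small_first_stride`, `large_first_kernel_size`, and `large_first_stride`. For another sampling rate, the computed values could be passed through those same constructor parameters.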
+ .. rubric:: Sequence Residual Learning
+ Two layers of bidirectional LSTMs encode temporal dependencies across
+ epochs. A **residual shortcut** (fully connected →
+ :class:`~torch.nn.BatchNorm1d` → :class:`~torch.nn.ReLU`) projects
+ the CNN features to the BiLSTM output dimension and is **added** to
+ the BiLSTM output, improving gradient flow and preserving salient
+ CNN evidence.
+ .. rubric:: Implementation Differences
+ .. note::
+    **Peephole connections.** The original implementation uses
+    TensorFlow ``LSTMCell`` with ``use_peepholes=True``, which allows
+    gates to inspect the cell state. :class:`torch.nn.LSTM` does not
+    support peepholes; this implementation uses standard LSTM gates.
+    **Sequence length.** The original model processes **sequences of
+    epochs** through the BiLSTM to capture cross-epoch transition rules.
+    This implementation processes **single epochs** (sequence length 1),
+    so the BiLSTM acts as a nonlinear feature transform with a residual
+    connection. To leverage multi-epoch context, batch consecutive
+    epochs as a sequence externally.
+    **Activation.** The original uses :class:`~torch.nn.ReLU` for both
+    CNN paths. This implementation defaults to :class:`~torch.nn.ELU`
+    for the large-filter path (``activation_large``), which can be
+    overridden.
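One possible reading of "batch consecutive epochs as a sequence externally" is to group epoch indices from a single recording into fixed-length windows before passing them to whatever sequence-level wrapper you build around the model. The helper below is a hypothetical illustration of that grouping step only; the windowing scheme is an assumption and is not provided by braindecode.

```python
def consecutive_windows(n_epochs: int, seq_len: int, stride: int = 1) -> list:
    """Return lists of `seq_len` consecutive epoch indices from one recording."""
    if seq_len < 1 or stride < 1:
        raise ValueError("seq_len and stride must be positive")
    return [
        list(range(start, start + seq_len))
        for start in range(0, n_epochs - seq_len + 1, stride)
    ]


# Six 30-s epochs from one recording, grouped into overlapping triples.
windows = consecutive_windows(n_epochs=6, seq_len=3, stride=1)
print(windows)  # [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Building windows per recording, rather than over a concatenated dataset, keeps a sequence from spanning two subjects.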
+ .. rubric:: Training (from the paper)
+ - **Two-step procedure.** (i) Pre-train the CNN part on a
+   class-balanced training set using oversampling; (ii) fine-tune the
+   whole network with sequential batches using a lower learning rate
+   for the CNNs and a higher one for the sequence residual part.
+ - **Dropout** with probability 0.5 is used throughout the model.
+ - **L2 weight decay** (λ = 10⁻³) is applied only to the first
+   convolutional layers of both CNN paths.
+ - **Gradient clipping** rescales gradients when their global norm
+   exceeds a threshold.
+ - **State handling.** BiLSTM states are reinitialized per subject so
+   that temporal context does not leak across recordings.
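Two of the training ingredients above, class-balanced oversampling and global-norm gradient clipping, can be illustrated in plain Python. This is a minimal sketch under stated assumptions: sampling with replacement up to the size of the largest class is one possible oversampling scheme, the threshold value is arbitrary, and both function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def oversample_indices(labels, seed=0):
    """Return sample indices in which every class appears as often as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        # Draw extra samples with replacement until the class reaches the target count.
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(balanced)
    return balanced


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values when their L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


labels = ["W", "N2", "N2", "N2", "REM"]
balanced = oversample_indices(labels)   # 9 indices, 3 per class
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # norm 5 rescaled to norm 1
```

In a PyTorch training loop, `torch.nn.utils.clip_grad_norm_` applies the same kind of rescaling directly to a model's parameters.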
+ Parameters
+ ----------
+ activation_large : type[nn.Module], default=nn.ELU
+     Activation class for the large-filter CNN path.
+ activation_small : type[nn.Module], default=nn.ReLU
+     Activation class for the small-filter CNN path.
+ return_feats : bool, default=False
+     If True, return features before the final linear layer.
+ drop_prob : float, default=0.5
+     Dropout probability applied throughout the network.
+ bilstm_hidden_size : int, default=512
+     Hidden size of the BiLSTM. The residual FC output dimension is
+     ``2 * bilstm_hidden_size`` to match the concatenated directions.
+ bilstm_num_layers : int, default=2
+     Number of stacked BiLSTM layers.
+ small_n_filters_1 : int, default=64
+     First-conv output channels for the small-filter path.
+ small_n_filters_2 : int, default=128
+     Deep-conv (conv2--conv4) output channels for the small-filter path.
+ small_first_kernel_size : int, default=50
+     First-conv kernel size for the small path (paper: Fs/2).
+ small_first_stride : int, default=6
+     First-conv stride for the small path (paper: Fs/16).
+ small_first_padding : int, default=22
+     First-conv padding for the small path.
+ small_pool1_kernel_size : int, default=8
+     First max-pool kernel for the small path.
+ small_pool1_stride : int, default=8
+     First max-pool stride for the small path.
+ small_pool1_padding : int, default=2
+     First max-pool padding for the small path.
+ small_deep_kernel_size : int, default=8
+     Deep-conv kernel size for the small path.
+ small_pool2_kernel_size : int, default=4
+     Second max-pool kernel for the small path.
+ small_pool2_stride : int, default=4
+     Second max-pool stride for the small path.
+ small_pool2_padding : int, default=1
+     Second max-pool padding for the small path.
+ large_n_filters_1 : int, default=64
+     First-conv output channels for the large-filter path.
+ large_n_filters_2 : int, default=128
+     Deep-conv (conv2--conv4) output channels for the large-filter path.
+ large_first_kernel_size : int, default=400
+     First-conv kernel size for the large path (paper: 4*Fs).
+ large_first_stride : int, default=50
+     First-conv stride for the large path (paper: Fs/2).
+ large_first_padding : int, default=175
+     First-conv padding for the large path.
+ large_pool1_kernel_size : int, default=4
+     First max-pool kernel for the large path.
+ large_pool1_stride : int, default=4
+     First max-pool stride for the large path.
+ large_pool1_padding : int, default=0
+     First max-pool padding for the large path.
+ large_deep_kernel_size : int, default=6
+     Deep-conv kernel size for the large path.
+ large_pool2_kernel_size : int, default=2
+     Second max-pool kernel for the large path.
+ large_pool2_stride : int, default=2
+     Second max-pool stride for the large path.
+ large_pool2_padding : int, default=1
+     Second max-pool padding for the large path.
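To see how the kernel, stride, and padding defaults listed above shape the epoch embedding, the sketch below applies the standard output-length formula for 1-D convolution and pooling to a 3000-sample window. It assumes the deep convolutions (conv2 to conv4) preserve length, which the parameter list does not state, so treat the result as illustrative rather than as the library's exact behaviour. Both helper names are hypothetical.

```python
def out_len(n_in: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a 1-D conv or max-pool layer (dilation 1, floor mode)."""
    return (n_in + 2 * padding - kernel) // stride + 1


def path_len(n_times, conv1, pool1, pool2):
    """Length after conv1 -> pool1 -> deep convs (assumed length-preserving) -> pool2."""
    n = out_len(n_times, *conv1)
    n = out_len(n, *pool1)
    return out_len(n, *pool2)


n_times = 3000  # 30 s at 100 Hz

# Tuples are (kernel, stride, padding), taken from the documented defaults.
small = path_len(n_times, conv1=(50, 6, 22), pool1=(8, 8, 2), pool2=(4, 4, 1))
large = path_len(n_times, conv1=(400, 50, 175), pool1=(4, 4, 0), pool2=(2, 2, 1))

embedding = 128 * (small + large)  # both paths end with 128 channels by default
bilstm_out = 2 * 512               # two directions x bilstm_hidden_size

print(small, large, embedding, bilstm_out)  # 16 8 3072 1024
```

Under these assumptions the concatenated embedding has 3072 features, and the residual shortcut projects it to the 1024-dimensional BiLSTM output before the two are added.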
+ References
+ ----------
+ .. [Supratak2017] Supratak, A., Dong, H., Wu, C., & Guo, Y. (2017).
+    DeepSleepNet: A model for automatic sleep stage scoring based
+    on raw single-channel EEG. IEEE Transactions on Neural Systems
+    and Rehabilitation Engineering, 25(11), 1998-2008.
+ .. rubric:: Hugging Face Hub integration
+ When the optional ``huggingface_hub`` package is installed, all models
+ automatically gain the ability to be pushed to and loaded from the
+ Hugging Face Hub. Install with::
+     pip install braindecode[hub]
+ **Pushing a model to the Hub:**
+ .. code::
+     from braindecode.models import DeepSleepNet
+     # Train your model
+     model = DeepSleepNet(n_chans=22, n_outputs=4, n_times=1000)
+     # ... training code ...
+     # Push to the Hub
+     model.push_to_hub(
+         repo_id="username/my-deepsleepnet-model",
+         commit_message="Initial model upload",
+     )
+ **Loading a model from the Hub:**
+ .. code::
+     from braindecode.models import DeepSleepNet
+     # Load pretrained model
+     model = DeepSleepNet.from_pretrained("username/my-deepsleepnet-model")
+     # Load with a different number of outputs (head is rebuilt automatically)
+     model = DeepSleepNet.from_pretrained("username/my-deepsleepnet-model", n_outputs=4)
+ **Extracting features and replacing the head:**
+ .. code::
+     import torch
+     x = torch.randn(1, model.n_chans, model.n_times)
+     # Extract encoder features (consistent dict across all models)
+     out = model(x, return_features=True)
+     features = out["features"]
+     # Replace the classification head
+     model.reset_head(n_outputs=10)
+ **Saving and restoring full configuration:**
+ .. code::
+     import json
+     config = model.get_config()            # all __init__ params
+     with open("config.json", "w") as f:
+         json.dump(config, f)
+     model2 = DeepSleepNet.from_config(config)    # reconstruct (no weights)
+ All model parameters (both EEG-specific and model-specific such as
+ dropout rates, activation functions, number of filters) are automatically
+ saved to the Hub and restored when loading.
+ See :ref:`load-pretrained-models` for a complete tutorial.</main>
+</div>
+## Citation
+Please cite both the original paper for this architecture (see the
+*References* section above) and braindecode:
+```bibtex
+@article{aristimunha2025braindecode,
+  title   = {Braindecode: a deep learning library for raw electrophysiological data},
+  author  = {Aristimunha, Bruno and others},
+  journal = {Zenodo},
+  year    = {2025},
+  doi     = {10.5281/zenodo.17699192},
+}
+```
+## License
+BSD-3-Clause for the model code (matching braindecode).
+If you fine-tune from a pretrained checkpoint, the resulting weights
+inherit the license of that checkpoint and its training corpus.