@@ -14,13 +14,12 @@ tags:
 # DeepSleepNet
-DeepSleepNet from Supratak et al (2017) .
-> **Architecture-only repository.** This repo documents the
 > `braindecode.models.DeepSleepNet` class. **No pretrained weights are
-> distributed here** — instantiate the model and train it on your own
-> data, or fine-tune from a published foundation-model checkpoint
-> separately.
 ## Quick start
@@ -39,244 +38,65 @@ model = DeepSleepNet(
 )
 ```
-The signal-shape arguments above are example defaults — adjust them
-to match your recording.
 ## Documentation
-- Full API reference (parameters, references, architecture figure):
-  <https://braindecode.org/stable/generated/braindecode.models.DeepSleepNet.html>
-- Interactive browser with live instantiation:
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/deepsleepnet.py#L12>
-## Architecture description
-The block below is the rendered class docstring (parameters,
-references, architecture figure where available).
-<div class='bd-doc'><main>
-<p>DeepSleepNet from Supratak et al (2017) [Supratak2017]_.</p>
-<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#6c757d;color:white;font-size:11px;font-weight:600;margin-right:4px;">Recurrent</span>
- .. figure:: https://raw.githubusercontent.com/akaraspt/deepsleepnet/master/img/deepsleepnet.png
-     :align: center
-     :alt: DeepSleepNet Architecture
-     :width: 700px
- DeepSleepNet is a deep learning model for automatic sleep stage scoring
- based on raw single-channel EEG. It consists of two main parts:
- 1. **Representation learning** — two CNNs with different filter sizes
-    extract time-invariant features from each 30-s EEG epoch.
- 2. **Sequence residual learning** — bidirectional LSTMs learn temporal
-    information such as stage transition rules, combined with a residual
-    shortcut from the CNN features.
- .. rubric:: Representation Learning
- Two parallel CNN paths process the raw input simultaneously:
- - **Small-filter path** — first conv uses filter length ≈ Fs/2 and
-   stride ≈ Fs/16, capturing *when* characteristic transients occur
-   (temporal precision).
- - **Large-filter path** — first conv uses filter length ≈ 4·Fs and
-   stride ≈ Fs/2, capturing *which* frequency components dominate
-   (frequency precision).
- Each path consists of four convolutional layers (1-D convolution →
- :class:`~torch.nn.BatchNorm2d` → activation, configurable via the
- per-path activation settings) and two :class:`~torch.nn.MaxPool2d`
- layers with :class:`~torch.nn.Dropout` after the first pooling.
- Outputs from both paths are **concatenated** to form the epoch
- embedding.
- .. rubric:: Sequence Residual Learning
- Two layers of bidirectional LSTMs encode temporal dependencies across
- epochs. A **residual shortcut** (fully connected →
- :class:`~torch.nn.BatchNorm1d` → :class:`~torch.nn.ReLU`) projects
- the CNN features to the BiLSTM output dimension and is **added** to
- the BiLSTM output, improving gradient flow and preserving salient
- CNN evidence.
- .. rubric:: Implementation Differences
- .. note::
-    **Peephole connections.** The original implementation uses
-    TensorFlow ``LSTMCell`` with ``use_peepholes=True``, which allows
-    gates to inspect the cell state. :class:`torch.nn.LSTM` does not
-    support peepholes; this implementation uses standard LSTM gates.
-    **Sequence length.** The original model processes **sequences of
-    epochs** through the BiLSTM to capture cross-epoch transition rules.
-    This implementation processes **single epochs** (sequence length 1),
-    so the BiLSTM acts as a nonlinear feature transform with a residual
-    connection. To leverage multi-epoch context, batch consecutive
-    epochs as a sequence externally.
-    **Activation.** The original uses :class:`~torch.nn.ReLU` for both
-    CNN paths. This implementation defaults to :class:`~torch.nn.ELU`
-    for the large-filter path (``activation_large``), which can be
-    overridden.
- .. rubric:: Training (from the paper)
- - **Two-step procedure.** (i) Pre-train the CNN part on a
-   class-balanced training set using oversampling; (ii) fine-tune the
-   whole network with sequential batches using a lower learning rate
-   for the CNNs and a higher one for the sequence residual part.
- - **Dropout** with probability 0.5 is used throughout the model.
- - **L2 weight decay** (λ = 10⁻³) is applied only to the first
-   convolutional layers of both CNN paths.
- - **Gradient clipping** rescales gradients when their global norm
-   exceeds a threshold.
- - **State handling.** BiLSTM states are reinitialized per subject so
-   that temporal context does not leak across recordings.
- Parameters
- ----------
- activation_large : type[nn.Module], default=nn.ELU
-     Activation class for the large-filter CNN path.
- activation_small : type[nn.Module], default=nn.ReLU
-     Activation class for the small-filter CNN path.
- return_feats : bool, default=False
-     If True, return features before the final linear layer.
- drop_prob : float, default=0.5
-     Dropout probability applied throughout the network.
- bilstm_hidden_size : int, default=512
-     Hidden size of the BiLSTM. The residual FC output dimension is
-     ``2 * bilstm_hidden_size`` to match the concatenated directions.
- bilstm_num_layers : int, default=2
-     Number of stacked BiLSTM layers.
- small_n_filters_1 : int, default=64
-     First-conv output channels for the small-filter path.
- small_n_filters_2 : int, default=128
-     Deep-conv (conv2--conv4) output channels for the small-filter path.
- small_first_kernel_size : int, default=50
-     First-conv kernel size for the small path (paper: Fs/2).
- small_first_stride : int, default=6
-     First-conv stride for the small path (paper: Fs/16).
- small_first_padding : int, default=22
-     First-conv padding for the small path.
- small_pool1_kernel_size : int, default=8
-     First max-pool kernel for the small path.
- small_pool1_stride : int, default=8
-     First max-pool stride for the small path.
- small_pool1_padding : int, default=2
-     First max-pool padding for the small path.
- small_deep_kernel_size : int, default=8
-     Deep-conv kernel size for the small path.
- small_pool2_kernel_size : int, default=4
-     Second max-pool kernel for the small path.
- small_pool2_stride : int, default=4
-     Second max-pool stride for the small path.
- small_pool2_padding : int, default=1
-     Second max-pool padding for the small path.
- large_n_filters_1 : int, default=64
-     First-conv output channels for the large-filter path.
- large_n_filters_2 : int, default=128
-     Deep-conv (conv2--conv4) output channels for the large-filter path.
- large_first_kernel_size : int, default=400
-     First-conv kernel size for the large path (paper: 4*Fs).
- large_first_stride : int, default=50
-     First-conv stride for the large path (paper: Fs/2).
- large_first_padding : int, default=175
-     First-conv padding for the large path.
- large_pool1_kernel_size : int, default=4
-     First max-pool kernel for the large path.
- large_pool1_stride : int, default=4
-     First max-pool stride for the large path.
- large_pool1_padding : int, default=0
-     First max-pool padding for the large path.
- large_deep_kernel_size : int, default=6
-     Deep-conv kernel size for the large path.
- large_pool2_kernel_size : int, default=2
-     Second max-pool kernel for the large path.
- large_pool2_stride : int, default=2
-     Second max-pool stride for the large path.
- large_pool2_padding : int, default=1
-     Second max-pool padding for the large path.
- References
- ----------
- .. [Supratak2017] Supratak, A., Dong, H., Wu, C., & Guo, Y. (2017).
-    DeepSleepNet: A model for automatic sleep stage scoring based
-    on raw single-channel EEG. IEEE Transactions on Neural Systems
-    and Rehabilitation Engineering, 25(11), 1998-2008.
- .. rubric:: Hugging Face Hub integration
- When the optional ``huggingface_hub`` package is installed, all models
- automatically gain the ability to be pushed to and loaded from the
- Hugging Face Hub. Install with::
-     pip install braindecode[hub]
- **Pushing a model to the Hub:**
- .. code::
-     from braindecode.models import DeepSleepNet
-     # Train your model
-     model = DeepSleepNet(n_chans=22, n_outputs=4, n_times=1000)
-     # ... training code ...
-     # Push to the Hub
-     model.push_to_hub(
-         repo_id="username/my-deepsleepnet-model",
-         commit_message="Initial model upload",
-     )
- **Loading a model from the Hub:**
- .. code::
-     from braindecode.models import DeepSleepNet
-     # Load pretrained model
-     model = DeepSleepNet.from_pretrained("username/my-deepsleepnet-model")
-     # Load with a different number of outputs (head is rebuilt automatically)
-     model = DeepSleepNet.from_pretrained("username/my-deepsleepnet-model", n_outputs=4)
- **Extracting features and replacing the head:**
- .. code::
-     import torch
-     x = torch.randn(1, model.n_chans, model.n_times)
-     # Extract encoder features (consistent dict across all models)
-     out = model(x, return_features=True)
-     features = out["features"]
-     # Replace the classification head
-     model.reset_head(n_outputs=10)
- **Saving and restoring full configuration:**
- .. code::
-     import json
-     config = model.get_config()            # all __init__ params
-     with open("config.json", "w") as f:
-         json.dump(config, f)
-     model2 = DeepSleepNet.from_config(config)    # reconstruct (no weights)
- All model parameters (both EEG-specific and model-specific such as
- dropout rates, activation functions, number of filters) are automatically
- saved to the Hub and restored when loading.
- See :ref:`load-pretrained-models` for a complete tutorial.</main>
-</div>
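The implementation notes above make two points. This class sees one epoch at a time, so multi-epoch context has to be built externally. The paper also resets BiLSTM state per subject. The sketch below covers only the data-side bookkeeping, in plain Python. It cuts each subject's epochs into non-overlapping runs of `seq_len` consecutive epochs, so no sequence crosses a recording boundary. The following are our assumptions, not braindecode's:

- the helper name `make_epoch_sequences`;
- the non-overlapping windows;
- dropping an incomplete trailing run.

How the resulting sequences are fed to a sequence model is left to your own wrapper.

```python
from itertools import groupby


def make_epoch_sequences(epochs, subject_ids, seq_len):
    """Hypothetical helper: group consecutive epochs into per-subject sequences.

    ``epochs`` and ``subject_ids`` are parallel lists in recording order, with
    each subject's epochs stored contiguously. Returns (subject, sequence)
    pairs; a trailing run shorter than ``seq_len`` is dropped.
    """
    if len(epochs) != len(subject_ids):
        raise ValueError("epochs and subject_ids must have the same length")
    if seq_len < 1:
        raise ValueError("seq_len must be positive")
    sequences = []
    start_of_subject = 0
    # groupby only merges *adjacent* equal ids, which is what keeps a
    # sequence from spanning two subjects (or two separate recordings).
    for subject, group in groupby(subject_ids):
        n = sum(1 for _ in group)
        subject_epochs = epochs[start_of_subject:start_of_subject + n]
        start_of_subject += n
        for start in range(0, n - seq_len + 1, seq_len):
            sequences.append((subject, subject_epochs[start:start + seq_len]))
    return sequences


# Toy example: integers stand in for 30-s EEG epochs.
seqs = make_epoch_sequences(list(range(7)), ["s1"] * 5 + ["s2"] * 2, seq_len=2)
print(seqs)  # [('s1', [0, 1]), ('s1', [2, 3]), ('s2', [5, 6])]
```

Epoch 4 is dropped because subject `s1` has no fifth epoch to pair it with. Overlapping windows (stride smaller than `seq_len`) are a common alternative if you need more training sequences.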
 ## Citation
-Please cite both the original paper for this architecture (see the
-*References* section above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,

 # DeepSleepNet
+DeepSleepNet from Supratak et al. (2017) [1].
+> **Architecture-only repository.** This repository documents the
 > `braindecode.models.DeepSleepNet` class. **No pretrained weights are
+> distributed here.** Instantiate the model and train it on your own
+> data.
 ## Quick start
 )
 ```
+The signal-shape arguments above are illustrative defaults; adjust them
+to match your recording.
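The paper ties the first-layer filter sizes to the sampling rate Fs (see the *paper:* notes in the Parameters table). The defaults are consistent with Fs = 100 Hz (50 = Fs/2, 400 = 4·Fs). The sketch below re-derives those values for a recording sampled at a different rate. It rests on several assumptions of our own:

- the helper name `paper_first_conv_params`;
- truncating sizes to integers;
- a 30-s epoch for `n_times`;
- the padding rule `(kernel - stride) // 2`. This rule only happens to reproduce the default paddings (22 and 175) at 100 Hz; braindecode does not document it.

```python
def paper_first_conv_params(sfreq: float, epoch_seconds: float = 30.0) -> dict:
    """Hypothetical helper: first-conv sizes from the paper's Fs-based rules.

    small path: kernel = Fs/2, stride = Fs/16
    large path: kernel = 4*Fs, stride = Fs/2
    Values are truncated to integers (our choice, not the paper's).
    """
    small_kernel = int(sfreq / 2)
    small_stride = max(1, int(sfreq / 16))
    large_kernel = int(4 * sfreq)
    large_stride = max(1, int(sfreq / 2))
    return {
        "n_times": int(epoch_seconds * sfreq),  # samples per epoch
        "small_first_kernel_size": small_kernel,
        "small_first_stride": small_stride,
        # Assumed padding rule: reproduces the defaults (22, 175) at 100 Hz.
        "small_first_padding": (small_kernel - small_stride) // 2,
        "large_first_kernel_size": large_kernel,
        "large_first_stride": large_stride,
        "large_first_padding": (large_kernel - large_stride) // 2,
    }


# At 100 Hz this reproduces the defaults listed in the Parameters table.
print(paper_first_conv_params(100))
# A hypothetical 256 Hz recording gets proportionally larger kernels/strides.
print(paper_first_conv_params(256))
```

These values change only the first layers. Whether the rest of the network (for example, the BiLSTM input size) adapts to a different `n_times` may depend on the braindecode version. Run a forward pass on a dummy batch before training.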
 ## Documentation
+- Full API reference: <https://braindecode.org/stable/generated/braindecode.models.DeepSleepNet.html>
+- Interactive browser (live instantiation, parameter counts):
   <https://huggingface.co/spaces/braindecode/model-explorer>
 - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/deepsleepnet.py#L12>
+## Architecture
+![DeepSleepNet architecture](https://raw.githubusercontent.com/akaraspt/deepsleepnet/master/img/deepsleepnet.png)
+## Parameters
+| Parameter | Type | Description |
+|---|---|---|
+| `activation_large` | type[nn.Module], default=nn.ELU | Activation class for the large-filter CNN path. |
+| `activation_small` | type[nn.Module], default=nn.ReLU | Activation class for the small-filter CNN path. |
+| `return_feats` | bool, default=False | If True, return features before the final linear layer. |
+| `drop_prob` | float, default=0.5 | Dropout probability applied throughout the network. |
+| `bilstm_hidden_size` | int, default=512 | Hidden size of the BiLSTM. The residual FC output dimension is `2 * bilstm_hidden_size` to match the concatenated directions. |
+| `bilstm_num_layers` | int, default=2 | Number of stacked BiLSTM layers. |
+| `small_n_filters_1` | int, default=64 | First-conv output channels for the small-filter path. |
+| `small_n_filters_2` | int, default=128 | Deep-conv (conv2–conv4) output channels for the small-filter path. |
+| `small_first_kernel_size` | int, default=50 | First-conv kernel size for the small path (paper: Fs/2). |
+| `small_first_stride` | int, default=6 | First-conv stride for the small path (paper: Fs/16). |
+| `small_first_padding` | int, default=22 | First-conv padding for the small path. |
+| `small_pool1_kernel_size` | int, default=8 | First max-pool kernel for the small path. |
+| `small_pool1_stride` | int, default=8 | First max-pool stride for the small path. |
+| `small_pool1_padding` | int, default=2 | First max-pool padding for the small path. |
+| `small_deep_kernel_size` | int, default=8 | Deep-conv kernel size for the small path. |
+| `small_pool2_kernel_size` | int, default=4 | Second max-pool kernel for the small path. |
+| `small_pool2_stride` | int, default=4 | Second max-pool stride for the small path. |
+| `small_pool2_padding` | int, default=1 | Second max-pool padding for the small path. |
+| `large_n_filters_1` | int, default=64 | First-conv output channels for the large-filter path. |
+| `large_n_filters_2` | int, default=128 | Deep-conv (conv2–conv4) output channels for the large-filter path. |
+| `large_first_kernel_size` | int, default=400 | First-conv kernel size for the large path (paper: 4*Fs). |
+| `large_first_stride` | int, default=50 | First-conv stride for the large path (paper: Fs/2). |
+| `large_first_padding` | int, default=175 | First-conv padding for the large path. |
+| `large_pool1_kernel_size` | int, default=4 | First max-pool kernel for the large path. |
+| `large_pool1_stride` | int, default=4 | First max-pool stride for the large path. |
+| `large_pool1_padding` | int, default=0 | First max-pool padding for the large path. |
+| `large_deep_kernel_size` | int, default=6 | Deep-conv kernel size for the large path. |
+| `large_pool2_kernel_size` | int, default=2 | Second max-pool kernel for the large path. |
+| `large_pool2_stride` | int, default=2 | Second max-pool stride for the large path. |
+| `large_pool2_padding` | int, default=1 | Second max-pool padding for the large path. |
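The table gives every first-layer and pooling size, so the length of each path's output can be worked out by hand. The back-of-the-envelope sketch below uses the standard floor-mode length formula for both convolutions and max-pooling. It relies on assumptions the table does not state:

- a single-channel epoch of 3000 samples (30 s at the 100 Hz implied by the Fs-based defaults);
- the layer order conv1 → pool1 → conv2–conv4 → pool2 in each path;
- stride-1 "same" padding in conv2–conv4, so those layers leave the length unchanged.

```python
def conv_out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1-D convolution or max-pool (dilation 1, floor mode)."""
    return (length + 2 * padding - kernel) // stride + 1


def path_len(n_times: int, first: tuple, pool1: tuple, pool2: tuple) -> int:
    """Temporal length after one CNN path.

    Each argument is a (kernel, stride, padding) tuple taken from the table.
    conv2..conv4 are assumed length-preserving ("same" padding, stride 1),
    so they do not appear in the calculation.
    """
    length = conv_out_len(n_times, *first)  # first conv
    length = conv_out_len(length, *pool1)   # first max-pool
    return conv_out_len(length, *pool2)     # second max-pool


def embedding_size(n_times: int = 3000, n_filters_2: int = 128):
    small = path_len(n_times, first=(50, 6, 22), pool1=(8, 8, 2), pool2=(4, 4, 1))
    large = path_len(n_times, first=(400, 50, 175), pool1=(4, 4, 0), pool2=(2, 2, 1))
    # Both paths end with n_filters_2 channels; flattening each and
    # concatenating gives n_filters_2 * (small + large) features per epoch.
    return small, large, n_filters_2 * (small + large)


small, large, total = embedding_size()
print(small, large, total)  # 16 8 3072
bilstm_hidden_size = 512  # table default
print("residual FC output:", 2 * bilstm_hidden_size)  # residual FC output: 1024
```

Under these assumptions, the concatenated epoch embedding has 128 × (16 + 8) = 3072 values. The residual fully connected layer then maps it to `2 * bilstm_hidden_size` = 1024. If you change `n_times` or any size in the table, rerun the arithmetic rather than reusing these numbers.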
+## References
+1. Supratak, A., Dong, H., Wu, C., & Guo, Y. (2017). DeepSleepNet: A model for automatic sleep stage scoring based on raw single-channel EEG. *IEEE Transactions on Neural Systems and Rehabilitation Engineering*, 25(11), 1998–2008.
 ## Citation
+Cite the original architecture paper (see *References* above) and braindecode:
 ```bibtex
 @article{aristimunha2025braindecode,