bruAristimunha committed on
Commit 90a6a4f · verified · 1 Parent(s): 3abf28b

Add architecture-only model card

Files changed (1)
  1. README.md +364 -0
README.md ADDED

---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
- eeg
- biosignal
- pytorch
- neuroscience
- braindecode
- convolutional
- transformer
---

# ATCNet

ATCNet from Altaheri et al. (2022).

> **Architecture-only repository.** This repo documents the
> `braindecode.models.ATCNet` class. **No pretrained weights are
> distributed here** — instantiate the model and train it on your own
> data, or fine-tune from a published foundation-model checkpoint
> separately.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import ATCNet

model = ATCNet(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)
```

The signal-shape arguments above are example defaults — adjust them
to match your recording.

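As a quick sanity check, the snippet below runs a dummy batch and a single training step through the freshly built model (a minimal sketch: braindecode models take input shaped `(batch, n_chans, n_times)`, so 4.0 s at 250 Hz corresponds to 1000 samples; labels and optimizer settings are placeholders).

```python
import torch
from torch import nn

from braindecode.models import ATCNet

model = ATCNet(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)

# Dummy batch: 8 trials, 22 channels, 4.0 s * 250 Hz = 1000 samples.
x = torch.randn(8, 22, 1000)
y = torch.randint(0, 4, (8,))

logits = model(x)                      # shape: (8, 4)
loss = nn.functional.cross_entropy(logits, y)

# One optimization step, as in any standard PyTorch training loop.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss.backward()
optimizer.step()
optimizer.zero_grad()

print(logits.shape, float(loss))
```
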
## Documentation

- Full API reference (parameters, references, architecture figure):
  <https://braindecode.org/stable/generated/braindecode.models.ATCNet.html>
- Interactive browser with live instantiation:
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/atcnet.py#L15>

## Architecture description

The block below is the rendered class docstring (parameters,
references, architecture figure where available).

<div class='bd-doc'><main>
<p>ATCNet from Altaheri et al. (2022) [1]_.</p>
<span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#5cb85c;color:white;font-size:11px;font-weight:600;margin-right:4px;">Convolution</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#6c757d;color:white;font-size:11px;font-weight:600;margin-right:4px;">Recurrent</span><span style="display:inline-block;padding:2px 8px;border-radius:4px;background:#56B4E9;color:white;font-size:11px;font-weight:600;margin-right:4px;">Attention/Transformer</span>

.. figure:: https://user-images.githubusercontent.com/25565236/185449791-e8539453-d4fa-41e1-865a-2cf7e91f60ef.png
   :align: center
   :alt: ATCNet Architecture
   :width: 650px

.. rubric:: Architectural Overview

ATCNet is a *convolution-first* architecture augmented with a *lightweight attention–TCN*
sequence module. The end-to-end flow is:

- (i) :class:`_ConvBlock` learns temporal filter-banks and spatial projections (EEGNet-style),
  downsampling time to a compact feature map;
- (ii) Sliding Windows carve overlapping temporal windows from this map;
- (iii) for each window, :class:`_AttentionBlock` applies small multi-head self-attention
  over time, followed by a :class:`_TCNResidualBlock` stack (causal, dilated);
- (iv) window-level features are aggregated (mean of window logits or concatenation)
  and mapped via a max-norm–constrained linear layer.

Relative to ViT, ATCNet replaces linear patch projection with learned *temporal–spatial*
convolutions; it processes *parallel* window encoders (attention → TCN) instead of a deep
stack; and swaps the MLP head for a TCN suited to 1-D EEG sequences.

.. rubric:: Macro Components

- :class:`_ConvBlock` **(Shallow conv stem → feature map)**

  *Operations.*

  - **Temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(L_t, 1)`` builds a
    FIR-like filter bank (``F1`` maps).
  - **Depthwise spatial conv** (:class:`torch.nn.Conv2d`, ``groups=F1``) with kernel
    ``(1, n_chans)`` learns per-filter spatial projections (akin to EEGNet's CSP-like step).
  - **BN → ELU → AvgPool → Dropout** to stabilize and condense activations.
  - **Refining temporal conv** (:class:`torch.nn.Conv2d`) with kernel ``(L_r, 1)`` +
    **BN → ELU → AvgPool → Dropout**.

  The output shape is ``(B, F2, T_c, 1)`` with ``F2 = F1·D`` and ``T_c = T/(P1·P2)``.
  Temporal kernels behave as FIR filters; the depthwise-spatial conv yields frequency-specific
  topographies. Pooling acts as a local integrator, reducing variance and imposing a
  useful inductive bias on short EEG windows.

- **Sliding-Window Sequencer**

  From the condensed time axis (length ``T_c``), ATCNet forms ``n`` overlapping windows
  of width ``T_w = T_c - n + 1`` (one start per index). Each window produces a sequence
  ``(B, F2, T_w)`` forwarded to its own attention–TCN branch. This creates *parallel*
  encoders over shifted contexts and is key to robustness on nonstationary EEG
  (a stand-alone sketch of the per-window branch follows this list).

- :class:`_AttentionBlock` **(small MHA on temporal positions)**

  Attention here is *local to a window* and purely temporal.

  *Operations.*

  - Rearrange to ``(B, T_w, F2)``,
  - normalization with :class:`torch.nn.LayerNorm`,
  - custom multi-head attention :class:`_MHA` (``num_heads=H``, per-head dim ``d_h``) + residual add,
  - dropout with :class:`torch.nn.Dropout`,
  - rearrange back to ``(B, F2, T_w)``.

  *Role.* Re-weights evidence across the window, letting the model emphasize informative
  segments (onsets, bursts) before causal convolutions aggregate history.

- :class:`_TCNResidualBlock` **(causal dilated temporal CNN)**

  *Operations.*

  - Two :class:`braindecode.modules.CausalConv1d` layers per block with dilation ``1, 2, 4, …``,
  - :class:`torch.nn.BatchNorm1d` + :class:`torch.nn.ELU` + :class:`torch.nn.Dropout` after each
    convolution, plus a residual connection (identity or 1x1 mapping).
  - The final feature used per window is the *last* causal step ``[..., -1]`` (forecast-style).

  *Role.* Efficient long-range temporal integration with stable gradients; the dilated
  receptive field complements attention's soft selection.

- **Aggregation & Classifier**

  *Operations.*

  - Either (a) map each window feature ``(B, F2)`` to logits via :class:`braindecode.modules.MaxNormLinear`
    and **average** across windows (default, matching the official code), or
  - (b) **concatenate** all window features ``(B, n·F2)`` and apply a single :class:`MaxNormLinear`.

  The max-norm constraint regularizes the readout.

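The per-window branch described in this list (window slicing → attention over time → causal TCN → last-step feature → readout) can be sketched end to end with stock PyTorch layers. This is an illustrative simplification only: :class:`torch.nn.MultiheadAttention`, a single left-padded :class:`torch.nn.Conv1d` and :class:`torch.nn.Linear` stand in for ``_MHA``, the dilated ``CausalConv1d`` stack and ``MaxNormLinear``, and parameters are shared across windows purely for brevity.

.. code:: python

    import torch
    import torch.nn.functional as F
    from torch import nn

    B, F2, T_c, n, n_outputs = 8, 32, 17, 5, 4
    feature_map = torch.randn(B, F2, T_c)          # conv-stem output, squeezed to (B, F2, T_c)

    T_w = T_c - n + 1                              # shared window width
    norm = nn.LayerNorm(F2)
    mha = nn.MultiheadAttention(F2, num_heads=2, batch_first=True)
    tcn = nn.Conv1d(F2, F2, kernel_size=4)         # stand-in for the dilated causal stack
    head = nn.Linear(F2, n_outputs)                # stand-in for MaxNormLinear

    window_logits = []
    for i in range(n):                             # one branch per shifted window
        w = feature_map[:, :, i:i + T_w]           # (B, F2, T_w)
        s = w.permute(0, 2, 1)                     # attention over temporal positions
        s = s + mha(norm(s), norm(s), norm(s))[0]  # residual add
        s = s.permute(0, 2, 1)                     # back to (B, F2, T_w)
        s = F.elu(tcn(F.pad(s, (3, 0))))           # causal: left-pad only, no future leakage
        window_logits.append(head(s[..., -1]))     # last causal step -> (B, n_outputs)

    logits = torch.stack(window_logits).mean(dim=0)  # concat=False: average window logits
    print(logits.shape)                              # torch.Size([8, 4])
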
.. rubric:: Convolutional Details

- **Temporal.** Temporal structure is learned in three places:

  - (1) the stem's wide ``(L_t, 1)`` conv (learned filter bank),
  - (2) the refining ``(L_r, 1)`` conv after pooling (short-term dynamics), and
  - (3) the TCN's causal 1-D convolutions with exponentially increasing dilation
    (long-range dependencies).

  The minimum sequence length required by the TCN stack is
  ``(K_t - 1)·2^{L-1} + 1``; the implementation *auto-scales* kernels/pools/windows
  when inputs are shorter to preserve feasibility.

- **Spatial.** A depthwise spatial conv spans the **full montage** (kernel ``(1, n_chans)``),
  producing *per-temporal-filter* spatial projections (no cross-filter mixing at this step).
  This mirrors EEGNet's interpretability: each temporal filter has its own spatial pattern.

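To make the temporal/spatial split above concrete, here is a rough stand-alone re-implementation of the conv stem in plain PyTorch. Layer ordering follows the operation list earlier in this docstring; the ``padding`` choices and the absence of the auto-scaling logic are simplifying assumptions, so this is not braindecode's exact code.

.. code:: python

    import torch
    from torch import nn

    B, n_chans, T = 8, 22, 1000
    F1, D, L_t, L_r, P1, P2, p_drop = 16, 2, 64, 16, 8, 7, 0.3
    F2 = F1 * D

    stem = nn.Sequential(
        # temporal filter bank along the time axis
        nn.Conv2d(1, F1, (L_t, 1), padding=(L_t // 2, 0), bias=False),
        # depthwise spatial conv across the full montage
        nn.Conv2d(F1, F2, (1, n_chans), groups=F1, bias=False),
        nn.BatchNorm2d(F2),
        nn.ELU(),
        nn.AvgPool2d((P1, 1)),
        nn.Dropout(p_drop),
        # refining temporal conv on the pooled map
        nn.Conv2d(F2, F2, (L_r, 1), padding=(L_r // 2, 0), bias=False),
        nn.BatchNorm2d(F2),
        nn.ELU(),
        nn.AvgPool2d((P2, 1)),
        nn.Dropout(p_drop),
    )

    x = torch.randn(B, 1, T, n_chans)   # (batch, 1, time, channels)
    print(stem(x).shape)                # roughly (B, F2, T/(P1*P2), 1)
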
.. rubric:: Attention / Sequential Modules

- **Type.** Multi-head self-attention with ``H`` heads and per-head dim ``d_h`` implemented
  in :class:`_MHA`, allowing ``embed_dim = H·d_h`` independent of input and output dims.
- **Shapes.** ``(B, F2, T_w) → (B, T_w, F2) → (B, F2, T_w)``. Attention operates along
  the **temporal** axis within a window; channels/features stay in the embedding dim ``F2``.
- **Role.** Highlights salient temporal positions prior to causal convolution; small attention
  keeps compute modest while improving context modeling over pooled features.

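The point about ``embed_dim = H·d_h`` being decoupled from the feature width ``F2`` can be illustrated with explicit query/key/value projections (a deliberately stripped-down sketch, not the actual ``_MHA`` code):

.. code:: python

    import torch
    from torch import nn

    B, F2, T_w = 8, 32, 13
    H, d_h = 2, 8                      # internal width H*d_h = 16, independent of F2 = 32
    x = torch.randn(B, T_w, F2)        # window already rearranged to (B, T_w, F2)

    to_qkv = nn.Linear(F2, 3 * H * d_h)
    to_out = nn.Linear(H * d_h, F2)

    q, k, v = to_qkv(x).chunk(3, dim=-1)
    # split heads: (B, H, T_w, d_h)
    q, k, v = (t.reshape(B, T_w, H, d_h).transpose(1, 2) for t in (q, k, v))
    attn = torch.softmax(q @ k.transpose(-2, -1) / d_h ** 0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(B, T_w, H * d_h)
    y = x + to_out(out)                # residual add back at width F2
    print(y.shape)                     # torch.Size([8, 13, 32])
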
.. rubric:: Additional Mechanisms

- **Parallel encoders over shifted windows.** Improves montage/phase robustness by
  ensembling nearby contexts rather than committing to a single segmentation.
- **Max-norm classifier.** Enforces weight norm constraints at the readout, a common
  stabilization trick in EEG decoding.
- **ViT vs. ATCNet (design choices).** Convolutional *nonlinear* projection rather than
  linear patchification; attention followed by **TCN** (not MLP); *parallel* window
  encoders rather than stacked encoders.

.. rubric:: Usage and Configuration

- ``conv_block_n_filters`` (F1), ``conv_block_depth_mult`` (D) → capacity of the stem
  (with ``F2 = F1·D`` feeding attention/TCN), dimensions aligned to ``F2``, like :class:`EEGNet`.
- Pool sizes ``P1, P2`` trade temporal resolution for stability/compute; they set
  ``T_c = T/(P1·P2)`` and thus the window width ``T_w``.
- ``n_windows`` controls the ensemble over shifts (compute ∝ windows).
- ``num_heads``, ``head_dim`` set attention capacity; keep ``H·d_h ≈ F2``.
- ``tcn_depth``, ``tcn_kernel_size`` govern the receptive field; larger values demand
  longer inputs (see the minimum length above). The implementation warns and *rescales*
  kernels/pools/windows if inputs are too short.
- **Aggregation choice.** ``concat=False`` (default, average of per-window logits) matches
  the official code; ``concat=True`` mirrors the paper's concatenation variant.

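A small worked example of this shape bookkeeping, using the default values from the parameter list below (4.5 s at 250 Hz, ``P1=8``, ``P2=7``, ``n_windows=5``, ``F1=16``, ``D=2``, ``Kt=4``, ``L=2``):

.. code:: python

    sfreq, input_window_seconds = 250, 4.5
    P1, P2, n_windows = 8, 7, 5
    F1, D = 16, 2
    K_t, L = 4, 2

    T = int(sfreq * input_window_seconds)     # 1125 samples per trial
    T_c = T // (P1 * P2)                      # ~20 condensed time steps
    T_w = T_c - n_windows + 1                 # ~16 steps per window
    F2 = F1 * D                               # 32 feature maps into attention/TCN
    min_len = (K_t - 1) * 2 ** (L - 1) + 1    # 7: minimum window length for the TCN stack
    print(T, T_c, T_w, F2, min_len, T_w >= min_len)
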
Parameters
----------
input_window_seconds : float, optional
    Time length of inputs, in seconds. Defaults to 4.5 s, as in the BCI-IV 2a
    dataset.
sfreq : int, optional
    Sampling frequency of the inputs, in Hz. Defaults to 250 Hz, as in the
    BCI-IV 2a dataset.
conv_block_n_filters : int
    Number of temporal filters in the first convolutional layer of the
    convolutional block, denoted F1 in figure 2 of the paper [1]_. Defaults
    to 16 as in [1]_.
conv_block_kernel_length_1 : int
    Length of temporal filters in the first convolutional layer of the
    convolutional block, denoted Kc in table 1 of the paper [1]_. Defaults
    to 64 as in [1]_.
conv_block_kernel_length_2 : int
    Length of temporal filters in the last convolutional layer of the
    convolutional block. Defaults to 16 as in [1]_.
conv_block_pool_size_1 : int
    Length of the first average pooling kernel in the convolutional block.
    Defaults to 8 as in [1]_.
conv_block_pool_size_2 : int
    Length of the second average pooling kernel in the convolutional block,
    denoted P2 in table 1 of the paper [1]_. Defaults to 7 as in [1]_.
conv_block_depth_mult : int
    Depth multiplier of the depthwise convolution in the convolutional block,
    denoted D in table 1 of the paper [1]_. Defaults to 2 as in [1]_.
conv_block_dropout : float
    Dropout probability used in the convolutional block, denoted pc in
    table 1 of the paper [1]_. Defaults to 0.3 as in [1]_.
n_windows : int
    Number of sliding windows, denoted n in [1]_. Defaults to 5 as in [1]_.
head_dim : int
    Embedding dimension used in each self-attention head, denoted dh in
    table 1 of the paper [1]_. Defaults to 8 as in [1]_.
num_heads : int
    Number of attention heads, denoted H in table 1 of the paper [1]_.
    Defaults to 2 as in [1]_.
att_dropout : float
    Dropout probability used in the attention block, denoted pa in table 1
    of the paper [1]_. Defaults to 0.5 as in [1]_.
tcn_depth : int
    Depth of the Temporal Convolutional Network block (i.e. number of TCN
    residual blocks), denoted L in table 1 of the paper [1]_. Defaults to 2
    as in [1]_.
tcn_kernel_size : int
    Temporal kernel size used in the TCN block, denoted Kt in table 1 of the
    paper [1]_. Defaults to 4 as in [1]_.
tcn_dropout : float
    Dropout probability used in the TCN block, denoted pt in table 1
    of the paper [1]_. Defaults to 0.3 as in [1]_.
tcn_activation : torch.nn.Module
    Nonlinear activation to use. Defaults to nn.ELU().
concat : bool
    When ``True``, concatenates each sliding-window embedding before
    feeding it to a fully-connected layer, as done in [1]_. When ``False``,
    maps each sliding window to ``n_outputs`` logits and averages them.
    Defaults to ``False``, contrary to what is reported in [1]_, but
    matching what the official code does [2]_.
max_norm_const : float
    Maximum L2-norm constraint imposed on the weights of the last
    fully-connected layer. Defaults to 0.25.

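For instance, the parameters above map onto the constructor like this (the values are simply the documented defaults written out explicitly, shown only to illustrate the naming):

.. code:: python

    from braindecode.models import ATCNet

    model = ATCNet(
        n_chans=22,
        n_outputs=4,
        input_window_seconds=4.5,
        sfreq=250,
        conv_block_n_filters=16,    # F1
        conv_block_depth_mult=2,    # D  -> F2 = 32
        conv_block_pool_size_1=8,   # P1
        conv_block_pool_size_2=7,   # P2
        n_windows=5,                # n
        num_heads=2,                # H
        head_dim=8,                 # dh
        tcn_depth=2,                # L
        tcn_kernel_size=4,          # Kt
        concat=False,               # average per-window logits, as in the official code
    )
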
Notes
-----
- Inputs substantially shorter than the implied minimum length trigger **automatic
  downscaling** of kernels, pools, windows, and TCN kernel size to maintain validity.
- The attention–TCN sequence operates **per window**; the last causal step is used as the
  window feature, aligning the temporal semantics across windows.

.. versionadded:: 1.1

   - More detailed documentation of the model.

References
----------
.. [1] H. Altaheri, G. Muhammad, M. Alsulaiman (2022).
   *Physics-informed attention temporal convolutional network for EEG-based motor imagery classification.*
   IEEE Transactions on Industrial Informatics. doi:10.1109/TII.2022.3197419.
.. [2] Official EEG-ATCNet implementation (TensorFlow):
   https://github.com/Altaheri/EEG-ATCNet/blob/main/models.py

.. rubric:: Hugging Face Hub integration

When the optional ``huggingface_hub`` package is installed, all models
automatically gain the ability to be pushed to and loaded from the
Hugging Face Hub. Install with::

    pip install braindecode[hub]

**Pushing a model to the Hub:**

.. code:: python

    from braindecode.models import ATCNet

    # Train your model
    model = ATCNet(n_chans=22, n_outputs=4, n_times=1000)
    # ... training code ...

    # Push to the Hub
    model.push_to_hub(
        repo_id="username/my-atcnet-model",
        commit_message="Initial model upload",
    )

**Loading a model from the Hub:**

.. code:: python

    from braindecode.models import ATCNet

    # Load pretrained model
    model = ATCNet.from_pretrained("username/my-atcnet-model")

    # Load with a different number of outputs (head is rebuilt automatically)
    model = ATCNet.from_pretrained("username/my-atcnet-model", n_outputs=4)

**Extracting features and replacing the head:**

.. code:: python

    import torch

    x = torch.randn(1, model.n_chans, model.n_times)
    # Extract encoder features (consistent dict across all models)
    out = model(x, return_features=True)
    features = out["features"]

    # Replace the classification head
    model.reset_head(n_outputs=10)

**Saving and restoring full configuration:**

.. code:: python

    import json

    config = model.get_config()  # all __init__ params
    with open("config.json", "w") as f:
        json.dump(config, f)

    model2 = ATCNet.from_config(config)  # reconstruct (no weights)

All model parameters (both EEG-specific and model-specific such as
dropout rates, activation functions, number of filters) are automatically
saved to the Hub and restored when loading.

See :ref:`load-pretrained-models` for a complete tutorial.</main>
</div>

## Citation

Please cite both the original paper for this architecture (see the
*References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the license of that checkpoint and its training corpus.