File size: 5,518 Bytes

---
license: bsd-3-clause
library_name: braindecode
pipeline_tag: feature-extraction
tags:
  - eeg
  - biosignal
  - pytorch
  - neuroscience
  - braindecode
  - convolutional
  - transformer
---

# AttentionBaseNet

AttentionBaseNet from Wimpff M et al (2023) [Martin2023].

> **Architecture-only repository.** Documents the
> `braindecode.models.AttentionBaseNet` class. **No pretrained weights are
> distributed here.** Instantiate the model and train it on your own
> data.

## Quick start

```bash
pip install braindecode
```

```python
from braindecode.models import AttentionBaseNet

model = AttentionBaseNet(
    n_chans=22,
    sfreq=250,
    input_window_seconds=4.0,
    n_outputs=4,
)
```

The signal-shape arguments above are illustrative defaults — adjust to
match your recording.

## Documentation
- Full API reference: <https://braindecode.org/stable/generated/braindecode.models.AttentionBaseNet.html>
- Interactive browser (live instantiation, parameter counts):
  <https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/attentionbasenet.py#L29>


## Architecture

![AttentionBaseNet architecture](https://content.cld.iop.org/journals/1741-2552/21/3/036020/revision2/jnead48b9f2_hr.jpg)


## Parameters

| Parameter | Type | Description |
|---|---|---|
| `n_temporal_filters` | int, optional | Number of temporal convolutional filters in the first layer. This defines the number of output channels after the temporal convolution. Default is 40. |
| `temp_filter_length` | int, default=15 | The length of the temporal filters in the convolutional layers. |
| `spatial_expansion` | int, optional | Multiplicative factor to expand the spatial dimensions. Used to increase the capacity of the model by expanding spatial features. Default is 1. |
| `pool_length_inp` | int, optional | Length of the pooling window in the input layer. Determines how much temporal information is aggregated during pooling. Default is 75. |
| `pool_stride_inp` | int, optional | Stride of the pooling operation in the input layer. Controls the downsampling factor in the temporal dimension. Default is 15. |
| `drop_prob_inp` | float, optional | Dropout rate applied after the input layer. This is the probability of zeroing out elements during training to prevent overfitting. Default is 0.5. |
| `ch_dim` | int, optional | Number of channels in the subsequent convolutional layers. This controls the depth of the network after the initial layer. Default is 16. |
| `attention_mode` | str, optional | The type of attention mechanism to apply. If `None`, no attention is applied. - "se" for Squeeze-and-excitation network - "gsop" for Global Second-Order Pooling - "fca" for Frequency Channel Attention Network - "encnet" for context encoding module - "eca" for Efficient channel attention for deep convolutional neural networks - "ge" for Gather-Excite - "gct" for Gated Channel Transformation - "srm" for Style-based Recalibration Module - "cbam" for Convolutional Block Attention Module - "cat" for Learning to collaborate channel and temporal attention from multi-information fusion - "catlite" for Learning to collaborate channel attention from multi-information fusion (lite version, cat w/o temporal attention) |
| `pool_length` | int, default=8 | The length of the window for the average pooling operation. |
| `pool_stride` | int, default=8 | The stride of the average pooling operation. |
| `drop_prob_attn` | float, default=0.5 | The dropout rate for regularization for the attention layer. Values should be between 0 and 1. |
| `reduction_rate` | int, default=4 | The reduction rate used in the attention mechanism to reduce dimensionality and computational complexity. |
| `use_mlp` | bool, default=False | Flag to indicate whether an MLP (Multi-Layer Perceptron) should be used within the attention mechanism for further processing. |
| `freq_idx` | int, default=0 | DCT index used in fca attention mechanism. |
| `n_codewords` | int, default=4 | The number of codewords (clusters) used in attention mechanisms that employ quantization or clustering strategies. |
| `kernel_size` | int, default=9 | The kernel size used in certain types of attention mechanisms for convolution operations. |
| `activation` | type[nn.Module] = nn.ELU, | Activation function class to apply. Should be a PyTorch activation module class like `nn.ReLU` or `nn.ELU`. Default is `nn.ELU`. |
| `extra_params` | bool, default=False | Flag to indicate whether additional, custom parameters should be passed to the attention mechanism. |


## References

1. Wimpff, M., Gizzi, L., Zerfowski, J. and Yang, B., 2023. EEG motor imagery decoding: A framework for comparative analysis with channel attention mechanisms. arXiv preprint arXiv:2310.11198.
2. Wimpff, M., Gizzi, L., Zerfowski, J. and Yang, B. GitHub https://github.com/martinwimpff/channel-attention (accessed 2024-03-28)


## Citation

Cite the original architecture paper (see *References* above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,
  title   = {Braindecode: a deep learning library for raw electrophysiological data},
  author  = {Aristimunha, Bruno and others},
  journal = {Zenodo},
  year    = {2025},
  doi     = {10.5281/zenodo.17699192},
}
```

## License

BSD-3-Clause for the model code (matching braindecode).
Pretraining-derived weights, if you fine-tune from a checkpoint,
inherit the licence of that checkpoint and its training corpus.