# EEGPT

EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals from Wang et al. (2024) [eegpt].

> **Architecture-only repository.** Documents the
> `braindecode.models.EEGPT` class. **No pretrained weights are
> distributed here.** Instantiate the model and train it on your own
> data.

## Quick start

```python
from braindecode.models import EEGPT

model = EEGPT(
    # ... signal-shape arguments collapsed in this view; a hypothetical sketch follows ...
)
```
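
The argument list is collapsed in the view above. As a minimal sketch (the shape values below are hypothetical placeholders; `n_chans`, `n_times`, and `n_outputs` are braindecode's standard signal-shape arguments, and some versions may expect `chs_info` with channel names instead, see *Usage* below):

```python
from braindecode.models import EEGPT

# Hypothetical shapes: 58 channels, 4 s windows sampled at 256 Hz, 2 classes.
model = EEGPT(
    n_chans=58,
    n_times=1024,
    n_outputs=2,
)
```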

The signal-shape arguments above are illustrative defaults; adjust them to match your recording.

## Documentation

- Full API reference: <https://braindecode.org/stable/generated/braindecode.models.EEGPT.html>
- Interactive browser (live instantiation, parameter counts):
<https://huggingface.co/spaces/braindecode/model-explorer>
- Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegpt.py#L21>

## Architecture

*Tags: Foundation Model · Attention/Transformer*

![EEGPT Architecture](https://github.com/BINE022/EEGPT/raw/main/figures/EEGPT.jpg)

*a) The EEGPT structure patches the input EEG signal into $p_{i,j}$ and masks it (50% of time patches and 80% of channel patches), creating a masked part $\mathcal{M}$ and an unmasked part $\bar{\mathcal{M}}$. b) Local spatio-temporal embedding maps patches to tokens. c) Dual self-supervised learning with Spatio-Temporal Representation Alignment and Mask-based Reconstruction. (A small sketch of these masking ratios follows.)*
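
To make the masking ratios concrete, a small sketch (not braindecode's internals; combining the time and channel masks by union is an illustrative assumption):

```python
import torch

# Token grid of (n_patches, n_chans); mask 50% of time patches and
# 80% of channel patches, as in the figure caption.
n_patches, n_chans = 31, 58
time_masked = torch.rand(n_patches) < 0.5  # 50% of time patches
chan_masked = torch.rand(n_chans) < 0.8    # 80% of channel patches
masked = time_masked[:, None] | chan_masked[None, :]  # part M (union: an assumption)
unmasked = ~masked                                    # part M-bar
```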

**EEGPT** is a pretrained transformer model designed for universal EEG feature extraction. It addresses challenges like low signal-to-noise ratio (SNR) and inter-subject variability by employing a dual self-supervised learning method that combines **Spatio-Temporal Representation Alignment** and **Mask-based Reconstruction** [eegpt].

**Model Overview (Layer-by-layer)**

1. **Patch embedding** (`_PatchEmbed` or `_PatchNormEmbed`): split each channel into `patch_size` time patches and project to `embed_dim`, yielding tokens with shape `(batch, n_patches, n_chans, embed_dim)`.
2. **Channel embedding** (`chan_embed`): add a learned embedding for each channel to preserve spatial identity before attention.
3. **Transformer encoder blocks** (`_EEGTransformer.blocks`): for each patch group, append `embed_num` learned summary tokens and process the sequence with multi-head self-attention and MLP layers.
4. **Summary extraction**: keep only the summary tokens, apply `norm` if set, and reshape back to `(batch, n_patches, embed_num, embed_dim)`.
5. **Task head** (`final_layer`): flatten summary tokens across patches and map to `n_outputs`; if `return_encoder_output=True`, return the encoder features instead. (A shape walkthrough follows this list.)
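
A shape walkthrough of the five steps, using hypothetical inputs and the default patching/embedding hyperparameters from the *Parameters* table; the sliding-window patch count is an assumption about the implementation:

```python
batch, n_chans, n_times = 8, 58, 1024
patch_size, patch_stride, embed_num, embed_dim = 64, 32, 4, 512

# Steps 1-2: patch + channel embedding -> (batch, n_patches, n_chans, embed_dim)
n_patches = (n_times - patch_size) // patch_stride + 1  # 31 (sliding-window count)
print((batch, n_patches, n_chans, embed_dim))

# Steps 3-4: encoder + summary extraction -> (batch, n_patches, embed_num, embed_dim)
print((batch, n_patches, embed_num, embed_dim))

# Step 5: the head flattens the summary tokens before mapping to n_outputs.
print(n_patches * embed_num * embed_dim)  # 63488 features into final_layer
```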

**Dual Self-Supervised Learning**

EEGPT moves beyond simple masked reconstruction by introducing a representation alignment objective. The pretraining loss $\mathcal{L}$ is the sum of the alignment loss $\mathcal{L}_A$ and the reconstruction loss $\mathcal{L}_R$:

$$\mathcal{L} = \mathcal{L}_A + \mathcal{L}_R$$

1. **Spatio-Temporal Representation Alignment** ($\mathcal{L}_A$): aligns the predicted features of masked regions with global features extracted by a momentum encoder. This forces the model to learn semantic, high-level representations rather than just signal waveform details.

   $$\mathcal{L}_A = \frac{1}{N} \sum_{j=1}^{N} \left\| \mathrm{pred}_j - \mathrm{LN}(\mathrm{menc}_j) \right\|_2^2$$

   where $\mathrm{pred}_j$ is the predictor output and $\mathrm{menc}_j$ is the momentum encoder output.

2. **Mask-based Reconstruction** ($\mathcal{L}_R$): a standard masked-autoencoder objective that reconstructs the raw EEG patches, ensuring local temporal fidelity.

   $$\mathcal{L}_R = \frac{1}{|\mathcal{M}|} \sum_{(i,j) \in \mathcal{M}} \left\| \mathrm{rec}_{i,j} - \mathrm{LN}(p_{i,j}) \right\|_2^2$$

   where $\mathrm{rec}_{i,j}$ is the reconstructed patch and $p_{i,j}$ is the original patch. (A combined-loss sketch follows this list.)
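
A minimal PyTorch sketch of the two objectives (tensor names follow the formulas; shapes, masking, and the momentum-update machinery are omitted, and `mse_loss` matches the formulas only up to the mean-vs-sum reduction constant):

```python
import torch.nn.functional as F

def dual_ssl_loss(pred, menc, rec, patches):
    """L = L_A + L_R, up to a constant reduction factor."""
    # L_A: align predictions with layer-normed momentum-encoder features.
    l_a = F.mse_loss(pred, F.layer_norm(menc, menc.shape[-1:]))
    # L_R: reconstruct layer-normed raw patches at masked positions.
    l_r = F.mse_loss(rec, F.layer_norm(patches, patches.shape[-1:]))
    return l_a + l_r
```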

**Macro Components**

- `EEGPT.target_encoder` **(Universal Encoder)**
  - *Operations.* A hierarchical backbone that consists of **Local Spatio-Temporal Embedding** followed by a standard Transformer encoder [eegpt].
  - *Role.* Maps raw spatio-temporal EEG patches into a sequence of latent tokens $z$.
- `EEGPT.chans_id` **(Channel Identification)**
  - *Operations.* A buffer containing channel indices mapped from the standard channel names provided in `chs_info` [eegpt].
  - *Role.* Provides the spatial identity for each input channel, allowing the model to look up the correct channel embedding vector $\varsigma_i$.
- **Local Spatio-Temporal Embedding** (Input Processing)
  - *Operations.* The input signal $X$ is chunked into patches $p_{i,j}$. Each patch is linearly projected and summed with a channel-specific embedding: $\mathrm{token}_{i,j} = \mathrm{Embed}(p_{i,j}) + \varsigma_i$ [eegpt] (see the sketch after this list).
  - *Role.* Converts the 2D EEG grid (channels $\times$ time) into a unified sequence of tokens that preserves both channel identity and temporal order.
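
A sketch of the token computation $\mathrm{token}_{i,j} = \mathrm{Embed}(p_{i,j}) + \varsigma_i$; sizes and layer names are illustrative, not braindecode's internals:

```python
import torch
from torch import nn

n_chans, n_patches, patch_size, embed_dim = 58, 31, 64, 512
proj = nn.Linear(patch_size, embed_dim)        # Embed(p_ij)
chan_embed = nn.Embedding(n_chans, embed_dim)  # varsigma_i, one vector per channel

patches = torch.randn(8, n_patches, n_chans, patch_size)    # (B, T, C, P)
tokens = proj(patches) + chan_embed(torch.arange(n_chans))  # broadcast over B, T
print(tokens.shape)  # torch.Size([8, 31, 58, 512])
```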

**How the information is encoded temporally, spatially, and spectrally**

- **Temporal.** The model segments continuous EEG signals into small, non-overlapping patches (e.g., 250 ms windows with `patch_size=64`) [eegpt]. This patching mechanism captures short-term local temporal structure, while the subsequent Transformer encoder captures long-range temporal dependencies across the entire window.
- **Spatial.** Unlike convolutional models that may rely on a fixed spatial order, EEGPT uses channel embeddings $\varsigma_i$ [eegpt]. Each channel's data is treated as a distinct sequence of tokens tagged with its spatial identity. This allows the model to flexibly handle different montages and missing channels by simply mapping channel names to their corresponding learnable embeddings.
- **Spectral.** Spectral information is implicitly learned through the mask-based reconstruction objective ($\mathcal{L}_R$) [eegpt]. By forcing the model to reconstruct raw waveforms (including phase and amplitude) from masked inputs, the model learns to encode frequency-specific patterns; the alignment objective ($\mathcal{L}_A$) refines this by encouraging these spectral features to align with robust, high-level semantic representations.

**Pretrained Weights**

Pre-trained weights are available on the [Hugging Face Hub](https://huggingface.co/braindecode/eegpt-pretrained). Loading them, or pushing your own trained model to the Hub, requires installing `braindecode[hub]` for Hub integration (a sketch follows).
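
A sketch of loading these weights, assuming the standard `huggingface_hub` mixin interface described in the *Notes* below:

```python
from braindecode.models import EEGPT

# Load pretrained weights from the Hub (requires `pip install braindecode[hub]`).
model = EEGPT.from_pretrained("braindecode/eegpt-pretrained")
```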

**Usage**

The model can be initialized for specific downstream tasks (e.g., classification) by specifying `n_outputs`, `chs_info`, and `n_times`, as sketched below.
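
For example (a sketch: the file path is hypothetical, and taking `chs_info` from an MNE `Raw` object is one common way to supply channel metadata in braindecode):

```python
import mne
from braindecode.models import EEGPT

raw = mne.io.read_raw_fif("subject01_raw.fif", preload=False)  # hypothetical file
model = EEGPT(
    n_outputs=2,               # e.g. binary classification
    chs_info=raw.info["chs"],  # channel metadata, including names
    n_times=1024,              # samples per input window
)
```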

## Parameters

| Parameter | Type | Description |
|---|---|---|
| `return_encoder_output` | bool, default=False | Whether to return the encoder output or the classifier output. |
| `patch_size` | int, default=64 | Size of the patches for the transformer. |
| `patch_stride` | int, default=32 | Stride of the patches for the transformer. |
| `embed_num` | int, default=4 | Number of summary tokens used for the global representation. |
| `embed_dim` | int, default=512 | Dimension of the embeddings. |
| `depth` | int, default=8 | Number of transformer layers. |
| `num_heads` | int, default=8 | Number of attention heads. |
| `mlp_ratio` | float, default=4.0 | Ratio of the MLP hidden dimension to the embedding dimension. |
| `drop_prob` | float, default=0.0 | Dropout probability. |
| `attn_drop_rate` | float, default=0.0 | Attention dropout rate. |
| `drop_path_rate` | float, default=0.0 | Drop path rate. |
| `init_std` | float, default=0.02 | Standard deviation for weight initialization. |
| `qkv_bias` | bool, default=True | Whether to use bias in the QKV projection. |
| `norm_layer` | torch.nn.Module, default=None | Normalization layer. If None, defaults to `nn.LayerNorm` with epsilon `layer_norm_eps`. |
| `layer_norm_eps` | float, default=1e-6 | Epsilon value for the normalization layer. |

## References

[eegpt] Wang, G., Liu, W., He, Y., Xu, C., Ma, L., & Li, H. (2024). EEGPT: Pretrained transformer for universal and reliable representation of EEG signals. Advances in Neural Information Processing Systems, 37, 39249-39280. <https://proceedings.neurips.cc/paper_files/paper/2024/file/4540d267eeec4e5dbd9dae9448f0b739-Paper-Conference.pdf>

## Notes

When loading pretrained weights from the original EEGPT checkpoint (e.g., for fine-tuning), you may encounter "unexpected keys" related to the `predictor` and `reconstructor` modules (e.g., `predictor.mask_token`, `reconstructor.time_embed`). These components are used only during the self-supervised pre-training phase (masked auto-encoder) and are not part of this encoder-only model used for downstream tasks. It is safe to ignore them, as sketched below.
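
For instance (a sketch: the checkpoint path, shape values, and top-level key layout of the original checkpoint are assumptions):

```python
import torch
from braindecode.models import EEGPT

model = EEGPT(n_chans=58, n_times=1024, n_outputs=2)  # hypothetical shapes
state = torch.load("eegpt_checkpoint.pt", map_location="cpu")  # assumed path
missing, unexpected = model.load_state_dict(state, strict=False)
print(unexpected)  # pretraining-only keys, e.g. predictor.*, reconstructor.*
```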

**Hugging Face Hub integration**

When the optional `huggingface_hub` package is installed, all models automatically gain the ability to be pushed to and loaded from the Hugging Face Hub. Install with:

```
pip install braindecode[hub]
```

All model parameters (both EEG-specific and model-specific, such as dropout rates, activation functions, and number of filters) are automatically saved to the Hub and restored when loading. Pushing a model, loading it back, and extracting features are sketched below; see the braindecode tutorial on loading pretrained models for a complete walkthrough.
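
The code blocks for these steps did not survive in this card; as a sketch under the same mixin assumption (`push_to_hub`/`from_pretrained`, with hypothetical shapes and repo id):

```python
from braindecode.models import EEGPT

model = EEGPT(n_chans=58, n_times=1024, n_outputs=2)  # hypothetical shapes

# Pushing a trained model to the Hub:
model.push_to_hub("your-username/my-eegpt")  # hypothetical repo id

# Loading a model from the Hub (full configuration is restored):
restored = EEGPT.from_pretrained("your-username/my-eegpt")

# Extracting features instead of class scores:
encoder = EEGPT(n_chans=58, n_times=1024, n_outputs=2,
                return_encoder_output=True)
```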
## Citation

Cite the original architecture paper (see *References* above) and braindecode:
```bibtex
@article{aristimunha2025braindecode,