bruAristimunha committed on
Commit a9acad8 · verified · 1 Parent(s): 62bf293

Replace with clean markdown card

Files changed (1):
  1. README.md +38 -475

README.md CHANGED
@@ -14,13 +14,12 @@ tags:

# EEGPT

- EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals from Wang et al. (2024).
-
- > **Architecture-only repository.** This repo documents the
> `braindecode.models.EEGPT` class. **No pretrained weights are
- > distributed here** instantiate the model and train it on your own
- > data, or fine-tune from a published foundation-model checkpoint
- > separately.

## Quick start

@@ -39,486 +38,50 @@ model = EEGPT(
)
```

- The signal-shape arguments above are example defaults — adjust them
- to match your recording.

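For concreteness, a minimal instantiation-and-forward-pass sketch; the channel count, window length, and class count below are placeholder values, and the keywords assume braindecode's usual `n_chans`/`n_times`/`n_outputs` interface (depending on your version, `chs_info` may also be needed for channel mapping):

```python
# Placeholder shapes -- match them to your recording.
import torch

from braindecode.models import EEGPT

model = EEGPT(n_chans=58, n_times=1024, n_outputs=2)  # example values only

x = torch.randn(8, 58, 1024)  # (batch, n_chans, n_times)
logits = model(x)             # expected: (8, 2) classifier output
```
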
## Documentation

- - Full API reference (parameters, references, architecture figure):
-   <https://braindecode.org/stable/generated/braindecode.models.EEGPT.html>
- - Interactive browser with live instantiation:
    <https://huggingface.co/spaces/braindecode/model-explorer>
  - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegpt.py#L21>

- ## Architecture description
-
- The block below is the rendered class docstring (parameters,
- references, architecture figure where available).
-
- EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals from Wang et al. (2024) [eegpt].
-
- *Foundation Model · Attention/Transformer*
-
- ![EEGPT Architecture](https://github.com/BINE022/EEGPT/raw/main/figures/EEGPT.jpg)
-
- *a) The EEGPT structure patches the input EEG signal into $p_{i,j}$ and masks it (50% of time patches and 80% of channel patches), creating a masked part $\mathcal{M}$ and an unmasked part $\bar{\mathcal{M}}$. b) Local spatio-temporal embedding maps patches to tokens. c) Dual self-supervised learning combines Spatio-Temporal Representation Alignment with Mask-based Reconstruction.*
-
- **EEGPT** is a pretrained transformer model designed for universal EEG feature extraction. It addresses challenges such as low SNR and inter-subject variability by employing a dual self-supervised learning method that combines **Spatio-Temporal Representation Alignment** and **Mask-based Reconstruction** [eegpt].
-
- **Model Overview (Layer-by-layer)**
-
- 1. **Patch embedding** (`_PatchEmbed` or `_PatchNormEmbed`): split each channel into `patch_size` time patches and project to `embed_dim`, yielding tokens with shape `(batch, n_patches, n_chans, embed_dim)`.
- 2. **Channel embedding** (`chan_embed`): add a learned embedding for each channel to preserve spatial identity before attention (steps 1 and 2 are sketched in code after this list).
- 3. **Transformer encoder blocks** (`_EEGTransformer.blocks`): for each patch group, append `embed_num` learned summary tokens and process the sequence with multi-head self-attention and MLP layers.
- 4. **Summary extraction**: keep only the summary tokens, apply `norm` if set, and reshape back to `(batch, n_patches, embed_num, embed_dim)`.
- 5. **Task head** (`final_layer`): flatten summary tokens across patches and map to `n_outputs`; if `return_encoder_output=True`, return the encoder features instead.
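Steps 1 and 2 can be illustrated with plain tensor operations. This is a shape sketch under the docstring defaults (`patch_size=64`, `embed_dim=512`) with placeholder input sizes, not braindecode's internal implementation:

```python
import torch

batch, n_chans, n_times = 8, 58, 1024   # placeholder input sizes
patch_size, embed_dim = 64, 512         # docstring defaults

x = torch.randn(batch, n_chans, n_times)

# Step 1 -- patch embedding: cut each channel into non-overlapping
# patch_size windows, then project every window to embed_dim.
patches = x.unfold(-1, patch_size, patch_size)      # (8, 58, 16, 64)
proj = torch.nn.Linear(patch_size, embed_dim)
tokens = proj(patches).permute(0, 2, 1, 3)          # (8, 16, 58, 512)

# Step 2 -- channel embedding: add one learned vector per channel so
# spatial identity survives the attention layers.
chan_embed = torch.nn.Parameter(torch.zeros(n_chans, embed_dim))
tokens = tokens + chan_embed                        # broadcast over patches
print(tokens.shape)  # (batch, n_patches, n_chans, embed_dim) = (8, 16, 58, 512)
```
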
-
- **Dual Self-Supervised Learning**
-
- EEGPT moves beyond simple masked reconstruction by introducing a representation alignment objective. The pretraining loss $\mathcal{L}$ is the sum of the alignment loss $\mathcal{L}_A$ and the reconstruction loss $\mathcal{L}_R$ (a code sketch of the combined objective follows this list):
-
- $$\mathcal{L} = \mathcal{L}_A + \mathcal{L}_R$$
-
- 1. **Spatio-Temporal Representation Alignment** ($\mathcal{L}_A$): aligns the predicted features of masked regions with global features extracted by a momentum encoder. This forces the model to learn semantic, high-level representations rather than just signal waveform details.
-
-    $$\mathcal{L}_A = \frac{1}{N} \sum_{j=1}^{N} \left\lVert \mathrm{pred}_j - \mathrm{LN}(\mathrm{menc}_j) \right\rVert_2^2$$
-
-    where $\mathrm{pred}_j$ is the predictor output and $\mathrm{menc}_j$ is the momentum encoder output.
-
- 2. **Mask-based Reconstruction** ($\mathcal{L}_R$): the standard masked-autoencoder objective of reconstructing the raw EEG patches, which ensures local temporal fidelity.
-
-    $$\mathcal{L}_R = \frac{1}{\lvert \mathcal{M} \rvert} \sum_{(i,j) \in \mathcal{M}} \left\lVert \mathrm{rec}_{i,j} - \mathrm{LN}(p_{i,j}) \right\rVert_2^2$$
-
-    where $\mathrm{rec}_{i,j}$ is the reconstructed patch and $p_{i,j}$ is the original patch.
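A sketch of the combined objective, assuming both terms are computed as mean squared errors over tensors already gathered at the masked positions (which matches the formulas up to constant scaling) and that the momentum-encoder target is treated with stop-gradient, as is common practice:

```python
import torch.nn.functional as F
from torch import Tensor, nn

def eegpt_pretrain_loss(pred: Tensor, menc: Tensor,
                        rec: Tensor, patches: Tensor,
                        ln: nn.LayerNorm) -> Tensor:
    """L = L_A + L_R over features/patches gathered at masked positions."""
    # L_A: match the predictor output to layer-normalised momentum-encoder
    # features; detach() stops gradients flowing into the target branch.
    l_align = F.mse_loss(pred, ln(menc).detach())
    # L_R: reconstruct the layer-normalised raw patches.
    l_recon = F.mse_loss(rec, ln(patches))
    return l_align + l_recon
```
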
-
- **Macro Components**
-
- - `EEGPT.target_encoder` **(Universal Encoder)**
-   - *Operations.* A hierarchical backbone consisting of **Local Spatio-Temporal Embedding** followed by a standard Transformer encoder [eegpt].
-   - *Role.* Maps raw spatio-temporal EEG patches into a sequence of latent tokens $z$.
- - `EEGPT.chans_id` **(Channel Identification)**
-   - *Operations.* A buffer containing channel indices mapped from the standard channel names provided in `chs_info` [eegpt].
-   - *Role.* Provides the spatial identity of each input channel, letting the model look up the correct channel embedding vector $\varsigma_i$.
- - **Local Spatio-Temporal Embedding** (Input Processing)
-   - *Operations.* The input signal $X$ is chunked into patches $p_{i,j}$; each patch is linearly projected and summed with its channel embedding: $\mathrm{token}_{i,j} = \mathrm{Embed}(p_{i,j}) + \varsigma_i$ [eegpt].
-   - *Role.* Converts the 2D EEG grid (channels × time) into a unified sequence of tokens that preserves both channel identity and temporal order.
-
- **How the information is encoded temporally, spatially, and spectrally**
-
- - **Temporal.** The model segments continuous EEG signals into small, non-overlapping patches (e.g., 250 ms windows with `patch_size=64`, i.e., 64 samples at a 256 Hz sampling rate) [eegpt]. This **patching** mechanism captures short-term local temporal structure, while the subsequent Transformer encoder captures long-range temporal dependencies across the entire window.
- - **Spatial.** Unlike convolutional models that may rely on a fixed spatial order, EEGPT uses **channel embeddings** $\varsigma_i$ [eegpt]. Each channel's data is treated as a distinct sequence of tokens tagged with its spatial identity, which lets the model flexibly handle different montages and missing channels by simply mapping channel names to their corresponding learnable embeddings (see the lookup sketch after this list).
- - **Spectral.** Spectral information is learned implicitly through the **Mask-based Reconstruction** objective ($\mathcal{L}_R$) [eegpt]. By forcing the model to reconstruct raw waveforms (including phase and amplitude) from masked inputs, it learns to encode the frequency-specific patterns needed for reconstruction; the alignment objective ($\mathcal{L}_A$) refines this by encouraging those spectral features to align with robust, high-level semantic representations.
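The montage flexibility described under *Spatial* amounts to a name-to-index lookup in front of the embedding table; the montage list and sizes below are illustrative, not the exact internals behind `chans_id`:

```python
import torch

# Hypothetical standard montage order fixed at pre-training time.
standard_montage = ["FP1", "FP2", "F7", "F3", "FZ", "F4", "F8", "C3", "CZ", "C4"]

# A downstream recording with fewer channels, in a different order.
recording_chs = ["C3", "CZ", "FP1"]

chans_id = torch.tensor([standard_montage.index(ch) for ch in recording_chs])
chan_embed = torch.nn.Embedding(len(standard_montage), 512)
identity_vectors = chan_embed(chans_id)  # (3, 512): one vector per present channel
```
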
-
- **Pretrained Weights**
-
- Weights are available on the [Hugging Face Hub](https://huggingface.co/braindecode/eegpt-pretrained).
-
- > **Important.** This model has pre-trained weights on the Hugging Face Hub
- > (<https://huggingface.co/braindecode/eegpt-pretrained>). Loading them, or
- > pushing your own trained model to the Hub, requires installing
- > `braindecode[hub]` for Hub integration.
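A loading sketch, assuming the Hub integration exposes a `from_pretrained` classmethod in the style of the `huggingface_hub` model mixins; check the braindecode documentation for the exact call:

```python
# Hypothetical API -- requires `pip install braindecode[hub]`.
from braindecode.models import EEGPT

model = EEGPT.from_pretrained("braindecode/eegpt-pretrained")
```
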
-
- **Usage**
-
- The model can be initialized for specific downstream tasks (e.g., classification) by specifying `n_outputs`, `chs_info`, and `n_times`.
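For example (the channel dictionaries below are a simplified stand-in for the MNE-style `chs_info` entries braindecode expects, which carry more fields in practice):

```python
from braindecode.models import EEGPT

# Simplified stand-in for MNE-style channel info.
chs_info = [{"ch_name": name} for name in ("C3", "CZ", "C4")]

model = EEGPT(n_outputs=4, n_times=1024, chs_info=chs_info)
```
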
-
- ### Parameters
-
- - `return_encoder_output` (bool, default=False): whether to return the encoder output or the classifier output.
- - `patch_size` (int, default=64): size of the patches for the transformer.
- - `patch_stride` (int, default=32): stride of the patches for the transformer.
- - `embed_num` (int, default=4): number of summary tokens used for the global representation.
- - `embed_dim` (int, default=512): dimension of the embeddings.
- - `depth` (int, default=8): number of transformer layers.
- - `num_heads` (int, default=8): number of attention heads.
- - `mlp_ratio` (float, default=4.0): ratio of the MLP hidden dimension to the embedding dimension.
- - `drop_prob` (float, default=0.0): dropout probability.
- - `attn_drop_rate` (float, default=0.0): attention dropout rate.
- - `drop_path_rate` (float, default=0.0): drop path rate.
- - `init_std` (float, default=0.02): standard deviation for weight initialization.
- - `qkv_bias` (bool, default=True): whether to use bias in the QKV projection.
- - `norm_layer` (torch.nn.Module, default=None): normalization layer; if None, defaults to `nn.LayerNorm` with epsilon `layer_norm_eps`.
- - `layer_norm_eps` (float, default=1e-6): epsilon value for the normalization layer.
-
- ### References
-
- [eegpt] Wang, G., Liu, W., He, Y., Xu, C., Ma, L., & Li, H. (2024). EEGPT: Pretrained transformer for universal and reliable representation of EEG signals. *Advances in Neural Information Processing Systems*, 37, 39249-39280. <https://proceedings.neurips.cc/paper_files/paper/2024/file/4540d267eeec4e5dbd9dae9448f0b739-Paper-Conference.pdf>
-
- ### Notes
-
- When loading pretrained weights from the original EEGPT checkpoint (e.g., for fine-tuning), you may encounter "unexpected keys" related to the `predictor` and `reconstructor` modules (e.g., `predictor.mask_token`, `reconstructor.time_embed`). These components are used only during the self-supervised pre-training phase (masked auto-encoding) and are not part of this encoder-only model used for downstream tasks. It is safe to ignore them, as in the sketch below.
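Concretely, load with `strict=False` and verify that the leftover keys are confined to those modules (the checkpoint path is illustrative, `model` is an `EEGPT` instance, and the file is assumed to hold a plain state dict):

```python
import torch

state = torch.load("eegpt_original_checkpoint.pt", map_location="cpu")
missing, unexpected = model.load_state_dict(state, strict=False)

# Only pre-training-only modules should show up as unexpected.
leftovers = [k for k in unexpected
             if not k.startswith(("predictor.", "reconstructor."))]
assert not leftovers, f"genuinely unexpected keys: {leftovers}"
```
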
-
- **Hugging Face Hub integration**
-
- When the optional `huggingface_hub` package is installed, all models automatically gain the ability to be pushed to and loaded from the Hugging Face Hub. Install with `pip install braindecode[hub]`.
-
- The integration covers pushing a model to the Hub, loading a model from the Hub, extracting features and replacing the head, and saving and restoring the full configuration: all model parameters (both EEG-specific and model-specific, such as dropout rates, activation functions, and number of filters) are automatically saved to the Hub and restored when loading. See the braindecode `load-pretrained-models` tutorial for a complete walk-through.
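A push/load round trip, again assuming mixin-style `push_to_hub`/`from_pretrained` methods and a hypothetical repository id; a Hugging Face access token is required for pushing:

```python
# Hypothetical repository id.
model.push_to_hub("your-username/eegpt-finetuned")

# Later, or on another machine:
from braindecode.models import EEGPT

restored = EEGPT.from_pretrained("your-username/eegpt-finetuned")
```
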

## Citation

- Please cite both the original paper for this architecture (see the
- *References* section above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,

# EEGPT

+ EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals from Wang et al. (2024) [eegpt].

+ > **Architecture-only repository.** Documents the
> `braindecode.models.EEGPT` class. **No pretrained weights are
+ > distributed here.** Instantiate the model and train it on your own
+ > data.

## Quick start

)
```

+ The signal-shape arguments above are illustrative defaults — adjust to
+ match your recording.

## Documentation
+ - Full API reference: <https://braindecode.org/stable/generated/braindecode.models.EEGPT.html>
+ - Interactive browser (live instantiation, parameter counts):
    <https://huggingface.co/spaces/braindecode/model-explorer>
  - Source on GitHub: <https://github.com/braindecode/braindecode/blob/master/braindecode/models/eegpt.py#L21>

+ ## Architecture
+
+ ![EEGPT architecture](https://github.com/BINE022/EEGPT/raw/main/figures/EEGPT.jpg)
+
+ ## Parameters
+
+ | Parameter | Type | Description |
+ |---|---|---|
+ | `return_encoder_output` | bool, default=False | Whether to return the encoder output or the classifier output. |
+ | `patch_size` | int, default=64 | Size of the patches for the transformer. |
+ | `patch_stride` | int, default=32 | Stride of the patches for the transformer. |
+ | `embed_num` | int, default=4 | Number of summary tokens used for the global representation. |
+ | `embed_dim` | int, default=512 | Dimension of the embeddings. |
+ | `depth` | int, default=8 | Number of transformer layers. |
+ | `num_heads` | int, default=8 | Number of attention heads. |
+ | `mlp_ratio` | float, default=4.0 | Ratio of the MLP hidden dimension to the embedding dimension. |
+ | `drop_prob` | float, default=0.0 | Dropout probability. |
+ | `attn_drop_rate` | float, default=0.0 | Attention dropout rate. |
+ | `drop_path_rate` | float, default=0.0 | Drop path rate. |
+ | `init_std` | float, default=0.02 | Standard deviation for weight initialization. |
+ | `qkv_bias` | bool, default=True | Whether to use bias in the QKV projection. |
+ | `norm_layer` | torch.nn.Module, default=None | Normalization layer. If None, defaults to `nn.LayerNorm` with epsilon `layer_norm_eps`. |
+ | `layer_norm_eps` | float, default=1e-6 | Epsilon value for the normalization layer. |
+
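The defaults in the table can be overridden at construction time; the smaller variant below is an arbitrary example for quick experiments, not a published configuration:

```python
from braindecode.models import EEGPT

model = EEGPT(
    n_chans=32, n_times=512, n_outputs=2,  # placeholder signal shape
    embed_dim=256, depth=4, num_heads=4,   # shrink the transformer
    drop_prob=0.1, attn_drop_rate=0.1,     # add some regularisation
)
```
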
+ ## References
+
+ 1. Wang, G., Liu, W., He, Y., Xu, C., Ma, L., & Li, H. (2024). EEGPT: Pretrained transformer for universal and reliable representation of EEG signals. *Advances in Neural Information Processing Systems*, 37, 39249-39280. <https://proceedings.neurips.cc/paper_files/paper/2024/file/4540d267eeec4e5dbd9dae9448f0b739-Paper-Conference.pdf>

## Citation

+ Cite the original architecture paper (see *References* above) and braindecode:

```bibtex
@article{aristimunha2025braindecode,