# PhysioJEPA architecture landscape
*Oz Labs — April 2026*
*Revision 2: post-reviewer critique. Replaces cardio_jepa_architectures.md*
---
## Change log from revision 1
- All "CausalCardio-JEPA" → "PhysioJEPA" (Architecture F)
- v1 architecture clarified: raw PPG patches, EMA only, no morphological encoding, no SIGReg, no cardiac phase encoding in first run
- Ablation structure added to Architecture F entry
- Execution order updated to cross-reference experiment matrix
- Architecture descriptions retain full detail; only framing corrected
---
## Prior work — precisely characterised
### Weimann & Conrad — ECG-JEPA (2410.13867)
The direct baseline. I-JEPA adapted for 12-lead ECG: 2D patch tokenisation (leads × time), multi-block contiguous masking, EMA target encoder, L1 latent prediction. Pretrained on 1M+ records, AUC 0.945 on PTB-XL all-statements. Open source at `github.com/kweimann/ECG-JEPA` — our starting codebase. **Limitation**: unimodal, no PPG, no temporal dynamics beyond the 10s window. Static representation learner, not a world model.
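The two training ingredients named above (EMA target encoder, L1 latent prediction) can be sketched in a few lines. This is a minimal illustration on plain Python lists, not the ECG-JEPA code: `ema_update`, `l1_latent_loss`, and the momentum value `tau=0.99` are assumptions made for the example.

```python
from typing import List


def ema_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Move target weights toward the online weights with momentum tau."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


def l1_latent_loss(pred: List[float], tgt: List[float]) -> float:
    """Mean absolute error between predicted and target latents."""
    return sum(abs(p - q) for p, q in zip(pred, tgt)) / len(pred)


# Toy weights: the target encoder trails the online encoder.
target_w = ema_update([0.0, 0.0], [1.0, -1.0], tau=0.99)  # tau is illustrative
loss = l1_latent_loss([0.5, 0.5], [0.0, 1.0])
```

The target encoder receives no gradient; it only follows the online encoder through `ema_update`, which is what the τ schedule mentioned later in this document controls.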
### Kim — CroPA-ECG-JEPA (2410.08559)
Introduces Cross-Pattern Attention (CroPA): masked attention enforcing inter-lead dependencies. Recovers HR and QRS duration from frozen representations. **Lesson**: clinical inductive bias (inter-lead relationships) improves cardiac JEPA. Directly motivates cardiac phase encoding in Architecture F ablation A2.
### Khadka et al. — EEG-VJEPA (2507.03633)
Treats multi-channel EEG as a 3D spatiotemporal tensor and applies V-JEPA tube masking. Reports 85.8% accuracy on TUH Abnormal EEG, with UMAP clusters showing pathological separation. **Lesson**: V-JEPA tube masking transfers to physiological signals; the "signal as video" reframe works.
### Zhou et al. — Brain-JEPA (NeurIPS 2024)
Brain Gradient Positioning as domain-specific positional encoding derived from fMRI connectivity gradients. **Lesson for us**: cardiac phase encoding (P/QRS/ST/T) is the cardiac analog. Botman reviewer raised the valid concern that hard phase boundaries fail during AF — soft Gaussian encoding over landmarks is the fix if we pursue ablation A2.
### Wang et al. — EchoJEPA (2602.02603)
V-JEPA 2 on 18M echocardiograms; JEPA degrades only 2% under physics-informed acoustic perturbations vs 17% for VideoMAE. **Key lesson**: JEPA's noise rejection is the primary advantage for medical signals. Directly motivates our choice of JEPA over MAE for ICU PPG data.
### Balestriero & LeCun — LeJEPA (2511.08544)
Proves that the isotropic Gaussian is the optimal JEPA embedding distribution; introduces SIGReg (Sketched Isotropic Gaussian Regularisation) with linear complexity. Eliminates EMA entirely. **Position in our work**: ablation A3 tests SIGReg vs EMA on cardiac signals. Botman/Laya raised the concern that SIGReg may over-regularise impulsive ECG transients (QRS complexes); the mitigation is to apply SIGReg only to the pooled global representation, not to per-patch tokens.
### Botman et al. — Laya (2603.16281)
First LeJEPA application to EEG at scale; latent prediction outperforms reconstruction on clinical tasks. Found that SIGReg with aggressive λ causes instability on signals with large amplitude transients — recommended lower λ and gradient clipping. **Direct prior** for our ablation A3.
### Nie et al. — AnyPPG (2511.01747)
Symmetric InfoNCE on 100k+ hours of synchronized ECG-PPG. Critical detail: the ECGFounder encoder is *frozen* during AnyPPG training — ECG acts as a fixed supervisory signal, not a jointly-learned representation. R@1=0.736 for PPG→ECG retrieval. **This is the primary baseline we differentiate from.** AnyPPG answers "what is the cardiovascular state now?" — PhysioJEPA asks "how does it evolve?"
### Assran et al. — V-JEPA 2-AC (2506.09985)
Two-stage: action-free pretraining then action-conditioned post-training on 62 hours of robot trajectories. Zero-shot robot planning via MPC in latent space. **Architectural template for Architecture D** (intervention-conditioned, future work). Not part of PhysioJEPA v1.
### Wu, Lei et al. — SurgMotion (2602.05638)
V-JEPA 2 on surgical video, rejects smoke/specular artifacts. **Lesson**: the JEPA noise-rejection property generalises across medical imaging modalities.
---
## Five architectures + two novel extensions
### Architecture A — Temporal ECG-JEPA
**What it is**: ECG-JEPA (Weimann) extended from spatial masking to temporal future prediction. Context window t→t+T, target is t+T→t+2T. Single modality, no PPG.
**Use case**: Baseline A in the experiment matrix. Also the fallback if K2 fails — this is still publishable as an extension of Weimann & Conrad.
**Novelty**: low. The masking axis change is surgical. The temporal rollout evaluation (AF onset prediction via latent trajectory deviation score) is new but not a strong paper claim.
**Estimated performance**: should match or slightly exceed ECG-JEPA on static tasks; advantage only on temporal tasks.
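The context/target split described under "What it is" amounts to two adjacent slices of the same recording. A minimal sketch, assuming the signal is a flat list of samples and `T` is given in samples; `context_target_split` is a hypothetical helper, not part of the ECG-JEPA codebase.

```python
from typing import List, Tuple


def context_target_split(
    signal: List[float], start: int, T: int
) -> Tuple[List[float], List[float]]:
    """Return (context, target) = (signal[t:t+T], signal[t+T:t+2T])."""
    end = start + 2 * T
    if start < 0 or T <= 0 or end > len(signal):
        raise ValueError("window [start, start + 2T) must lie inside the signal")
    return signal[start:start + T], signal[start + T:end]


# Toy signal: sample index doubles as sample value.
signal = [float(i) for i in range(20)]
context, target = context_target_split(signal, start=5, T=5)
```

Compared with spatial masking, the only change is that the target is always strictly later than the context, so the predictor cannot interpolate from both sides.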
---
### Architecture B — Symmetric cross-modal JEPA (Δt=0)
**What it is**: Dual encoder (ECG + PPG), cross-attention predictor, but Δt is fixed to 0. ECG context predicts PPG at the same time.
**Use case**: Baseline B in the experiment matrix — the controlled comparison that isolates whether Δt matters.
**Novelty**: low. This is essentially JEPA-flavoured AnyPPG without the frozen encoder constraint.
**Why it exists**: without this baseline, K2 cannot be answered. It must run in parallel with PhysioJEPA from Day 4.
---
### Architecture C — LeJEPA cardiac (SIGReg, cross-modal)
**What it is**: PhysioJEPA (Architecture F) with SIGReg replacing EMA. Clean theoretical foundation from Balestriero & LeCun.
**When to run**: ablation A3 after E3 passes K2.
**Key risk**: SIGReg enforces isotropic Gaussian geometry globally. ECG signals have highly anisotropic spectral structure (dominant QRS transients). Mitigation: apply SIGReg only to pooled global representations, not per-patch tokens. λ sweep: [0.001, 0.01, 0.05, 0.1].
**What it contributes**: if SIGReg outperforms EMA, it provides a cleaner theoretical story and removes the τ schedule hyperparameter. If it doesn't, EMA stays and LeJEPA is cited as related work.
---
### Architecture D — Intervention-conditioned cardiac world model (V-JEPA 2-AC for ICU)
**What it is**: PhysioJEPA as Stage 1 pretraining; freeze encoder; Stage 2 post-trains action-conditioned predictor on clinical intervention tokens from MIMIC-IV (vasopressors, fluid boluses, ventilator changes).
**When to run**: future work. Requires MIMIC-IV waveform→medication timestamp alignment, which is a separate data engineering project. Not part of the 15-day experiment matrix.
**Why it matters**: this is the highest-impact clinical application. A world model that simulates "what happens to haemodynamics if I give norepinephrine now?" has direct ICU decision support utility. The V-JEPA 2-AC paper demonstrated the two-stage recipe works with only 62 hours of interaction data — we have years of MIMIC-IV ICU data.
**Prerequisite**: PhysioJEPA Stage 1 must work (K2 passes) before investing in Stage 2.
---
### Architecture E — Hierarchical cardiac JEPA (H-JEPA, dual timescale)
**What it is**: Two JEPA levels. Fast encoder (beat-level, ~1s): predicts next-beat ECG/PPG from current beat. Slow encoder (episode-level, 5min): predicts episode summary from sequence of fast latents. Slow predictor conditions fast predictor.
**When to run**: medium-term. Training two coupled levels adds significant complexity that has to be managed.
**Unique capability**: the only architecture that captures both beat-to-beat variability (HRV) and autonomic tone evolution over minutes. AF onset prediction benefits from both scales.
**Risk**: two-level training can develop gradient imbalance. Curriculum (train fast encoder first, activate slow encoder after convergence) is necessary.
---
### Architecture F — PhysioJEPA (directional asymmetric time-offset JEPA)
**This is the paper.**
**Core innovation**: ECG context predicts PPG morphology at a *variable* time offset Δt, encoding the directional temporal structure of the cardiovascular causal chain (electrical activation → mechanical contraction → peripheral perfusion) that symmetric contrastive methods destroy.
**Why not "causal" JEPA**: the architecture encodes a physiological asymmetry, not causal inference in the interventional sense. Calling it "causal" would invite statistical causality reviewers to reject on framing alone. "Directional" or "asymmetric" is accurate and defensible.
#### v1 architecture (minimal, runs in experiment matrix)
See Section 2 of `RESEARCH_DEVELOPMENT.md` for full specification (revised 2026-04-14 post-E0). In brief:
- ECG encoder (ViT-S, 1D over single lead II @ 250 Hz, 50-sample / 200 ms patches)
- PPG target encoder (ViT-T, raw 25-sample / 200 ms patches @ 125 Hz, EMA)
- Cross-attention predictor conditioned on Δt embedding
- EMA collapse prevention (no SIGReg in v1)
- Loss: L1 cross-modal prediction + 0.3 × L1 ECG self-prediction
- Δt sampling: 60% log-uniform [50ms, 500ms], 40% ground-truth PTT
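The patch sizes, Δt sampling rule, and loss weighting in the list above can be checked with a short sketch. The constants come from the list; the function names and the fallback to log-uniform sampling when a window has no PTT estimate are assumptions for illustration, not the specification in `RESEARCH_DEVELOPMENT.md`.

```python
import math
import random
from typing import Optional

DT_MIN_MS, DT_MAX_MS = 50.0, 500.0   # log-uniform range from the v1 list
P_LOG_UNIFORM = 0.6                   # remaining 40% uses ground-truth PTT
ECG_SELF_WEIGHT = 0.3                 # weight on the ECG self-prediction term


def patch_ms(n_samples: int, fs_hz: int) -> float:
    """Patch duration in milliseconds."""
    return 1000.0 * n_samples / fs_hz


def sample_delta_t_ms(ptt_ms: Optional[float], rng: random.Random) -> float:
    """Draw one offset: log-uniform with probability 0.6, else the window's PTT.

    Assumption: windows without a PTT estimate always use the log-uniform branch.
    """
    if ptt_ms is None or rng.random() < P_LOG_UNIFORM:
        log_dt = rng.uniform(math.log(DT_MIN_MS), math.log(DT_MAX_MS))
        # Clamp to guard against floating-point overshoot at the range edges.
        return min(max(math.exp(log_dt), DT_MIN_MS), DT_MAX_MS)
    return ptt_ms


def total_loss(l1_cross_modal: float, l1_ecg_self: float) -> float:
    """L1 cross-modal prediction + 0.3 x L1 ECG self-prediction."""
    return l1_cross_modal + ECG_SELF_WEIGHT * l1_ecg_self


rng = random.Random(0)
samples = [sample_delta_t_ms(200.0, rng) for _ in range(10_000)]
ptt_fraction = sum(s == 200.0 for s in samples) / len(samples)
```

Both encoders see 200 ms patches (`patch_ms(50, 250)` and `patch_ms(25, 125)` agree), so ECG and PPG tokens cover the same time span despite the different sampling rates. Sampling in log space puts as much probability mass on 50–158 ms as on 158–500 ms, which favours short offsets.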
#### What makes v1 different from Baseline B
One thing only: **Δt > 0 vs Δt = 0**. The experiment matrix is designed so that this single variable is isolated. Everything else (encoder architecture, predictor, loss) is identical between E3 and Baseline B.
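One way to keep that isolation honest is to derive the Baseline B configuration from the E3 configuration and assert that exactly one field differs. The sketch below is hypothetical: `RunConfig`, its field names, and the `delta_t_mode` labels are invented for illustration, with values taken from the v1 list.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RunConfig:
    ecg_encoder: str = "ViT-S"
    ppg_encoder: str = "ViT-T"
    patch_ms: int = 200
    ecg_self_weight: float = 0.3
    delta_t_mode: str = "sampled"  # hypothetical label: variable offset, 50-500 ms


E3 = RunConfig()
# Baseline B: copy E3 and change only the offset handling.
BASELINE_B = replace(E3, delta_t_mode="zero")

e3_fields, b_fields = asdict(E3), asdict(BASELINE_B)
differing = {k for k in e3_fields if e3_fields[k] != b_fields[k]}
```

If `differing` ever contains more than `delta_t_mode`, the comparison no longer answers K2.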
#### Ablations (run after K2 passes)
| # | Change | Tests |
|---|--------|-------|
| A1 | Morphological PPG tokens instead of raw patches | Does structured PPG encoding improve latent? |
| A2 | Cardiac phase PE (soft Gaussian over landmarks) | Does phase-aware PE beat standard sinusoidal? |
| A3 | SIGReg instead of EMA | Is SIGReg more stable on impulsive cardiac signals? |
| A4 | PTT regression head in training loop (γ=0.1) | Does supervised PTT improve vascular encoding? |
| A5 | Curriculum Δt (ground-truth first, then random) | Does Δt schedule matter? |
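For ablation A2, "soft Gaussian over landmarks" could look like the sketch below: each time point receives a normalised weight per landmark instead of a hard phase label. The landmark positions, `sigma_ms`, and the function name are hypothetical values chosen for illustration; the ablation itself is not yet specified at this level.

```python
import math
from typing import Dict


def soft_phase_encoding(
    t_ms: float, landmarks_ms: Dict[str, float], sigma_ms: float = 40.0
) -> Dict[str, float]:
    """Normalised Gaussian weights of time t over the landmark centres."""
    weights = {
        name: math.exp(-0.5 * ((t_ms - centre) / sigma_ms) ** 2)
        for name, centre in landmarks_ms.items()
    }
    total = sum(weights.values())
    if total == 0.0:
        # Far from every landmark: fall back to a uniform encoding.
        return {name: 1.0 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


# Hypothetical landmark times within one beat, in ms from the P wave.
LANDMARKS = {"P": 0.0, "QRS": 160.0, "ST": 260.0, "T": 360.0}

at_qrs = soft_phase_encoding(160.0, LANDMARKS)
between = soft_phase_encoding(210.0, LANDMARKS)  # midway between QRS and ST
```

Because the weights vary smoothly, a mislocated or missing landmark shifts the encoding gradually rather than flipping a hard phase boundary.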
#### PTT as validation, not contribution
The PTT regression probe (E5a in the experiment matrix) tests whether the learned Δt structure is physiologically meaningful — not whether we invented a new way to measure PTT. PTT by peak detection is a 10-line script. The contribution is that a model trained without any PTT labels implicitly encodes PTT in its latent geometry. That is the evidence for claim 3 in the hypothesis.
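For reference, the peak-detection baseline really is about ten lines. The sketch below assumes ECG and PPG have been resampled to a common rate and measures the delay from each ECG R-peak to the next PPG systolic peak; the thresholds and the synthetic signals are illustrative, and a production script would use a more robust peak detector.

```python
from typing import List


def find_peaks(x: List[float], threshold: float) -> List[int]:
    """Indices of local maxima that exceed the threshold."""
    return [
        i for i in range(1, len(x) - 1)
        if x[i] > threshold and x[i] > x[i - 1] and x[i] >= x[i + 1]
    ]


def ptt_ms(ecg: List[float], ppg: List[float], fs_hz: int,
           ecg_thr: float, ppg_thr: float) -> List[float]:
    """Delay (ms) from each R-peak to the first PPG peak that follows it."""
    ppg_peaks = find_peaks(ppg, ppg_thr)
    delays = []
    for r in find_peaks(ecg, ecg_thr):
        later = [p for p in ppg_peaks if p > r]
        if later:
            delays.append(1000.0 * (later[0] - r) / fs_hz)
    return delays


# Synthetic check: R-peaks once per second, PPG peaks 50 samples (200 ms) later.
FS = 250
r_locs = [100, 350, 600]
ecg = [0.0] * 800
ppg = [0.0] * 800
for r in r_locs:
    ecg[r] = 1.0
    for k in range(-10, 11):  # triangular pulse centred at r + 50
        ppg[r + 50 + k] = 1.0 - abs(k) / 10.0

delays = ptt_ms(ecg, ppg, FS, ecg_thr=0.5, ppg_thr=0.5)
```

These per-beat delays are what the E5a probe would regress against from frozen latents.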
---
### Architecture G — Variational PhysioJEPA (uncertainty-aware clinical planning)
**What it is**: Extend Architecture D with a variational predictor. Instead of a single future latent, predict μ and σ over future latents. At inference, sample K rollout trajectories and select action sequences that minimise expected goal distance weighted by uncertainty.
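A toy sketch of the sampling-and-selection loop, under loud assumptions: the latent is one-dimensional, `toy_predictor` is a made-up stand-in for the variational predictor, and "weighted by uncertainty" is interpreted as mean goal distance plus `beta` times its standard deviation across rollouts. None of these choices is fixed by the design above.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float, float], Tuple[float, float]]  # (z, action) -> (mu, sigma)


def toy_predictor(z: float, action: float) -> Tuple[float, float]:
    """Hypothetical dynamics: the action shifts the latent; big actions are less certain."""
    return z + action, 0.05 + 0.2 * abs(action)


def rollout(z0: float, actions: Sequence[float],
            predictor: Predictor, rng: random.Random) -> float:
    """Sample one latent trajectory and return its final state."""
    z = z0
    for a in actions:
        mu, sigma = predictor(z, a)
        z = rng.gauss(mu, sigma)
    return z


def score_plan(z0: float, actions: Sequence[float], goal: float,
               predictor: Predictor, k: int, beta: float,
               rng: random.Random) -> float:
    """Mean goal distance over K rollouts, penalised by its spread."""
    dists = [abs(rollout(z0, actions, predictor, rng) - goal) for _ in range(k)]
    return statistics.mean(dists) + beta * statistics.pstdev(dists)


def select_plan(z0: float, candidates: List[Tuple[float, ...]], goal: float,
                predictor: Predictor, k: int = 200, beta: float = 1.0,
                seed: int = 0) -> Tuple[float, ...]:
    """Pick the candidate action sequence with the lowest penalised score."""
    rng = random.Random(seed)
    return min(candidates,
               key=lambda acts: score_plan(z0, acts, goal, predictor, k, beta, rng))


candidates = [(0.0, 0.0), (0.5, 0.5), (2.0, -1.0)]
best = select_plan(0.0, candidates, goal=1.0, predictor=toy_predictor)
```

In this toy setup two plans reach the goal on average, but the one built from large actions has wider rollouts and scores worse, which is the behaviour the σ output is meant to enable.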
**Why it matters clinically**: ICU decisions require not just a prediction but a confidence signal. A world model that signals high σ for a septic patient with unstable haemodynamics tells the clinician that standard vasopressor protocols may not apply.
**Connection to Ha & Schmidhuber (2018)**: G is the modern JEPA equivalent of their VAE + MDN world model — JEPA encoder replaces VAE, variational predictor with Gaussian output replaces MDN, intervention token replaces random action.
**When to pursue**: after Architecture D is validated. This is a two-paper arc: D then G.
---
## Recommended execution order
```
Now (weeks 1–2):   Architecture F v1 (PhysioJEPA)
                   Baselines A, B, C (experiment matrix E2)
                   Decision gate at K2
Weeks 3–4:         Ablations A1–A5 (if K2 passes)
Months 2–3:        Architecture D (if MIMIC-IV data join succeeds)
                   Architecture E (if Zack bandwidth)
Future:            Architecture G (after D validated)
                   Architecture A as ablation/fallback paper
```
The architecture document serves as a reference map. The experiment matrix is the operational guide. Everything that is not in the experiment matrix is future work.
---
*Document revision 2 — April 2026*
*Architecture F is now named PhysioJEPA throughout.*
*Execution order cross-references physiojep_experiment_matrix.md*