# PhysioJEPA architecture landscape
*Oz Labs — April 2026*
*Revision 2: post-reviewer critique. Replaces cardio_jepa_architectures.md*
---
## Change log from revision 1
- All "CausalCardio-JEPA" → "PhysioJEPA" (Architecture F)
- v1 architecture clarified: raw PPG patches, EMA only, no morphological encoding, no SIGReg, no cardiac phase encoding in first run
- Ablation structure added to Architecture F entry
- Execution order updated to cross-reference experiment matrix
- Architecture descriptions retain full detail; only framing corrected
---
## Prior work — precisely characterised
### Weimann & Conrad — ECG-JEPA (2410.13867)
The direct baseline. I-JEPA adapted for 12-lead ECG: 2D patch tokenisation (leads × time), multi-block contiguous masking, EMA target encoder, L1 latent prediction. Pretrained on 1M+ records, AUC 0.945 on PTB-XL all-statements. Open source at `github.com/kweimann/ECG-JEPA` — our starting codebase. **Limitation**: unimodal, no PPG, no temporal dynamics beyond the 10s window. Static representation learner, not a world model.
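The 2D tokenisation and multi-block masking described above can be sketched as follows. This is a minimal illustration, not code from the ECG-JEPA repository. The function names are hypothetical, and the block shape is an assumption: each mask block here covers all leads over a contiguous time span, which may differ from the block shapes ECG-JEPA actually samples.

```python
import random


def patchify_12lead(ecg, patch_len):
    """Split a (leads x samples) ECG into a (leads x n_time_patches) grid.

    ecg: list of per-lead sample lists of equal length.
    Returns grid[lead][t] = list of patch_len samples. Trailing samples that do
    not fill a whole patch are dropped (an assumption of this sketch).
    """
    n_patches = len(ecg[0]) // patch_len
    return [[lead[t * patch_len:(t + 1) * patch_len] for t in range(n_patches)]
            for lead in ecg]


def multiblock_mask(n_leads, n_time, n_blocks, block_len, rng):
    """Sample n_blocks contiguous time blocks (they may overlap).

    Assumption: each block masks every lead over its time span. Returns the set
    of masked (lead, t) grid positions, i.e. the targets the predictor must
    infer in latent space from the remaining context patches.
    """
    masked = set()
    for _ in range(n_blocks):
        start = rng.randrange(0, n_time - block_len + 1)
        for t in range(start, start + block_len):
            for lead in range(n_leads):
                masked.add((lead, t))
    return masked
```

For example, a 10 s, 12-lead record at 250 Hz with 50-sample patches gives a 12 × 50 grid; the target encoder embeds every patch, while the context encoder sees only the unmasked positions.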
### Kim — CroPA-ECG-JEPA (2410.08559)
Introduces Cross-Pattern Attention (CroPA): masked attention enforcing inter-lead dependencies. Recovers HR and QRS duration from frozen representations. **Lesson**: clinical inductive bias (inter-lead relationships) improves cardiac JEPA. Directly motivates cardiac phase encoding in Architecture F ablation A2.
### Khadka et al. — EEG-VJEPA (2507.03633)
Treats multi-channel EEG as 3D spatiotemporal tensor, applies V-JEPA tube masking. 85.8% accuracy on TUH Abnormal EEG, UMAP clusters showing pathological separation. **Lesson**: V-JEPA tube masking transfers to physiological signals; the "signal as video" reframe works.
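The key property of tube masking is that the spatial mask is sampled once and repeated at every time step. A minimal sketch follows, assuming (as a simplification) that channels form a flat spatial axis; EEG-VJEPA's actual spatial layout and block sampler may differ, and `tube_mask` is a hypothetical name.

```python
import random


def tube_mask(n_channels, n_time, mask_ratio, rng):
    """V-JEPA-style tube mask over a (channels x time) token grid.

    Simplification: channels are treated as a flat spatial axis. The channel
    subset is drawn once and repeated at every time step, so a masked channel
    is hidden for the whole clip and cannot be recovered by copying the same
    channel from a neighbouring time step.
    Returns mask[c][t] == True for masked tokens.
    """
    n_masked = round(mask_ratio * n_channels)
    masked_channels = set(rng.sample(range(n_channels), n_masked))
    return [[c in masked_channels for _ in range(n_time)] for c in range(n_channels)]
```

This is why tube masking suits slowly varying physiological signals: per-frame random masking would let the model interpolate across time instead of learning inter-channel structure.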
### Zhou et al. — Brain-JEPA (NeurIPS 2024)
Introduces Brain Gradient Positioning, a domain-specific positional encoding derived from fMRI connectivity gradients. **Lesson for us**: cardiac phase encoding (P/QRS/ST/T) is the cardiac analog. The Botman reviewer raised the valid concern that hard phase boundaries fail during AF, where discrete P waves are absent; soft Gaussian encoding over landmarks is the fix if we pursue ablation A2.
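A soft phase encoding can be sketched as one Gaussian response per phase type, centred on detected landmarks. This is a hypothetical illustration for ablation A2: landmark detection is assumed to happen upstream, `soft_phase_encoding` and the σ value are assumptions, and in a model the resulting 4-vector would be projected to the token dimension and added to the patch embedding.

```python
import math

PHASES = ("P", "QRS", "ST", "T")


def soft_phase_encoding(t, landmarks, sigma):
    """Soft cardiac-phase features for a token centred at time t (seconds).

    landmarks maps each phase name to a list of detected landmark times (s).
    Each feature is the strongest Gaussian response to any landmark of that
    phase. A phase with no detected landmarks (e.g. no P waves during AF)
    simply yields 0.0 rather than forcing a hard, wrong boundary.
    """
    feats = []
    for phase in PHASES:
        times = landmarks.get(phase, [])
        feats.append(max((math.exp(-((t - tau) ** 2) / (2 * sigma ** 2)) for tau in times),
                         default=0.0))
    return feats
```

With a hypothetical σ = 40 ms, a token sitting on a QRS landmark gets a QRS feature of 1.0, while a P landmark 150 ms away contributes less than 0.001.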
### Wang et al. — EchoJEPA (2602.02603)
V-JEPA 2 on 18M echocardiograms; JEPA degrades only 2% under physics-informed acoustic perturbations vs 17% for VideoMAE. **Key lesson**: JEPA's noise rejection is the primary advantage for medical signals. Directly motivates our choice of JEPA over MAE for ICU PPG data.
### Balestriero & LeCun — LeJEPA (2511.08544)
Proves that an isotropic Gaussian is the optimal JEPA embedding distribution; introduces SIGReg (Sketched Isotropic Gaussian Regularisation) with linear complexity, which eliminates EMA entirely. **Position in our work**: ablation A3 tests SIGReg vs EMA on cardiac signals. Botman/Laya raised the concern that SIGReg may over-regularise impulsive ECG transients (QRS complexes); the mitigation is applying SIGReg only to the pooled global representation, not to per-patch tokens.
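The idea behind SIGReg, as we understand it, is to project the batch of embeddings onto random 1-D directions and penalise each projection's deviation from N(0, 1) via an Epps-Pulley-style characteristic-function test. The sketch below is a simplified stand-in, not LeJEPA's implementation: it evaluates the characteristic-function gap on a coarse hypothetical grid of frequencies, and it applies the penalty to mean-pooled tokens as the mitigation above suggests. `mean_pool`, `sigreg_sketch` and `t_grid` are hypothetical names.

```python
import math
import random


def mean_pool(tokens):
    """Average per-patch tokens (list of equal-length vectors) into one global vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def sigreg_sketch(embeddings, n_dirs, rng, t_grid=(0.5, 1.0, 1.5, 2.0)):
    """Simplified SIGReg-style penalty on a batch of (pooled) embeddings.

    For each of n_dirs random unit directions, project the batch to 1-D and
    measure the squared gap between the empirical characteristic function and
    N(0, 1)'s exp(-t^2 / 2) at each frequency in t_grid (a coarse grid, not
    LeJEPA's exact quadrature or weighting).
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    total = 0.0
    for _ in range(n_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in d))
        d = [x / norm for x in d]
        proj = [sum(a * b for a, b in zip(e, d)) for e in embeddings]
        for t in t_grid:
            re = sum(math.cos(t * p) for p in proj) / n
            im = sum(math.sin(t * p) for p in proj) / n
            total += (re - math.exp(-t * t / 2)) ** 2 + im ** 2
    return total / (n_dirs * len(t_grid))
```

A collapsed batch (every embedding identical) scores far worse than an isotropic Gaussian batch, which is exactly the pressure that replaces EMA as the anti-collapse mechanism. In training this term would be added to the prediction loss with weight λ.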
### Botman et al. — Laya (2603.16281)
First LeJEPA application to EEG at scale; latent prediction outperforms reconstruction on clinical tasks. Found that SIGReg with aggressive λ causes instability on signals with large amplitude transients — recommended lower λ and gradient clipping. **Direct prior** for our ablation A3.
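The two stabilisers Laya recommends (a small λ on the SIGReg term and gradient clipping) look like this in isolation. This is a minimal stdlib sketch of global-norm clipping, the same operation PyTorch's `torch.nn.utils.clip_grad_norm_` performs; the function names and any λ value passed in are hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is <= max_norm.

    A large-amplitude transient can produce an occasional gradient spike;
    clipping bounds the size of that update without changing its direction.
    Returns (clipped_grads, norm_before_clipping).
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total == 0.0 or total <= max_norm:
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total


def total_loss(pred_loss, sigreg_loss, lam):
    """Prediction loss plus lambda-weighted SIGReg penalty (lam is a hypothetical value)."""
    return pred_loss + lam * sigreg_loss
```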
### Nie et al. — AnyPPG (2511.01747)
Symmetric InfoNCE on 100k+ hours of synchronized ECG-PPG. Critical detail: the ECGFounder encoder is *frozen* during AnyPPG training — ECG acts as a fixed supervisory signal, not a jointly-learned representation. R@1=0.736 for PPG→ECG retrieval. **This is the primary baseline we differentiate from.** AnyPPG answers "what is the cardiovascular state now?" — PhysioJEPA asks "how does it evolve?"
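For contrast with PhysioJEPA, the objective and metric named above can be sketched as a CLIP-style symmetric InfoNCE loss plus Recall@1. This is an illustrative stdlib version, not AnyPPG's code; the temperature of 0.07 is an assumption (a common CLIP default), and the function names are hypothetical. Note that nothing in this loss depends on a time offset: paired ECG and PPG windows are simply pulled together.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalise(v):
    n = math.sqrt(_dot(v, v))
    return [x / n for x in v]


def symmetric_infonce(ecg_emb, ppg_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (ECG_i, PPG_i) embeddings.

    PPG_i is the positive for ECG_i and every other PPG_j is a negative; the
    loss averages the ECG->PPG and PPG->ECG cross-entropies. With a frozen ECG
    encoder (as in AnyPPG), only the PPG side would receive gradients.
    """
    e = [_normalise(v) for v in ecg_emb]
    p = [_normalise(v) for v in ppg_emb]
    n = len(e)
    logits = [[_dot(e[i], p[j]) / temperature for j in range(n)] for i in range(n)]

    def ce_rows(mat):
        loss = 0.0
        for i, row in enumerate(mat):
            m = max(row)  # log-sum-exp with max subtraction for stability
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            loss += lse - row[i]
        return loss / len(mat)

    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (ce_rows(logits) + ce_rows(cols))


def recall_at_1(query_emb, gallery_emb):
    """Fraction of queries whose nearest gallery item (cosine) is their own pair."""
    q = [_normalise(v) for v in query_emb]
    g = [_normalise(v) for v in gallery_emb]
    hits = 0
    for i, qi in enumerate(q):
        sims = [_dot(qi, gj) for gj in g]
        hits += int(max(range(len(g)), key=sims.__getitem__) == i)
    return hits / len(q)
```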
### Assran et al. — V-JEPA 2-AC (2506.09985)
Two-stage: action-free pretraining then action-conditioned post-training on 62 hours of robot trajectories. Zero-shot robot planning via MPC in latent space. **Architectural template for Architecture D** (intervention-conditioned, future work). Not part of PhysioJEPA v1.
### Wu, Lei et al. — SurgMotion (2602.05638)
V-JEPA 2 on surgical video, rejects smoke/specular artifacts. **Lesson**: the JEPA noise-rejection property generalises across medical imaging modalities.
---
## Five architectures + two novel extensions
### Architecture A — Temporal ECG-JEPA
**What it is**: ECG-JEPA (Weimann) extended from spatial masking to temporal future prediction. Context window t→t+T, target is t+T→t+2T. Single modality, no PPG.
**Use case**: Baseline A in the experiment matrix. Also the fallback if K2 fails — this is still publishable as an extension of Weimann & Conrad.
**Novelty**: low. Changing the masking axis from spatial to temporal is a small, targeted modification. The temporal rollout evaluation (AF onset prediction via a latent trajectory deviation score) is new but not a strong paper claim.
**Estimated performance**: should match or slightly exceed ECG-JEPA on static tasks; a clear advantage is expected only on temporal tasks.
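The context/target split for Architecture A can be sketched as sliding pairs of adjacent windows. The function name, window length T and stride are hypothetical; only the [t, t+T) → [t+T, t+2T) layout comes from the description above.

```python
def temporal_pairs(signal, fs, T, stride):
    """Return (context, target) pairs with context = [t, t+T) and target = [t+T, t+2T).

    signal: 1-D list of samples; fs: sampling rate (Hz); T and stride in seconds
    (hypothetical values). Only pairs whose target fits inside the recording
    are produced.
    """
    win = int(round(T * fs))
    step = int(round(stride * fs))
    pairs = []
    for start in range(0, len(signal) - 2 * win + 1, step):
        pairs.append((signal[start:start + win], signal[start + win:start + 2 * win]))
    return pairs
```

For a 10 s lead-II strip at 250 Hz with T = 2 s and a 1 s stride, this yields 7 pairs; the encoder sees the context window and the predictor must produce the target encoder's latent for the following window.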
---
### Architecture B — Symmetric cross-modal JEPA (Δt=0)
**What it is**: Dual encoder (ECG + PPG), cross-attention predictor, but Δt is fixed to 0. ECG context predicts PPG at the same time.
**Use case**: Baseline B in the experiment matrix — the controlled comparison that isolates whether Δt matters.
**Novelty**: low. This is essentially JEPA-flavoured AnyPPG without the frozen encoder constraint.
**Why it exists**: without this baseline, K2 cannot be answered. It must run in parallel with PhysioJEPA from Day 4.
---
### Architecture C — LeJEPA cardiac (SIGReg, cross-modal)
**What it is**: PhysioJEPA (Architecture F) with EMA replaced by SIGReg. Clean theoretical foundation from Balestriero & LeCun.
**When to run**: ablation A3 after E3 passes K2.
**Key risk**: SIGReg enforces isotropic Gaussian geometry globally. ECG signals have highly anisotropic spectral structure (dominant QRS transients). Mitigation: apply SIGReg only to pooled global representations, not per-patch tokens. λ sweep: [0.001, 0.01, 0.05, 0.1].
**What it contributes**: if SIGReg outperforms EMA, it provides a cleaner theoretical story and removes the τ schedule hyperparameter. If it doesn't, EMA stays and LeJEPA is cited as related work.
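For reference, the EMA mechanism that SIGReg would replace, including the τ schedule it would remove, can be sketched as below. The linear 0.996 → 1.0 ramp mirrors the I-JEPA-style default and is an assumption here, not a tuned PhysioJEPA value; parameters are flattened to a plain list for illustration.

```python
def tau_schedule(step, total_steps, tau_start=0.996, tau_end=1.0):
    """Linearly increase the EMA momentum from tau_start to tau_end (assumed defaults)."""
    frac = min(step / max(total_steps, 1), 1.0)
    return tau_start + (tau_end - tau_start) * frac


def ema_update(target_params, online_params, tau):
    """target <- tau * target + (1 - tau) * online, element-wise.

    The target encoder receives no gradients; it only tracks a slow moving
    average of the online encoder, which is what prevents collapse.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

As τ approaches 1.0 the target encoder effectively freezes, which is why the schedule's endpoints are themselves hyperparameters worth removing if SIGReg works.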
---
### Architecture D — Intervention-conditioned cardiac world model (V-JEPA 2-AC for ICU)
**What it is**: PhysioJEPA as Stage 1 pretraining; freeze encoder; Stage 2 post-trains action-conditioned predictor on clinical intervention tokens from MIMIC-IV (vasopressors, fluid boluses, ventilator changes).
**When to run**: future work. Requires MIMIC-IV waveform→medication timestamp alignment, which is a separate data engineering project. Not part of the 15-day experiment matrix.
**Why it matters**: this is the highest-impact clinical application. A world model that simulates "what happens to haemodynamics if I give norepinephrine now?" has direct ICU decision support utility. The V-JEPA 2-AC paper demonstrated the two-stage recipe works with only 62 hours of interaction data — we have years of MIMIC-IV ICU data.
**Prerequisite**: PhysioJEPA Stage 1 must work (K2 passes) before investing in Stage 2.
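Once waveform and medication timestamps share a clock (the genuinely hard part of the data engineering), attaching intervention tokens to waveform windows is a simple interval lookup. A minimal sketch follows; the function name, token strings and window length are hypothetical, and a real join would also need per-patient clock reconciliation and dose information.

```python
import bisect


def align_interventions(window_starts, window_len, events):
    """Attach intervention tokens to fixed-length waveform windows.

    window_starts: sorted window start times (s), on the same clock as events.
    events: list of (time_s, token) pairs, e.g. (3605.0, "norepinephrine_start").
    Returns one token list per window, holding every event whose timestamp
    falls in [start, start + window_len).
    """
    events = sorted(events)
    times = [t for t, _ in events]
    out = []
    for s in window_starts:
        lo = bisect.bisect_left(times, s)
        hi = bisect.bisect_left(times, s + window_len)
        out.append([tok for _, tok in events[lo:hi]])
    return out
```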
---
### Architecture E — Hierarchical cardiac JEPA (H-JEPA, dual timescale)
**What it is**: Two JEPA levels. Fast encoder (beat-level, ~1 s): predicts next-beat ECG/PPG from current beat. Slow encoder (episode-level, 5 min): predicts episode summary from sequence of fast latents. Slow predictor conditions fast predictor.
**When to run**: medium-term. Requires significant training complexity management.
**Unique capability**: the only architecture in this set that captures both beat-to-beat variability (HRV) and autonomic tone evolution over minutes. AF onset prediction benefits from both scales.
**Risk**: two-level training can develop gradient imbalance. Curriculum (train fast encoder first, activate slow encoder after convergence) is necessary.
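The curriculum can be sketched as a gate that switches on the slow-level loss once the fast loss plateaus, plus a helper that groups beat-level latents into episodes. At ~1 s per beat, a 5 min episode holds roughly 300 fast latents (the true count varies with heart rate). The plateau rule, `patience`, `rel_tol` and all names are hypothetical choices for illustration.

```python
def group_into_episodes(fast_latents, beats_per_episode):
    """Chunk beat-level latents into consecutive episodes (incomplete tail dropped)."""
    n = len(fast_latents) // beats_per_episode
    return [fast_latents[i * beats_per_episode:(i + 1) * beats_per_episode]
            for i in range(n)]


class SlowLevelGate:
    """Open the slow (episode-level) loss once the fast loss has plateaued.

    Hypothetical plateau rule: the best fast loss has not improved by more than
    rel_tol (relative) for `patience` consecutive evaluations. Once open, the
    gate stays open.
    """

    def __init__(self, patience=3, rel_tol=0.01):
        self.patience = patience
        self.rel_tol = rel_tol
        self.best = float("inf")
        self.stale = 0
        self.open = False

    def update(self, fast_loss):
        if self.open:
            return True
        if fast_loss < self.best * (1 - self.rel_tol):
            self.best = fast_loss
            self.stale = 0
        else:
            self.stale += 1
        self.open = self.stale >= self.patience
        return self.open


def hierarchical_loss(fast_loss, slow_loss, gate_open, slow_weight=1.0):
    """Fast loss always; slow loss only after the gate opens."""
    return fast_loss + (slow_weight * slow_loss if gate_open else 0.0)
```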
---
### Architecture F — PhysioJEPA (directional asymmetric time-offset JEPA)
**This is the paper.**
**Core innovation**: ECG context predicts PPG morphology at a *variable* time offset Δt, encoding the directional temporal structure of the cardiovascular causal chain (electrical activation → mechanical contraction → peripheral perfusion) that symmetric contrastive methods destroy.
**Why not "causal" JEPA**: the architecture encodes a physiological asymmetry, not causal inference in the interventional sense. Calling it "causal" would invite statistical causality reviewers to reject on framing alone. "Directional" or "asymmetric" is accurate and defensible.
#### v1 architecture (minimal, runs in experiment matrix)
See Section 2 of `RESEARCH_DEVELOPMENT.md` for full specification (revised 2026-04-14 post-E0). In brief:
- ECG encoder (ViT-S, 1D over single lead II @ 250 Hz, 50-sample / 200 ms patches)
- PPG target encoder (ViT-T, raw 25-sample / 200 ms patches @ 125 Hz, EMA)
- Cross-attention predictor conditioned on Δt embedding
- EMA collapse prevention (no SIGReg in v1)
- Loss: L1 cross-modal prediction + 0.3 × L1 ECG self-prediction
- Δt sampling: 60% log-uniform [50 ms, 500 ms], 40% ground-truth PTT
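The numeric parts of the v1 spec can be checked and sketched directly: both patch sizes span 200 ms (50 / 250 Hz = 25 / 125 Hz = 0.2 s), so ECG and PPG tokens are time-aligned one to one. The sampler and loss below are a minimal sketch with hypothetical names. One labelled assumption: if a beat has no ground-truth PTT, the sampler falls back to the log-uniform branch.

```python
import math
import random

ECG_FS, PPG_FS = 250, 125          # Hz, from the v1 spec
ECG_PATCH, PPG_PATCH = 50, 25      # samples per patch; both span 200 ms


def sample_delta_t(gt_ptt_s, rng, p_gt=0.4, lo=0.050, hi=0.500):
    """Sample the prediction offset Δt (seconds) for one training pair.

    With probability p_gt, use this beat's ground-truth PTT; otherwise draw
    log-uniformly from [lo, hi], i.e. uniformly in log(Δt).
    Assumption: gt_ptt_s=None (no valid PTT) always uses the log-uniform branch.
    """
    if gt_ptt_s is not None and rng.random() < p_gt:
        return gt_ptt_s
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def l1(pred, target):
    """Mean absolute error between two latent vectors."""
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)


def v1_loss(pred_ppg, target_ppg, pred_ecg, target_ecg, self_weight=0.3):
    """L1 cross-modal prediction + 0.3 x L1 ECG self-prediction."""
    return l1(pred_ppg, target_ppg) + self_weight * l1(pred_ecg, target_ecg)
```

Sampling in log space spends as many draws on 50–100 ms as on 250–500 ms, so short offsets are not underrepresented relative to the wide upper range.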
#### What makes v1 different from Baseline B
One thing only: **Δt > 0 vs Δt = 0**. The experiment matrix is designed so that this single variable is isolated. Everything else (encoder architecture, predictor, loss) is identical between E3 and Baseline B.
#### Ablations (run after K2 passes)
| # | Change | Tests |
|---|--------|-------|
| A1 | Morphological PPG tokens instead of raw patches | Does structured PPG encoding improve latent? |
| A2 | Cardiac phase PE (soft Gaussian over landmarks) | Does phase-aware PE beat standard sinusoidal? |
| A3 | SIGReg instead of EMA | Is SIGReg more stable on impulsive cardiac signals? |
| A4 | PTT regression head in training loop (γ=0.1) | Does supervised PTT improve vascular encoding? |
| A5 | Curriculum Δt (ground-truth first, then random) | Does Δt schedule matter? |
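Ablation A5 only needs the probability of using ground-truth PTT to change over training. A minimal sketch, assuming a warmup of pure ground truth followed by a linear decay; the decay shape, the step counts and the end value `p_final` are hypothetical (0.4 ends at the v1 mixture, 0.0 would end fully random).

```python
def curriculum_p_gt(step, warmup_steps, ramp_steps, p_final=0.4):
    """Probability of using ground-truth PTT as Δt at a given training step.

    1.0 during warmup, then linear decay to p_final over ramp_steps
    (schedule shape and p_final are assumptions of this sketch).
    """
    if step < warmup_steps:
        return 1.0
    frac = min((step - warmup_steps) / max(ramp_steps, 1), 1.0)
    return 1.0 + (p_final - 1.0) * frac
```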
#### PTT as validation, not contribution
The PTT regression probe (E5a in the experiment matrix) tests whether the learned Δt structure is physiologically meaningful — not whether we invented a new way to measure PTT. PTT by peak detection is a 10-line script. The contribution is that a model trained without any PTT labels implicitly encodes PTT in its latent geometry. That is the evidence for claim 3 in the hypothesis.
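To make "PTT by peak detection" concrete, here is a rough sketch of the reference label: the delay from each ECG R-peak to the next PPG systolic peak. The thresholds (half the signal maximum, 0.3 s minimum peak spacing, 0.6 s maximum delay) are hypothetical; a real pipeline would use a validated detector such as Pan-Tompkins and might measure to the PPG foot instead of the peak. Strictly, R-peak-to-PPG delay is often called pulse arrival time, since it includes the pre-ejection period.

```python
def local_peaks(x, fs, min_distance_s, threshold):
    """Indices of local maxima above threshold, at least min_distance_s apart."""
    min_gap = int(min_distance_s * fs)
    peaks = []
    for i in range(1, len(x) - 1):
        if x[i] >= threshold and x[i] > x[i - 1] and x[i] >= x[i + 1]:
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
            elif x[i] > x[peaks[-1]]:
                peaks[-1] = i  # keep the taller of two close candidates
    return peaks


def ptt_from_peaks(ecg, ecg_fs, ppg, ppg_fs, max_delay_s=0.6):
    """Delay (s) from each R-peak to the first PPG peak within max_delay_s."""
    r = [i / ecg_fs for i in local_peaks(ecg, ecg_fs, 0.3, 0.5 * max(ecg))]
    p = [i / ppg_fs for i in local_peaks(ppg, ppg_fs, 0.3, 0.5 * max(ppg))]
    ptts = []
    for rt in r:
        nxt = [pt for pt in p if rt < pt <= rt + max_delay_s]
        if nxt:
            ptts.append(nxt[0] - rt)  # p is sorted, so nxt[0] is the first arrival
    return ptts
```

The E5a probe then regresses these per-beat delays from frozen PhysioJEPA latents; the model itself never sees them during pretraining.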
---
### Architecture G — Variational PhysioJEPA (uncertainty-aware clinical planning)
**What it is**: Extend Architecture D with a variational predictor. Instead of a single future latent, predict μ and σ over future latents. At inference, sample K rollout trajectories and select action sequences that minimise expected goal distance weighted by uncertainty.
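The training objective and the planning loop described above can be sketched with a stand-in predictor. This is a hypothetical illustration: `predict(z, a) -> (mu, sigma)` represents the learned variational predictor, and "weighted by uncertainty" is interpreted here (as an assumption) as adding β times the mean predicted σ to the expected final distance from the goal latent. Candidate action sequences are scored by K Monte Carlo rollouts.

```python
import math
import random


def gaussian_nll(mu, sigma, target):
    """Training loss: negative log-likelihood of target under diagonal N(mu, sigma^2)."""
    return sum(0.5 * math.log(2 * math.pi * s * s) + (x - m) ** 2 / (2 * s * s)
               for m, s, x in zip(mu, sigma, target))


def rollout_cost(predict, z0, actions, goal, k, rng, beta=1.0):
    """Monte Carlo cost of one action sequence.

    Cost = mean final distance to goal over k sampled trajectories
           + beta * mean predicted sigma along the way (assumed uncertainty penalty).
    """
    total_dist, total_sigma, n_sigma = 0.0, 0.0, 0
    for _ in range(k):
        z = list(z0)
        for a in actions:
            mu, sigma = predict(z, a)
            z = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
            total_sigma += sum(sigma)
            n_sigma += len(sigma)
        total_dist += math.sqrt(sum((zi - gi) ** 2 for zi, gi in zip(z, goal)))
    return total_dist / k + beta * total_sigma / n_sigma


def plan(predict, z0, candidates, goal, k=32, seed=0, beta=1.0):
    """Return the lowest-cost candidate action sequence and all costs."""
    rng = random.Random(seed)
    costs = [rollout_cost(predict, z0, seq, goal, k, rng, beta) for seq in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs
```

Under this cost, an intervention whose mean trajectory reaches the goal but with high predicted σ is ranked below a slightly less effective but confidently predicted one, which is the clinical behaviour this architecture is meant to expose.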
**Why it matters clinically**: ICU decisions require not just a prediction but a confidence signal. A world model that signals high σ for a septic patient with unstable haemodynamics tells the clinician that standard vasopressor protocols may not apply.
**Connection to Ha & Schmidhuber (2018)**: G is the modern JEPA equivalent of their VAE + MDN world model — JEPA encoder replaces VAE, variational predictor with Gaussian output replaces MDN, intervention token replaces random action.
**When to pursue**: after Architecture D is validated. This is a two-paper arc: D then G.
---
## Recommended execution order
```
Now (weeks 1–2):   Architecture F v1 (PhysioJEPA)
                   Baselines A, B, C (experiment matrix E2)
                   Decision gate at K2
Weeks 3–4:         Ablations A1–A5 (if K2 passes)
Months 2–3:        Architecture D (if MIMIC-IV data join succeeds)
                   Architecture E (if Zack bandwidth)
Future:            Architecture G (after D validated)
                   Architecture A as ablation/fallback paper
```
The architecture document serves as a reference map. The experiment matrix is the operational guide. Everything that is not in the experiment matrix is future work.
---
*Document revision 2 — April 2026*
*Architecture F is now named PhysioJEPA throughout.*
*Execution order cross-references physiojep_experiment_matrix.md*