# V3 Substrate Coverage — Monarch / TorchForge / OpenEnv / VeRL / TRL / DiLoCo
The brief's V3 clause asks the framework to cover six substrates. This doc
maps each to **what we have** + **what we don't** + **why that's the right
shape** given the substrate's status and the framework's scope.
## TRL — `huggingface/trl`
**Status**: ✅ **Production target for v0.1.** Working code.
**What we have**:
- Research deep-dive: `research/04-verl-trl.md` § 3 (algorithm coverage:
GRPO / DAPO / DPO / PRM, extension points, `_compute_loss` vs `compute_advantages`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe A
- Working code: `composer_replication.trainer.ComposerReplicationTrainer`
subclasses `GRPOTrainer`, overrides `_compute_loss(model, inputs)` to
compose 3 channels (`grpo + α·sdpo + β·trace_replay_dpo`)
- Data collator: `composer_replication.trainer.data_collator.ComposerDataCollator`
builds the `inputs` dict the trainer expects
- DeepWiki audit: extension surface verified against TRL HEAD as of 2026-05-25
**What we don't**:
- A full end-to-end training run. This is gated on real GPU rollouts and
reward calculations, which are out of scope for the CPU-budget deep-work loop.
**Why this shape**: TRL is the best-supported substrate for GRPO post-training.
Subclassing `GRPOTrainer` and overriding `_compute_loss` is the cleanest
extension path. Production v0.1 lives here.
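
The three-channel composition described above (`grpo + α·sdpo + β·trace_replay_dpo`) can be sketched without TRL installed. This is a minimal illustration, not the framework's actual trainer: `BaseTrainer` is a stand-in for `GRPOTrainer`, the per-channel losses are assumed to arrive precomputed in `inputs`, and the key names and weight values are hypothetical.

```python
from dataclasses import dataclass


class BaseTrainer:
    """Stand-in for a GRPO-style trainer; NOT the real trl.GRPOTrainer."""

    def _compute_loss(self, model, inputs):
        # The base trainer only knows about the GRPO policy-gradient loss.
        return inputs["grpo_loss"]


@dataclass
class ChannelWeights:
    alpha: float = 0.1   # weight on the SDPO channel (hypothetical default)
    beta: float = 0.05   # weight on the trace-replay DPO channel (hypothetical default)


class ThreeChannelTrainer(BaseTrainer):
    """Overrides the loss hook to add two auxiliary channels on top of GRPO."""

    def __init__(self, weights: ChannelWeights):
        self.weights = weights

    def _compute_loss(self, model, inputs):
        grpo = super()._compute_loss(model, inputs)
        sdpo = inputs["sdpo_loss"]                 # hypothetical key
        replay = inputs["trace_replay_dpo_loss"]   # hypothetical key
        return grpo + self.weights.alpha * sdpo + self.weights.beta * replay


trainer = ThreeChannelTrainer(ChannelWeights(alpha=0.5, beta=0.25))
loss = trainer._compute_loss(
    model=None,
    inputs={"grpo_loss": 1.0, "sdpo_loss": 2.0, "trace_replay_dpo_loss": 4.0},
)
print(loss)  # 1.0 + 0.5 * 2.0 + 0.25 * 4.0
```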
---
## VeRL — `volcengine/verl`
**Status**: 🟡 **Production target for v0.2 (multi-node scale).** Skeleton, not yet runnable.
**What we have**:
- Research deep-dive: `research/04-verl-trl.md` § 4 (3D-HybridEngine,
resharding pattern, advantage estimator registry)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe B
- Skeleton code: `spikes/005-integrated-trainer-skeleton/verl_path/`
  - `composer_adv.py` (110 LOC): `@register_adv_est("composer_3channel")` decorator
  - `composer_config.yaml` (89 LOC): full PPO trainer config with our advantage estimator wired in
- DeepWiki audit: extension surface verified against VeRL HEAD as of 2026-05-25
**What we don't**:
- A working VeRL run on real hardware. VeRL itself has a steep setup cost,
and v0.1 prioritizes TRL because it is faster to iterate on.
**Why this shape**: VeRL's 3D-HybridEngine and decentralized scheduler scale
better than TRL beyond 32 GPUs. We build the recipe but don't make it
the default. The framework supports either path; users on clusters larger
than 8 GPUs should use VeRL.
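
The registry pattern behind `@register_adv_est("composer_3channel")` can be sketched as follows. This is an illustration under stated assumptions: the `register_adv_est` defined here is a local stand-in that only mirrors the decorator's name, not VeRL's implementation, and the estimator body is a placeholder group-normalized advantage, not the contents of `composer_adv.py`. The point is that the trainer config can select an estimator by its registered name.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List

# Local stand-in registry; VeRL's real registry is not imported here.
ADV_ESTIMATORS: Dict[str, Callable[..., List[float]]] = {}


def register_adv_est(name: str):
    """Decorator that records an advantage estimator under a string key."""

    def decorator(fn):
        if name in ADV_ESTIMATORS:
            raise ValueError(f"estimator {name!r} is already registered")
        ADV_ESTIMATORS[name] = fn
        return fn

    return decorator


@register_adv_est("composer_3channel")
def composer_3channel(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # Placeholder logic: normalize rewards within one rollout group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A config would name the estimator; the trainer looks it up by that key.
estimator = ADV_ESTIMATORS["composer_3channel"]
advantages = estimator([1.0, 0.0, 1.0, 0.0])
print(advantages)
```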
---
## DiLoCo — `meta-pytorch/torchft`
**Status**: 🟡 **Outer-loop wrapper integrated.** Multi-replica convergence GPU-gated.
**What we have**:
- Research deep-dive: `research/02-diloco-family.md` (DiLoCo / OpenDiLoCo /
Streaming DiLoCo / PRIME-RL / INTELLECT-1+2 — full audit with primary
source links and license/maturity assessment)
- ADR: `docs/adrs/ADR-003-diloco-impl.md` — chose `torchft.local_sgd.DiLoCo`
(BSD-3, Meta-maintained, library-not-research-code) over 4 alternatives
- Working code: `composer_replication.diloco.make_diloco_outer_loop`
wrapper. Documents the sign convention (pseudo-grad = θ_initial - θ_local).
- Spike 008: 5/5 single-process tests pass. The **sign-convention test** was
rated the single best test in the framework by the cross-model review.
- Reconnaissance: `docs/research/DILOCO_RECONNAISSANCE.md`
**What we don't**:
- A true multi-replica convergence test. Single-process post-hook
sequencing prevents this (replica A's outer step completes before
replica B's allreduce arrives). A real multi-process test is deferred to
the GPU phase.
- Trainer integration. The wrapper is a context manager; wiring it into
the `ComposerReplicationTrainer.train()` lifecycle is a separate spike.
**Why this shape**: DiLoCo's value proposition (decentralized inner training
with sparse outer sync) only matters at multi-cluster scale. Our v0.1
target is single-cluster training with TRL. The DiLoCo wrapper is wired
up so v0.2 multi-cluster training can switch it on with one config change.
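
The sign convention noted above (pseudo-grad = θ_initial - θ_local) can be checked with a small worked example. This is a minimal sketch in plain Python, assuming a plain-SGD outer optimizer for readability; it does not call torchft, and the function names are hypothetical. With an outer learning rate of 1.0, the outer step lands exactly on the mean of the replica parameters, which is a quick way to confirm the sign is right.

```python
from typing import List


def pseudo_gradient(theta_initial: List[float], theta_local: List[float]) -> List[float]:
    # Sign convention: global (pre-inner-loop) params minus local params.
    return [g - l for g, l in zip(theta_initial, theta_local)]


def outer_step(
    theta_initial: List[float],
    replica_params: List[List[float]],
    outer_lr: float = 1.0,
) -> List[float]:
    """Average the replicas' pseudo-gradients, then take one SGD step."""
    grads = [pseudo_gradient(theta_initial, p) for p in replica_params]
    avg = [sum(col) / len(grads) for col in zip(*grads)]
    # Descent on the pseudo-gradient moves toward the replicas.
    return [t - outer_lr * g for t, g in zip(theta_initial, avg)]


theta0 = [1.0, 2.0]
replica_a = [0.5, 2.5]   # parameters after replica A's inner steps
replica_b = [0.0, 3.0]   # parameters after replica B's inner steps

new_theta = outer_step(theta0, [replica_a, replica_b], outer_lr=1.0)
print(new_theta)  # equals the element-wise mean of the two replicas
```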
---
## OpenEnv
**Status**: 📋 **Reference pattern (substrate, not a choice).**
**What we have**:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § OpenEnv
(the env-format standard, how it interacts with TRL's `environment_factory=`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe D —
"OpenEnv is a substrate, not a choice"
**What we don't**:
- Direct OpenEnv code dependency. The framework's data path is
OpenEnv-compatible by virtue of using TRL's API, which accepts
`environment_factory=` kwargs that OpenEnv environments satisfy.
**Why this shape**: OpenEnv is a *protocol* (how an env exposes itself
to a trainer), not a library you depend on. You either implement an
OpenEnv-compatible environment or you don't. Composer 2.5's "Feature
Deletion" environment is OpenEnv-shaped; if a user provides one, our
TRL trainer accepts it via `environment_factory=`.
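
The "protocol, not a library" point can be illustrated with structural typing. This is a hedged sketch: the `reset`/`step` surface is an assumed Gymnasium-style shape, not OpenEnv's actual class hierarchy, and `Env`, `ToyFeatureDeletionEnv`, and `rollout` are hypothetical names. The trainer side depends only on a factory callable, so any environment with the right shape is accepted.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Tuple


class Env(Protocol):
    """Assumed minimal environment surface (hypothetical)."""

    def reset(self) -> str: ...

    def step(self, action: str) -> Tuple[str, float, bool]: ...


@dataclass
class ToyFeatureDeletionEnv:
    """Toy single-step environment; satisfies Env without inheriting from it."""

    target: str = "legacy_flag"

    def reset(self) -> str:
        return f"Remove the feature '{self.target}' from the codebase."

    def step(self, action: str) -> Tuple[str, float, bool]:
        # Reward 1.0 if the action mentions the target feature, else 0.0.
        reward = 1.0 if self.target in action else 0.0
        return "episode finished", reward, True


def rollout(environment_factory: Callable[[], Env], policy: Callable[[str], str]) -> float:
    """Run one episode; the caller never imports a concrete env class."""
    env = environment_factory()
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


score = rollout(ToyFeatureDeletionEnv, lambda obs: "delete legacy_flag")
print(score)
```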
---
## Monarch (Meta)
**Status**: 📋 **Reference pattern (alternative coordination model).**
**What we have**:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § Monarch
(actor mesh, hardware abstractions, comparison to Ray)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe C —
"TorchForge + Monarch (reference patterns only, not a production target)"
**What we don't**:
- Direct Monarch code dependency. We use DiLoCo's pseudo-gradient sync
as our coordination model; Monarch's actor mesh is an alternative.
**Why this shape**: Monarch is alive (Meta is shipping it) but it's a
*coordination layer*, not an *algorithm*. Our framework integrates with
PyTorch + TRL + torchft directly; Monarch would replace the coordination
layer underneath. Documented as a future option; not a v0.1 dependency.
---
## TorchForge (Meta, paused)
**Status**: 📋 **Reference only (upstream paused).**
**What we have**:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § TorchForge
— design lessons captured
**What we don't**:
- Code dependency. TorchForge as a project was paused by Meta.
**Why this shape**: The brief asked us to research TorchForge. We did.
The headline finding is "Meta paused this." That's a real research output
even if it doesn't translate to code.
---
## Summary
| Substrate | Research | Recipe | Code | Tests | v0.1 production? |
|---|---|---|---|---|---|
| TRL | ✅ | ✅ | ✅ | 38 + 9 + 3 = 50 | ✅ |
| VeRL | ✅ | ✅ | 🟡 (skeleton) | — | v0.2 |
| **PRIME-RL** (Wave 13) | ✅ | ✅ | 🟡 (loss adapter + config) | — | v0.2 (cleanest hook) |
| DiLoCo (single-process) | ✅ | ✅ | ✅ | 5 (single-replica) | optional |
| **DiLoCo over serverless** (Wave 13) | ✅ | ✅ ADR-005 | ✅ Local + 🟡 Modal/HFJobs | 9 multi-process | ✅ (local) / future (cloud) |
| OpenEnv | ✅ | ✅ | n/a (protocol) | — | substrate |
| **Monarch** (Wave 13) | ✅ | ✅ (actor layout) | 🟡 (skeleton) | — | v0.2+ |
| TorchForge | ✅ | n/a (paused) | n/a | — | n/a |
**8/8 substrates covered** (was 6/6 pre-Wave-13). New in Wave 13:
PRIME-RL (the cleanest custom-loss hook), Monarch (Meta's actively shipped
agentic-stack component), and serverless DiLoCo (Modal/HF Jobs adapters
plus object-store rendezvous). The framework can now realize Decoupled
DiLoCo across cloud executors **without any cross-job NCCL**; see
ADR-005 for the design rationale.
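
One way to picture object-store rendezvous without cross-job NCCL is sketched below. This is an illustration only, not the ADR-005 design: a temporary directory stands in for the object store, and the key layout and the `publish`/`collect` helpers are hypothetical. Each replica writes its pseudo-gradient under a per-round key, and any job can compute the average once all replicas have reported.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def publish(store: Path, round_id: int, replica_id: int, pseudo_grad: List[float]) -> None:
    """Write one replica's pseudo-gradient under a per-round key."""
    key = store / f"round-{round_id}" / f"replica-{replica_id}.json"
    key.parent.mkdir(parents=True, exist_ok=True)
    tmp = key.with_suffix(".tmp")
    tmp.write_text(json.dumps(pseudo_grad))
    tmp.replace(key)  # rename so readers never see a half-written object


def collect(store: Path, round_id: int, world_size: int) -> Optional[List[float]]:
    """Return the averaged pseudo-gradient, or None until every replica has reported."""
    paths = sorted((store / f"round-{round_id}").glob("replica-*.json"))
    if len(paths) < world_size:
        return None
    grads = [json.loads(p.read_text()) for p in paths]
    return [sum(col) / len(grads) for col in zip(*grads)]


with tempfile.TemporaryDirectory() as tmp_dir:
    store = Path(tmp_dir)
    publish(store, round_id=0, replica_id=0, pseudo_grad=[0.5, -0.5])
    partial = collect(store, round_id=0, world_size=2)   # replica 1 has not reported yet
    publish(store, round_id=0, replica_id=1, pseudo_grad=[1.0, -1.0])
    averaged = collect(store, round_id=0, world_size=2)

print(partial, averaged)
```

In a real deployment a job would poll `collect` with a timeout rather than call it once; that loop is omitted here to keep the sketch short.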