# V3 Substrate Coverage — Monarch / TorchForge / OpenEnv / VeRL / TRL / DiLoCo

The brief's V3 clause asks the framework to cover six substrates. This doc
maps each to **what we have** + **what we don't** + **why that's the right
shape** given the substrate's status and the framework's scope.

## TRL — `huggingface/trl`

**Status**: ✅ **Production target for v0.1.** Working code.

**What we have**:
- Research deep-dive: `research/04-verl-trl.md` § 3 (algorithm coverage:
  GRPO / DAPO / DPO / PRM, extension points, `_compute_loss` vs `compute_advantages`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe A
- Working code: `composer_replication.trainer.ComposerReplicationTrainer`
  subclasses `GRPOTrainer`, overrides `_compute_loss(model, inputs)` to
  compose 3 channels (`grpo + α·sdpo + β·trace_replay_dpo`)
- Data collator: `composer_replication.trainer.data_collator.ComposerDataCollator`
  builds the `inputs` dict the trainer expects
- DeepWiki audit: extension surface verified against TRL HEAD as of 2026-05-25

**What we don't**:
- A full end-to-end training run (gated on real GPU rollouts and
  reward calculations, which are out of scope for the CPU-budget deep-work loop)

**Why this shape**: TRL is the most-supported substrate for GRPO post-training.
Its extension point (subclass `GRPOTrainer`, override `_compute_loss`) is the
cleanest path. Production v0.1 lives here.
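
As a minimal sketch of the subclass-and-override pattern described above, the snippet below composes the three channels as `grpo + α·sdpo + β·trace_replay_dpo`. It is not the code in `composer_replication.trainer`: `BaseTrainerStub` stands in for `GRPOTrainer` so the example runs without TRL installed, and `ChannelWeights`, `compose_loss`, the `inputs` keys, and the default weight values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelWeights:
    """Hypothetical container for the mixing coefficients alpha and beta."""
    alpha: float = 0.1  # weight on the sdpo channel (illustrative value)
    beta: float = 0.1   # weight on the trace_replay_dpo channel (illustrative value)


def compose_loss(grpo, sdpo, trace_replay_dpo, weights):
    """Return grpo + alpha * sdpo + beta * trace_replay_dpo.

    Works on floats or on any objects that support + and *, such as tensors.
    """
    return grpo + weights.alpha * sdpo + weights.beta * trace_replay_dpo


class BaseTrainerStub:
    """Stand-in for GRPOTrainer so the sketch runs without TRL installed."""

    def _compute_loss(self, model, inputs):
        # Assumption: the base loss is precomputed and passed in via `inputs`.
        return inputs["grpo_loss"]


class ThreeChannelTrainerSketch(BaseTrainerStub):
    """Overrides _compute_loss to add two auxiliary channels to the base loss."""

    def __init__(self, weights):
        self.weights = weights

    def _compute_loss(self, model, inputs):
        grpo = super()._compute_loss(model, inputs)
        return compose_loss(
            grpo,
            inputs["sdpo_loss"],              # hypothetical key
            inputs["trace_replay_dpo_loss"],  # hypothetical key
            self.weights,
        )
```

Keeping the composition in a free function means the mixing arithmetic can be unit-tested on plain floats, independent of any trainer or GPU.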

---

## VeRL — `volcengine/verl`

**Status**: 🟡 **Production target for v0.2 (multi-node scale).** Skeleton, not yet runnable.

**What we have**:
- Research deep-dive: `research/04-verl-trl.md` § 4 (3D-HybridEngine,
  resharding pattern, advantage estimator registry)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe B
- Skeleton code: `spikes/005-integrated-trainer-skeleton/verl_path/`
  - `composer_adv.py` (110 LOC) — `@register_adv_est("composer_3channel")` decorator
  - `composer_config.yaml` (89 LOC) — full PPO trainer config with our advantage estimator wired in
- DeepWiki audit: extension surface verified against VeRL HEAD as of 2026-05-25
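
The registry-decorator pattern behind `@register_adv_est("composer_3channel")` can be sketched without VeRL installed. The registry below is a local stand-in, not VeRL's own; `ADV_ESTIMATORS` and `get_adv_est` are hypothetical names, and the estimator body (mean-centered rewards) is a placeholder rather than the logic in `composer_adv.py`.

```python
from typing import Callable, Dict, List

# Local stand-in registry; VeRL maintains its own internally.
ADV_ESTIMATORS: Dict[str, Callable[[List[float]], List[float]]] = {}


def register_adv_est(name: str):
    """Return a decorator that stores the decorated function under `name`."""

    def decorator(fn):
        if name in ADV_ESTIMATORS:
            raise ValueError(f"advantage estimator {name!r} already registered")
        ADV_ESTIMATORS[name] = fn
        return fn

    return decorator


def get_adv_est(name: str):
    """Look up an estimator by the name a config file would reference."""
    return ADV_ESTIMATORS[name]


@register_adv_est("composer_3channel")
def composer_3channel(rewards: List[float]) -> List[float]:
    """Placeholder body: subtract the group mean from each reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

The point of the pattern is that a config file only needs the string key; the trainer resolves it to a function at startup, so no trainer code changes when a new estimator is added.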

**What we don't**:
- A working VeRL run on real hardware (VeRL has a steep setup cost;
  v0.1 prioritizes TRL because it's faster to iterate on)

**Why this shape**: VeRL's 3D-HybridEngine and decentralized scheduler
scale better than TRL beyond 32 GPUs. We build the recipe but don't make it
the default. The framework supports either path; users on clusters of more
than 8 GPUs should use VeRL.

---

## DiLoCo — `meta-pytorch/torchft`

**Status**: 🟡 **Outer-loop wrapper integrated.** Multi-replica convergence GPU-gated.

**What we have**:
- Research deep-dive: `research/02-diloco-family.md` (DiLoCo / OpenDiLoCo /
  Streaming DiLoCo / PRIME-RL / INTELLECT-1+2 — full audit with primary
  source links and license/maturity assessment)
- ADR: `docs/adrs/ADR-003-diloco-impl.md` — chose `torchft.local_sgd.DiLoCo`
  (BSD-3, Meta-maintained, library-not-research-code) over 4 alternatives
- Working code: `composer_replication.diloco.make_diloco_outer_loop`
  wrapper. Documents the sign convention (pseudo-grad = θ_initial - θ_local).
- Spike 008: 5/5 single-process tests. **Sign-convention test** is the
  single best test in the framework (per cross-model review).
- Reconnaissance: `docs/research/DILOCO_RECONNAISSANCE.md`
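
The sign convention (pseudo-grad = θ_initial - θ_local) is easy to get backwards, so a small worked sketch may help. It uses plain Python lists and a plain SGD outer step for clarity; that is a simplification, and none of the function names below come from `torchft` or from `make_diloco_outer_loop`.

```python
def pseudo_gradient(theta_initial, theta_local):
    """Pseudo-gradient under the documented convention: initial minus local."""
    return [g - l for g, l in zip(theta_initial, theta_local)]


def average(vectors):
    """Element-wise mean across replicas (what an allreduce-mean would produce)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def outer_sgd_step(theta_initial, pseudo_grad, outer_lr):
    """Plain SGD outer step: theta <- theta - lr * pseudo_grad."""
    return [t - outer_lr * g for t, g in zip(theta_initial, pseudo_grad)]


# Two replicas start from the same global parameters and drift during inner training.
theta_initial = [1.0, 2.0]
replica_params = [[0.5, 2.5], [0.0, 3.0]]

grads = [pseudo_gradient(theta_initial, p) for p in replica_params]
theta_next = outer_sgd_step(theta_initial, average(grads), outer_lr=1.0)
```

With `outer_lr=1.0` the new global parameters equal the mean of the replica parameters, which is a quick sanity check: if the sign were flipped, the outer step would move away from where the replicas ended up.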

**What we don't**:
- A true multi-replica convergence test. Single-process post-hook
  sequencing prevents this (replica A's outer step completes before
  replica B's allreduce arrives). A real multi-process test is deferred to
  the GPU phase.
- Trainer integration. The wrapper is a context manager; wiring it into
  the `ComposerReplicationTrainer.train()` lifecycle is a separate spike.

**Why this shape**: DiLoCo's value proposition (decentralized inner training
with sparse outer sync) only matters at multi-cluster scale. Our v0.1
target is single-cluster training with TRL. The DiLoCo wrapper is wired
up so v0.2 multi-cluster training can switch it on with one config change.

---

## OpenEnv

**Status**: 📋 **Reference pattern (substrate, not a choice).**

**What we have**:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § OpenEnv
  (the env-format standard, how it interacts with TRL's `environment_factory=`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe D —
  "OpenEnv is a substrate, not a choice"

**What we don't**:
- Direct OpenEnv code dependency. The framework's data path is
  OpenEnv-compatible by virtue of using TRL's API, which accepts
  `environment_factory=` kwargs that OpenEnv environments satisfy.

**Why this shape**: OpenEnv is a *protocol* (how an env exposes itself
to a trainer), not a library you depend on. You either implement an
OpenEnv-compatible environment or you don't. Composer 2.5's "Feature
Deletion" environment is OpenEnv-shaped; if a user provides one, our
TRL trainer accepts it via `environment_factory=`.
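
The "protocol, not library" point can be illustrated with structural typing: an environment qualifies by having the right methods, not by inheriting from an imported base class. The `reset`/`step` interface below is an assumption made for illustration; this document does not specify which methods OpenEnv requires, and `EnvLike`, `FeatureDeletionEnvStub`, and `run_episode` are hypothetical names.

```python
from typing import Any, Callable, Protocol, Tuple, runtime_checkable


@runtime_checkable
class EnvLike(Protocol):
    """Assumed minimal env interface; the real OpenEnv contract may differ."""

    def reset(self) -> Any: ...

    def step(self, action: Any) -> Tuple[Any, float, bool]: ...


class FeatureDeletionEnvStub:
    """Toy one-step env. Note that it imports and inherits nothing."""

    def reset(self):
        return "task prompt"

    def step(self, action):
        reward = 1.0 if action == "delete feature" else 0.0
        return "final observation", reward, True  # (observation, reward, done)


def run_episode(environment_factory: Callable[[], Any], policy) -> float:
    """Build an env from the factory and roll out one episode."""
    env = environment_factory()
    if not isinstance(env, EnvLike):  # checks method presence only
        raise TypeError("factory must return an object with reset() and step()")
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total
```

Passing the class itself as the factory (`run_episode(FeatureDeletionEnvStub, policy)`) mirrors the `environment_factory=` style: the caller hands over a zero-argument callable and the consumer decides when to construct the environment.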

---

## Monarch (Meta)

**Status**: 📋 **Reference pattern (alternative coordination model).**

**What we have**:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § Monarch
  (actor mesh, hardware abstractions, comparison to Ray)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe C —
  "TorchForge + Monarch (reference patterns only, not a production target)"

**What we don't**:
- Direct Monarch code dependency. We use DiLoCo's pseudo-gradient sync
  as our coordination model; Monarch's actor mesh is an alternative.

**Why this shape**: Monarch is alive (Meta is shipping it) but it's a
*coordination layer*, not an *algorithm*. Our framework integrates with
PyTorch + TRL + torchft directly; Monarch would replace the coordination
layer underneath. Documented as a future option; not a v0.1 dependency.

---

## TorchForge (Meta, paused)

**Status**: 📋 **Reference only (upstream paused).**

**What we have**:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § TorchForge
  — design lessons captured

**What we don't**:
- A code dependency. TorchForge was paused by Meta.

**Why this shape**: The brief asked us to research TorchForge. We did.
The headline finding is "Meta paused this." That's a real research output
even if it doesn't translate to code.

---

## Summary

| Substrate | Research | Recipe | Code | Tests | v0.1 production? |
|---|---|---|---|---|---|
| TRL | ✅ | ✅ | ✅ | 38 + 9 + 3 = 50 | ✅ |
| VeRL | ✅ | ✅ | 🟡 (skeleton) | — | v0.2 |
| **PRIME-RL** (Wave 13) | ✅ | ✅ | 🟡 (loss adapter + config) | — | v0.2 (cleanest hook) |
| DiLoCo (single-process) | ✅ | ✅ | ✅ | 5 (single-replica) | optional |
| **DiLoCo over serverless** (Wave 13) | ✅ | ✅ ADR-005 | ✅ Local + 🟡 Modal/HFJobs | 9 multi-process | ✅ (local) / future (cloud) |
| OpenEnv | ✅ | ✅ | n/a (protocol) | — | substrate |
| **Monarch** (Wave 13) | ✅ | ✅ (actor layout) | 🟡 (skeleton) | — | v0.2+ |
| TorchForge | ✅ | n/a (paused) | n/a | — | n/a |

**8/8 substrates covered** (was 6/6 pre-Wave-13). Two rows are new in
Wave 13: PRIME-RL (the cleanest custom-loss hook) and serverless DiLoCo
(Modal/HF Jobs adapters + object-store rendezvous). Monarch (Meta's actively
shipped agentic-stack component) also gained an actor-layout recipe and
skeleton code. The framework can now realize Decoupled DiLoCo across cloud
executors **without any cross-job NCCL**; see ADR-005 for the design rationale.
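
To make "object-store rendezvous" concrete, here is a rough sketch of the general idea: each replica publishes its pseudo-gradient as an object keyed by round and replica, and a collector averages them once all have arrived, with no collective-communication library involved. A local directory stands in for the object store, and the key layout and the `publish` / `try_collect` names are hypothetical; this does not reproduce the design in ADR-005.

```python
import json
from pathlib import Path


def publish(store: Path, round_id: int, replica_id: int, pseudo_grad) -> None:
    """Write one replica's pseudo-gradient under a round/replica key."""
    key = store / f"round-{round_id}" / f"replica-{replica_id}.json"
    key.parent.mkdir(parents=True, exist_ok=True)
    tmp = key.with_suffix(".tmp")
    tmp.write_text(json.dumps(pseudo_grad))
    tmp.replace(key)  # rename last so readers never see a partial object


def try_collect(store: Path, round_id: int, world_size: int):
    """Return the averaged pseudo-gradient, or None if replicas are missing."""
    round_dir = store / f"round-{round_id}"
    if not round_dir.is_dir():
        return None
    objects = sorted(round_dir.glob("replica-*.json"))
    if len(objects) < world_size:
        return None  # rendezvous incomplete; caller would poll again later
    grads = [json.loads(p.read_text()) for p in objects]
    return [sum(col) / world_size for col in zip(*grads)]
```

Because replicas only read and write objects, they never need to be alive at the same time or share a network fabric, which is the property that lets separately scheduled jobs cooperate.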