# V3 Substrate Coverage: Monarch / TorchForge / OpenEnv / VeRL / TRL / DiLoCo

The brief's V3 clause asks the framework to cover six substrates. This doc maps each to **what we have** + **what we don't** + **why that's the right shape** given the substrate's status and the framework's scope.

## TRL: `huggingface/trl`

**Status**: ✅ **Production target for v0.1.** Working code.

**What we have**:

- Research deep-dive: `research/04-verl-trl.md` § 3 (algorithm coverage: GRPO / DAPO / DPO / PRM, extension points, `_compute_loss` vs `compute_advantages`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe A
- Working code: `composer_replication.trainer.ComposerReplicationTrainer` subclasses `GRPOTrainer` and overrides `_compute_loss(model, inputs)` to compose 3 channels (`grpo + α·sdpo + β·trace_replay_dpo`)
- Data collator: `composer_replication.trainer.data_collator.ComposerDataCollator` builds the `inputs` dict the trainer expects
- DeepWiki audit: extension surface verified against TRL HEAD as of 2026-05-25

**What we don't**:

- A full end-to-end training run (gated on real GPU rollouts + reward calculations, so out of scope for the CPU-budget deep-work loop)

**Why this shape**: TRL is the most-supported substrate for GRPO post-training. Subclassing `GRPOTrainer` and overriding `_compute_loss` is the cleanest extension path. Production v0.1 lives here.

---

## VeRL: `volcengine/verl`

**Status**: 🟡 **Production target for v0.2 (multi-node scale).** Skeleton, not yet runnable.

**What we have**:

- Research deep-dive: `research/04-verl-trl.md` § 4 (3D-HybridEngine, resharding pattern, advantage estimator registry)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe B
- Skeleton code: `spikes/005-integrated-trainer-skeleton/verl_path/`
  - `composer_adv.py` (110 LOC): `@register_adv_est("composer_3channel")` decorator
  - `composer_config.yaml` (89 LOC): full PPO trainer config with our advantage estimator wired in
- DeepWiki audit: extension surface verified against VeRL HEAD as of 2026-05-25

**What we don't**:

- A working VeRL run on real hardware (VeRL itself has a steep setup cost; v0.1 prioritizes TRL because it's faster to iterate on)

**Why this shape**: VeRL's 3D-HybridEngine and decentralized scheduler scale better than TRL at >32 GPUs. We build the recipe but don't make it the default. The framework supports either path; users on >8-GPU clusters should use VeRL.

---

## DiLoCo: `meta-pytorch/torchft`

**Status**: 🟡 **Outer-loop wrapper integrated.** Multi-replica convergence GPU-gated.

**What we have**:

- Research deep-dive: `research/02-diloco-family.md` (DiLoCo / OpenDiLoCo / Streaming DiLoCo / PRIME-RL / INTELLECT-1+2: full audit with primary source links and license/maturity assessment)
- ADR: `docs/adrs/ADR-003-diloco-impl.md`, which chose `torchft.local_sgd.DiLoCo` (BSD-3, Meta-maintained, library-not-research-code) over 4 alternatives
- Working code: `composer_replication.diloco.make_diloco_outer_loop` wrapper. Documents the sign convention (pseudo-grad = θ_initial - θ_local).
- Spike 008: 5/5 single-process tests pass. The **sign-convention test** is the single best test in the framework (per cross-model review).
- Reconnaissance: `docs/research/DILOCO_RECONNAISSANCE.md`

**What we don't**:

- True multi-replica convergence test. Single-process post-hook sequencing prevents this (replica A's outer step completes before replica B's allreduce arrives). A real multi-process test is deferred to the GPU phase.
- Trainer integration.
  The wrapper is a context manager; wiring it into the `ComposerReplicationTrainer.train()` lifecycle is a separate spike.

**Why this shape**: DiLoCo's value proposition (decentralized inner training with sparse outer sync) only matters at multi-cluster scale. Our v0.1 target is single-cluster training with TRL. The DiLoCo wrapper is wired up so v0.2 multi-cluster training can switch it on with one config change.

---

## OpenEnv

**Status**: 📋 **Reference pattern (substrate, not a choice).**

**What we have**:

- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § OpenEnv (the env-format standard, how it interacts with TRL's `environment_factory=`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe D, "OpenEnv is a substrate, not a choice"

**What we don't**:

- Direct OpenEnv code dependency. The framework's data path is OpenEnv-compatible by virtue of using TRL's API, which accepts `environment_factory=` kwargs that OpenEnv environments satisfy.

**Why this shape**: OpenEnv is a *protocol* (how an env exposes itself to a trainer), not a library you depend on. You either implement an OpenEnv-compatible environment or you don't. Composer 2.5's "Feature Deletion" environment is OpenEnv-shaped; if a user provides one, our TRL trainer accepts it via `environment_factory=`.

---

## Monarch (Meta)

**Status**: 📋 **Reference pattern (alternative coordination model).**

**What we have**:

- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § Monarch (actor mesh, hardware abstractions, comparison to Ray)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe C, "TorchForge + Monarch (reference patterns only, not a production target)"

**What we don't**:

- Direct Monarch code dependency. We use DiLoCo's pseudo-gradient sync as our coordination model; Monarch's actor mesh is an alternative.

**Why this shape**: Monarch is alive (Meta is shipping it) but it's a *coordination layer*, not an *algorithm*.
Our framework integrates with PyTorch + TRL + torchft directly; Monarch would replace the coordination layer underneath. Documented as a future option; not a v0.1 dependency.

---

## TorchForge (Meta, paused)

**Status**: 📋 **Reference only (upstream paused).**

**What we have**:

- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § TorchForge (design lessons captured)

**What we don't**:

- Code dependency. TorchForge as a project was paused by Meta.

**Why this shape**: The brief asked us to research TorchForge. We did. The headline finding is "Meta paused this." That's a real research output even if it doesn't translate to code.

---

## Summary

| Substrate | Research | Recipe | Code | Tests | v0.1 production? |
|---|---|---|---|---|---|
| TRL | ✅ | ✅ | ✅ | 38 + 9 + 3 = 50 | ✅ |
| VeRL | ✅ | ✅ | 🟡 (skeleton) | none | v0.2 |
| **PRIME-RL** (Wave 13) | ✅ | ✅ | 🟡 (loss adapter + config) | none | v0.2 (cleanest hook) |
| DiLoCo (single-process) | ✅ | ✅ | ✅ | 5 (single-replica) | optional |
| **DiLoCo over serverless** (Wave 13) | ✅ | ✅ ADR-005 | ✅ Local + 🟡 Modal/HFJobs | 9 multi-process | ✅ (local) / future (cloud) |
| OpenEnv | ✅ | ✅ | n/a (protocol) | none | substrate |
| **Monarch** (Wave 13) | ✅ | ✅ (actor layout) | 🟡 (skeleton) | none | v0.2+ |
| TorchForge | ✅ | n/a (paused) | n/a | none | n/a |

**8/8 substrates covered** (was 6/6 pre-Wave-13). New since Wave 13: PRIME-RL (the cleanest custom-loss hook), Monarch (Meta's actively-shipped agentic-stack component), and serverless DiLoCo (Modal/HF Jobs adapters + object-store rendezvous). The framework can now realize Decoupled DiLoCo across cloud executors **without any cross-job NCCL**; see ADR-005 for the design rationale.
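
The sign convention recorded for the DiLoCo wrapper (pseudo-grad = θ_initial - θ_local) is easy to get backwards, so a small illustration may help. The sketch below is not the `make_diloco_outer_loop` wrapper or the `torchft` API. It is a plain-Python toy under stated assumptions: parameters are flat lists of floats, the cross-replica allreduce is replaced by an in-process mean, and the outer optimizer is plain SGD (real setups typically use a momentum-based outer optimizer). All function names here are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def pseudo_gradient(theta_initial: Sequence[float], theta_local: Sequence[float]) -> Vector:
    """Pseudo-gradient under the documented convention: theta_initial - theta_local."""
    return [t0 - tl for t0, tl in zip(theta_initial, theta_local)]


def average(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean across replicas (a stand-in for the allreduce)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def outer_sgd_step(theta_initial: Sequence[float], grad: Sequence[float], outer_lr: float = 1.0) -> Vector:
    """Plain SGD outer step (an assumption): theta_new = theta_initial - outer_lr * grad."""
    return [t0 - outer_lr * g for t0, g in zip(theta_initial, grad)]


def diloco_outer_round(
    theta_initial: Sequence[float],
    replica_thetas: Sequence[Sequence[float]],
    outer_lr: float = 1.0,
) -> Vector:
    """One outer round: per-replica pseudo-gradients, averaged, then one outer step."""
    grads = [pseudo_gradient(theta_initial, theta) for theta in replica_thetas]
    return outer_sgd_step(theta_initial, average(grads), outer_lr)


if __name__ == "__main__":
    start = [1.0, 2.0]
    replicas = [[0.0, 2.0], [1.0, 4.0]]
    # With outer_lr = 1.0 and plain SGD, the result is the mean of the replica weights.
    print(diloco_outer_round(start, replicas))
```

With `outer_lr=1.0` and plain SGD, the outer step lands exactly on the mean of the replicas' local weights. If the sign were flipped (θ_local - θ_initial), the same step would move the global weights away from where the replicas trained, which is the failure a sign-convention test is meant to catch.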