V3 Substrate Coverage — Monarch / TorchForge / OpenEnv / VeRL / TRL / DiLoCo

The brief's V3 clause asks the framework to cover six substrates. This doc maps each to what we have + what we don't + why that's the right shape given the substrate's status and the framework's scope.

TRL — `huggingface/trl`

Status: ✅ Production target for v0.1. Working code.

What we have:

Research deep-dive: research/04-verl-trl.md § 3 (algorithm coverage: GRPO / DAPO / DPO / PRM, extension points, _compute_loss vs compute_advantages)
Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe A
Working code: composer_replication.trainer.ComposerReplicationTrainer subclasses GRPOTrainer, overrides _compute_loss(model, inputs) to compose 3 channels (grpo + α·sdpo + β·trace_replay_dpo)
Data collator: composer_replication.trainer.data_collator.ComposerDataCollator builds the inputs dict the trainer expects
DeepWiki audit: extension surface verified against TRL HEAD as of 2026-05-25
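
The three-channel composition described above (grpo + α·sdpo + β·trace_replay_dpo) can be sketched without any TRL dependency. This is only an illustration of the arithmetic, not the body of `ComposerReplicationTrainer._compute_loss`: `ChannelWeights` and `compose_three_channel_loss` are hypothetical names, and plain floats stand in for the per-batch loss tensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelWeights:
    """Hypothetical container for the two mixing coefficients."""

    alpha: float  # weight on the SDPO channel
    beta: float   # weight on the trace-replay DPO channel


def compose_three_channel_loss(
    grpo: float,
    sdpo: float,
    trace_replay_dpo: float,
    weights: ChannelWeights,
) -> float:
    """Return grpo + alpha * sdpo + beta * trace_replay_dpo."""
    return grpo + weights.alpha * sdpo + weights.beta * trace_replay_dpo


# Illustrative values only; real channel losses come from the trainer's batch.
weights = ChannelWeights(alpha=0.5, beta=0.25)
total = compose_three_channel_loss(
    grpo=1.0, sdpo=2.0, trace_replay_dpo=4.0, weights=weights
)
print(total)  # 1.0 + 0.5 * 2.0 + 0.25 * 4.0 = 3.0
```

Setting both weights to zero reduces the total to the GRPO term alone, which is a convenient sanity check when wiring the extra channels in.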

What we don't:

A full end-to-end training run. This is gated on real GPU rollouts and reward calculations, which are out of scope for the CPU-budget deep-work loop.

Why this shape: TRL is the most-supported substrate for GRPO post-training. Subclassing GRPOTrainer and overriding _compute_loss is the cleanest extension path. Production v0.1 lives here.

VeRL — `volcengine/verl`

Status: 🟡 Production target for v0.2 (multi-node scale). Skeleton, not yet runnable.

What we have:

Research deep-dive: research/04-verl-trl.md § 4 (3D-HybridEngine, resharding pattern, advantage estimator registry)
Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe B
Skeleton code: spikes/005-integrated-trainer-skeleton/verl_path/
- composer_adv.py (110 LOC) — @register_adv_est("composer_3channel") decorator
- composer_config.yaml (89 LOC) — full PPO trainer config with our advantage estimator wired in
DeepWiki audit: extension surface verified against VeRL HEAD as of 2026-05-25
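
The registry pattern behind `@register_adv_est("composer_3channel")` can be illustrated with a self-contained stand-in. The decorator below is a local re-implementation for illustration, not VeRL's code, and the estimator body (mean-centred rewards within one group) is a placeholder rather than the contents of `composer_adv.py`. `get_adv_est` is likewise a hypothetical lookup helper.

```python
from typing import Callable, Dict, List

AdvantageFn = Callable[[List[float]], List[float]]

# Stand-in registry: maps an estimator name to its function.
_ADV_ESTIMATORS: Dict[str, AdvantageFn] = {}


def register_adv_est(name: str) -> Callable[[AdvantageFn], AdvantageFn]:
    """Stand-in decorator (not VeRL's implementation) that registers by name."""

    def decorator(fn: AdvantageFn) -> AdvantageFn:
        if name in _ADV_ESTIMATORS:
            raise ValueError(f"advantage estimator {name!r} already registered")
        _ADV_ESTIMATORS[name] = fn
        return fn

    return decorator


def get_adv_est(name: str) -> AdvantageFn:
    """Hypothetical lookup used where a config names the estimator."""
    return _ADV_ESTIMATORS[name]


@register_adv_est("composer_3channel")
def composer_advantage(rewards: List[float]) -> List[float]:
    """Placeholder body: subtract the group mean from each reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


advantages = get_adv_est("composer_3channel")([1.0, 2.0, 3.0])
print(advantages)  # [-1.0, 0.0, 1.0]
```

The point of the pattern is that the trainer config only needs the string name; the estimator itself lives in user code and is picked up at import time.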

What we don't:

A working VeRL run on real hardware (VeRL itself has steep setup; v0.1 prioritizes TRL because it's faster to iterate on)

Why this shape: VeRL's 3D-HybridEngine and decentralized scheduler scale better than TRL's beyond 32 GPUs. We build the recipe but don't make it the default. The framework supports either path; users on clusters larger than 8 GPUs should use VeRL.

DiLoCo — `meta-pytorch/torchft`

Status: 🟡 Outer-loop wrapper integrated. Multi-replica convergence GPU-gated.

What we have:

Research deep-dive: research/02-diloco-family.md (DiLoCo / OpenDiLoCo / Streaming DiLoCo / PRIME-RL / INTELLECT-1+2 — full audit with primary source links and license/maturity assessment)
ADR: docs/adrs/ADR-003-diloco-impl.md — chose torchft.local_sgd.DiLoCo (BSD-3, Meta-maintained, library-not-research-code) over 4 alternatives
Working code: composer_replication.diloco.make_diloco_outer_loop wrapper. Documents the sign convention (pseudo-grad = θ_initial - θ_local).
Spike 008: 5/5 single-process tests. Sign-convention test is the single best test in the framework (per cross-model review).
Reconnaissance: docs/research/DILOCO_RECONNAISSANCE.md
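
The sign convention noted above (pseudo-grad = θ_initial - θ_local) is easy to get backwards, so a small worked sketch may help. It uses plain Python lists and a plain SGD outer step as a simplification; DiLoCo as published uses an outer optimizer with Nesterov momentum, and none of the function names here come from torchft or from `make_diloco_outer_loop`.

```python
from typing import List

Params = List[float]


def pseudo_gradient(theta_initial: Params, theta_local: Params) -> Params:
    """Sign convention from the text: theta_initial - theta_local."""
    return [t0 - tl for t0, tl in zip(theta_initial, theta_local)]


def average(vectors: List[Params]) -> Params:
    """Element-wise mean, standing in for the allreduce across replicas."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def outer_sgd_step(theta_initial: Params, avg_grad: Params, outer_lr: float) -> Params:
    """Plain SGD outer update (assumption: no momentum)."""
    return [t0 - outer_lr * g for t0, g in zip(theta_initial, avg_grad)]


theta_initial = [1.0, 2.0]
replica_a = [0.5, 2.5]  # parameters after replica A's inner steps
replica_b = [0.0, 2.0]  # parameters after replica B's inner steps

grads = [pseudo_gradient(theta_initial, r) for r in (replica_a, replica_b)]
theta_next = outer_sgd_step(theta_initial, average(grads), outer_lr=1.0)
print(theta_next)  # [0.25, 2.25]
```

With this sign and an outer learning rate of 1.0, the outer step lands exactly on the mean of the replicas' local parameters. Flipping the sign would instead move the global parameters away from where the replicas trained to, which is the failure a sign-convention test is meant to catch.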

What we don't:

True multi-replica convergence test. Single-process post-hook sequencing prevents this (replica A's outer step completes before replica B's allreduce arrives). Real-multi-process test deferred to GPU phase.
Trainer integration. The wrapper is a context manager; wiring it into ComposerReplicationTrainer.train() lifecycle is a separate spike.

Why this shape: DiLoCo's value proposition (decentralized inner training with sparse outer sync) only matters at multi-cluster scale. Our v0.1 target is single-cluster training with TRL. The DiLoCo wrapper is wired up so v0.2 multi-cluster training can switch it on with one config change.

OpenEnv

Status: 📋 Reference pattern (substrate, not a choice).

What we have:

Research deep-dive: research/03-monarch-torchforge-openenv.md § OpenEnv (the env-format standard, how it interacts with TRL's environment_factory=)
Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe D — "OpenEnv is a substrate, not a choice"

What we don't:

Direct OpenEnv code dependency. The framework's data path is OpenEnv-compatible by virtue of using TRL's API, which accepts environment_factory= kwargs that OpenEnv environments satisfy.

Why this shape: OpenEnv is a protocol (how an env exposes itself to a trainer), not a library you depend on. You either implement an OpenEnv-compatible environment or you don't. Composer 2.5's "Feature Deletion" environment is OpenEnv-shaped; if a user provides one, our TRL trainer accepts it via environment_factory=.
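
The "protocol, not library" idea can be sketched with structural typing. Everything below is hypothetical: the `reset`/`step` method names and return shapes are an assumed Gym-style interface, not a transcription of the OpenEnv spec, and `run_episode` is a toy driver rather than TRL's rollout code. `EchoEnv` is an invented toy environment.

```python
from typing import Callable, Protocol, Tuple


class EnvLike(Protocol):
    """Assumed environment shape; check the real spec before relying on it."""

    def reset(self) -> str: ...

    def step(self, action: str) -> Tuple[str, float, bool]: ...


class EchoEnv:
    """Toy environment: reward 1.0 when the action repeats the prompt."""

    def __init__(self, prompt: str) -> None:
        self._prompt = prompt

    def reset(self) -> str:
        return self._prompt

    def step(self, action: str) -> Tuple[str, float, bool]:
        reward = 1.0 if action == self._prompt else 0.0
        return "done", reward, True  # single-step episode


def make_echo_env() -> EnvLike:
    """Zero-argument factory, the kind of callable a trainer could be handed."""
    return EchoEnv("hello")


def run_episode(
    environment_factory: Callable[[], EnvLike],
    policy: Callable[[str], str],
) -> float:
    """Build a fresh environment and roll one episode with the given policy."""
    env = environment_factory()
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        observation, reward, done = env.step(policy(observation))
        total += reward
    return total


episode_reward = run_episode(make_echo_env, policy=lambda obs: obs)
print(episode_reward)  # 1.0
```

`EchoEnv` never imports or inherits from `EnvLike`; it satisfies the protocol by shape alone, which mirrors why the framework needs no direct OpenEnv dependency.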

Monarch (Meta)

Status: 📋 Reference pattern (alternative coordination model).

What we have:

Research deep-dive: research/03-monarch-torchforge-openenv.md § Monarch (actor mesh, hardware abstractions, comparison to Ray)
Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe C — "TorchForge + Monarch (reference patterns only, not a production target)"

What we don't:

Direct Monarch code dependency. We use DiLoCo's pseudo-gradient sync as our coordination model; Monarch's actor mesh is an alternative.

Why this shape: Monarch is alive (Meta is shipping it) but it's a coordination layer, not an algorithm. Our framework integrates with PyTorch + TRL + torchft directly; Monarch would replace the coordination layer underneath. Documented as a future option; not a v0.1 dependency.

TorchForge (Meta, paused)

Status: 📋 Reference only (upstream paused).

What we have:

Research deep-dive: research/03-monarch-torchforge-openenv.md § TorchForge — design lessons captured

What we don't:

Code dependency. TorchForge as a project was paused by Meta.

Why this shape: The brief asked us to research TorchForge. We did. The headline finding is "Meta paused this." That's a real research output even if it doesn't translate to code.

Summary

| Substrate | Research | Recipe | Code | Tests | v0.1 production? |
|---|---|---|---|---|---|
| TRL | ✅ | ✅ | ✅ | 38 + 9 + 3 = 50 | ✅ |
| VeRL | ✅ | ✅ | 🟡 (skeleton) | — | v0.2 |
| PRIME-RL (Wave 13) | ✅ | ✅ | 🟡 (loss adapter + config) | — | v0.2 (cleanest hook) |
| DiLoCo (single-process) | ✅ | ✅ | ✅ | 5 (single-replica) | optional |
| DiLoCo over serverless (Wave 13) | ✅ | ✅ ADR-005 | ✅ Local + 🟡 Modal/HFJobs | 9 multi-process | ✅ (local) / future (cloud) |
| OpenEnv | ✅ | ✅ | n/a (protocol) | — | substrate |
| Monarch (Wave 13) | ✅ | ✅ (actor layout) | 🟡 (skeleton) | — | v0.2+ |
| TorchForge | ✅ | n/a (paused) | n/a | — | n/a |

8/8 substrates covered (was 6/6 pre-Wave-13). New since Wave 13: PRIME-RL (the cleanest custom-loss hook), Monarch (Meta's actively-shipped agentic-stack component), and serverless DiLoCo (Modal/HF Jobs adapters + object-store rendezvous). The framework can now realize Decoupled DiLoCo across cloud executors without any cross-job NCCL; see ADR-005 for the design rationale.
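
A minimal sketch of what an object-store rendezvous could look like, assuming each worker publishes its pseudo-gradient as a small object and then polls until all peers have done the same. A local temporary directory stands in for the object store, and the key layout, `publish`, and `rendezvous` are hypothetical; this is not the design recorded in ADR-005 nor the Modal/HF Jobs adapter code.

```python
import json
import tempfile
import time
from pathlib import Path
from typing import List


def publish(store: Path, round_id: int, worker_id: int, pseudo_grad: List[float]) -> None:
    """Write one worker's pseudo-gradient under a per-round key.

    Write to a temp name, then rename, so readers never see a partial file.
    """
    final = store / f"round{round_id}-worker{worker_id}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(pseudo_grad))
    tmp.replace(final)


def rendezvous(
    store: Path,
    round_id: int,
    world_size: int,
    timeout_s: float = 5.0,
    poll_s: float = 0.05,
) -> List[float]:
    """Poll until every worker has published, then return the element-wise mean."""
    deadline = time.monotonic() + timeout_s
    pattern = f"round{round_id}-worker*.json"
    while True:
        files = sorted(store.glob(pattern))
        if len(files) >= world_size:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"only {len(files)}/{world_size} workers arrived")
        time.sleep(poll_s)
    grads = [json.loads(f.read_text()) for f in files]
    return [sum(column) / len(grads) for column in zip(*grads)]


with tempfile.TemporaryDirectory() as tmp_dir:
    store = Path(tmp_dir)
    publish(store, round_id=0, worker_id=0, pseudo_grad=[0.5, -0.5])
    publish(store, round_id=0, worker_id=1, pseudo_grad=[1.0, 0.0])
    averaged = rendezvous(store, round_id=0, world_size=2)

print(averaged)  # [0.75, -0.25]
```

Because workers only read and write objects, no process needs a direct network path to any other, which is the property that removes the need for cross-job NCCL. A real object store would need its own answer for atomic publication, since the rename trick here relies on local filesystem semantics.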
