V3 Substrate Coverage — Monarch / TorchForge / OpenEnv / VeRL / TRL / DiLoCo

The brief's V3 clause asks the framework to cover six substrates. For each one, this doc records what we have, what we don't, and why that level of coverage fits the substrate's status and the framework's scope.

TRL — huggingface/trl

Status: ✅ Production target for v0.1. Working code.

What we have:

  • Research deep-dive: research/04-verl-trl.md § 3 (algorithm coverage: GRPO / DAPO / DPO / PRM, extension points, _compute_loss vs compute_advantages)
  • Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe A
  • Working code: composer_replication.trainer.ComposerReplicationTrainer subclasses GRPOTrainer, overrides _compute_loss(model, inputs) to compose 3 channels (grpo + α·sdpo + β·trace_replay_dpo)
  • Data collator: composer_replication.trainer.data_collator.ComposerDataCollator builds the inputs dict the trainer expects
  • DeepWiki audit: extension surface verified against TRL HEAD as of 2026-05-25

What we don't:

  • A full end-to-end training run (gated on real GPU rollouts + reward calculations — out of scope for CPU-budget deep-work-loop)

Why this shape: TRL is the most-supported substrate for GRPO post-training. Subclassing GRPOTrainer and overriding _compute_loss is the cleanest extension path. Production v0.1 lives here.
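
The three-channel composition (grpo + α·sdpo + β·trace_replay_dpo) can be sketched without TRL installed. This is a minimal illustration, not the code in composer_replication.trainer: StubGRPOTrainer is a hypothetical stand-in for GRPOTrainer, the per-channel losses are assumed to arrive precomputed in the inputs dict under made-up keys, and the weight values are arbitrary.

```python
from dataclasses import dataclass


class StubGRPOTrainer:
    """Hypothetical stand-in for trl.GRPOTrainer so the sketch runs without TRL."""

    def _compute_loss(self, model, inputs):
        # In this stub the GRPO channel is simply read from the batch.
        return inputs["grpo_loss"]


@dataclass
class ChannelWeights:
    alpha: float = 0.5  # weight on the SDPO channel (illustrative value)
    beta: float = 0.1   # weight on the trace-replay DPO channel (illustrative value)


class ThreeChannelTrainer(StubGRPOTrainer):
    """Overrides _compute_loss to add two weighted channels on top of GRPO."""

    def __init__(self, weights: ChannelWeights):
        self.weights = weights

    def _compute_loss(self, model, inputs):
        grpo = super()._compute_loss(model, inputs)
        sdpo = inputs["sdpo_loss"]                  # assumed key
        replay = inputs["trace_replay_dpo_loss"]    # assumed key
        return grpo + self.weights.alpha * sdpo + self.weights.beta * replay
```

Keeping the base GRPO loss behind super() means the subclass only owns the extra channels and their weights; setting alpha and beta to zero falls back to the parent's behaviour.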


VeRL — volcengine/verl

Status: 🟡 Production target for v0.2 (multi-node scale). Skeleton, not yet runnable.

What we have:

  • Research deep-dive: research/04-verl-trl.md § 4 (3D-HybridEngine, resharding pattern, advantage estimator registry)
  • Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe B
  • Skeleton code: spikes/005-integrated-trainer-skeleton/verl_path/
    • composer_adv.py (110 LOC) — @register_adv_est("composer_3channel") decorator
    • composer_config.yaml (89 LOC) — full PPO trainer config with our advantage estimator wired in
  • DeepWiki audit: extension surface verified against VeRL HEAD as of 2026-05-25
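
The decorator-based registration that composer_adv.py relies on follows a common registry pattern, sketched below in plain Python. The registry dict, the lookup helper, and the estimator's signature and body are all assumptions made for illustration; VeRL's actual register_adv_est and the arguments it passes to an estimator may differ.

```python
from typing import Callable, Dict, List

# Hypothetical registry; VeRL's real registry and estimator signature may differ.
ADV_ESTIMATORS: Dict[str, Callable[..., List[float]]] = {}


def register_adv_est(name: str):
    """Return a decorator that files the wrapped function under `name`."""

    def decorator(fn):
        if name in ADV_ESTIMATORS:
            raise ValueError(f"advantage estimator {name!r} already registered")
        ADV_ESTIMATORS[name] = fn
        return fn

    return decorator


@register_adv_est("composer_3channel")
def composer_3channel(rewards: List[float]) -> List[float]:
    """Placeholder body: rewards centred on the group mean."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def get_adv_est(name: str):
    """Look up an estimator by the name a config file would reference."""
    return ADV_ESTIMATORS[name]
```

The point of the pattern is that a config file only needs the string name; the trainer resolves it at startup, so no trainer code changes when a new estimator is added.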

What we don't:

  • A working VeRL run on real hardware (VeRL itself has steep setup; v0.1 prioritizes TRL because it's faster to iterate on)

Why this shape: VeRL's 3D-HybridEngine and decentralized scheduler scale better than TRL beyond 32 GPUs. We build the recipe but don't make it the default. The framework supports either path; users on clusters larger than 8 GPUs should use VeRL.


DiLoCo — meta-pytorch/torchft

Status: 🟡 Outer-loop wrapper integrated. Multi-replica convergence GPU-gated.

What we have:

  • Research deep-dive: research/02-diloco-family.md (DiLoCo / OpenDiLoCo / Streaming DiLoCo / PRIME-RL / INTELLECT-1+2 — full audit with primary source links and license/maturity assessment)
  • ADR: docs/adrs/ADR-003-diloco-impl.md — chose torchft.local_sgd.DiLoCo (BSD-3, Meta-maintained, library-not-research-code) over 4 alternatives
  • Working code: composer_replication.diloco.make_diloco_outer_loop wrapper. Documents the sign convention (pseudo-grad = θ_initial - θ_local).
  • Spike 008: 5/5 single-process tests. Sign-convention test is the single best test in the framework (per cross-model review).
  • Reconnaissance: docs/research/DILOCO_RECONNAISSANCE.md
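
The sign convention (pseudo-grad = θ_initial - θ_local) can be checked with a few lines of plain Python. This sketch does not use torchft; it assumes parameters are flat lists of floats and uses a plain SGD outer step as a simplification (DiLoCo is usually run with a momentum-based outer optimizer). All function names are hypothetical.

```python
from typing import List


def pseudo_gradient(theta_initial: List[float], theta_local: List[float]) -> List[float]:
    """Pseudo-gradient with the documented sign: theta_initial - theta_local."""
    return [a - b for a, b in zip(theta_initial, theta_local)]


def average(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean across replicas (what an allreduce-mean would produce)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def outer_sgd_step(
    theta_initial: List[float],
    replica_params: List[List[float]],
    outer_lr: float = 1.0,
) -> List[float]:
    """Apply theta <- theta - outer_lr * mean(pseudo-gradients)."""
    grads = [pseudo_gradient(theta_initial, p) for p in replica_params]
    g = average(grads)
    return [t - outer_lr * gi for t, gi in zip(theta_initial, g)]
```

With this sign, an outer learning rate of 1.0 moves the global parameters to the mean of the replicas' local parameters, because the optimizer subtracts the gradient. Flipping the sign would push the global parameters away from the replicas, which is the failure a sign-convention test is meant to catch.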

What we don't:

  • True multi-replica convergence test. Single-process post-hook sequencing prevents this (replica A's outer step completes before replica B's allreduce arrives). A real multi-process test is deferred to the GPU phase.
  • Trainer integration. The wrapper is a context manager; wiring it into the ComposerReplicationTrainer.train() lifecycle is a separate spike.

Why this shape: DiLoCo's value proposition (decentralized inner training with sparse outer sync) only matters at multi-cluster scale. Our v0.1 target is single-cluster training with TRL. The DiLoCo wrapper is wired up so v0.2 multi-cluster training can switch it on with one config change.


OpenEnv

Status: 📋 Reference pattern (substrate, not a choice).

What we have:

  • Research deep-dive: research/03-monarch-torchforge-openenv.md § OpenEnv (the env-format standard, how it interacts with TRL's environment_factory=)
  • Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe D — "OpenEnv is a substrate, not a choice"

What we don't:

  • Direct OpenEnv code dependency. The framework's data path is OpenEnv-compatible by virtue of using TRL's API, which accepts environment_factory= kwargs that OpenEnv environments satisfy.

Why this shape: OpenEnv is a protocol (how an env exposes itself to a trainer), not a library you depend on. You either implement an OpenEnv-compatible environment or you don't. Composer 2.5's "Feature Deletion" environment is OpenEnv-shaped; if a user provides one, our TRL trainer accepts it via environment_factory=.
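
The "protocol, not a dependency" idea maps onto structural typing: an environment qualifies by having the right methods, not by importing or subclassing anything. The sketch below is illustrative only. EnvLike, its reset/step signatures, and run_episode are hypothetical and are not OpenEnv's or TRL's actual interfaces; only the environment_factory parameter name comes from the text above.

```python
from typing import Any, Callable, Protocol, Tuple


class EnvLike(Protocol):
    """Hypothetical env contract; not OpenEnv's actual method set."""

    def reset(self) -> Any: ...

    def step(self, action: Any) -> Tuple[Any, float, bool]: ...


class CountdownEnv:
    """Toy environment that satisfies EnvLike without subclassing it."""

    def __init__(self, start: int = 3):
        self.start = start
        self.remaining = start

    def reset(self) -> int:
        self.remaining = self.start
        return self.remaining

    def step(self, action: Any) -> Tuple[int, float, bool]:
        self.remaining -= 1
        done = self.remaining == 0
        reward = 1.0 if done else 0.0  # reward only on the final step
        return self.remaining, reward, done


def run_episode(environment_factory: Callable[[], EnvLike]) -> float:
    """Hypothetical consumer: builds a fresh env and sums rewards until done."""
    env = environment_factory()
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(action=None)
        total += reward
    return total
```

Passing the class itself (run_episode(CountdownEnv)) or a closure (run_episode(lambda: CountdownEnv(start=5))) both work, which is why a factory argument is a convenient seam for user-supplied environments.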


Monarch (Meta)

Status: 📋 Reference pattern (alternative coordination model).

What we have:

  • Research deep-dive: research/03-monarch-torchforge-openenv.md § Monarch (actor mesh, hardware abstractions, comparison to Ray)
  • Integration recipe: docs/INTEGRATION_ARCHITECTURE.md Recipe C — "TorchForge + Monarch (reference patterns only, not a production target)"

What we don't:

  • Direct Monarch code dependency. We use DiLoCo's pseudo-gradient sync as our coordination model; Monarch's actor mesh is an alternative.

Why this shape: Monarch is alive (Meta is shipping it) but it's a coordination layer, not an algorithm. Our framework integrates with PyTorch + TRL + torchft directly; Monarch would replace the coordination layer underneath. Documented as a future option; not a v0.1 dependency.


TorchForge (Meta, paused)

Status: 📋 Reference only (upstream paused).

What we have:

  • Research deep-dive: research/03-monarch-torchforge-openenv.md § TorchForge — design lessons captured

What we don't:

  • Code dependency. TorchForge as a project was paused by Meta.

Why this shape: The brief asked us to research TorchForge. We did. The headline finding is "Meta paused this." That's a real research output even if it doesn't translate to code.


Summary

Substrate Research Recipe Code Tests v0.1 production?
TRL 38 + 9 + 3 = 50
VeRL 🟡 (skeleton) v0.2
PRIME-RL (Wave 13) 🟡 (loss adapter + config) v0.2 (cleanest hook)
DiLoCo (single-process) 5 (single-replica) optional
DiLoCo over serverless (Wave 13) ✅ ADR-005 ✅ Local + 🟡 Modal/HFJobs 9 multi-process ✅ (local) / future (cloud)
OpenEnv n/a (protocol) substrate
Monarch (Wave 13) ✅ (actor layout) 🟡 (skeleton) v0.2+
TorchForge n/a (paused) n/a n/a

8/8 substrates covered (was 6/6 pre-Wave-13). New since Wave 13: PRIME-RL (the cleanest custom-loss hook), Monarch (Meta's actively-shipped agentic-stack component), and serverless DiLoCo (Modal/HF Jobs adapters + object-store rendezvous). The framework can now realize Decoupled DiLoCo across cloud executors without any cross-job NCCL; see ADR-005 for the design rationale.
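
One way to picture an object-store rendezvous is sketched below, with a local directory standing in for the bucket. This is an assumption-laden illustration, not the ADR-005 design: the key layout, the JSON encoding, and the publish/try_reduce names are all hypothetical. Each replica writes its pseudo-gradient as an object, and any job can compute the mean once every replica's object for that round is present.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def publish(store: Path, round_id: int, replica_id: int, pseudo_grad: List[float]) -> None:
    """Write one replica's pseudo-gradient for a round (hypothetical key layout)."""
    final = store / f"round-{round_id}" / f"replica-{replica_id}.json"
    final.parent.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name, then rename, so readers never see a partial object.
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(pseudo_grad))
    tmp.replace(final)


def try_reduce(store: Path, round_id: int, world_size: int) -> Optional[List[float]]:
    """Return the mean pseudo-gradient if all replicas have published, else None."""
    files = sorted((store / f"round-{round_id}").glob("replica-*.json"))
    if len(files) < world_size:
        return None  # rendezvous incomplete; the caller would poll again later
    grads = [json.loads(f.read_text()) for f in files]
    return [sum(col) / len(grads) for col in zip(*grads)]


# Usage: two replicas publishing into a temporary directory.
with tempfile.TemporaryDirectory() as d:
    store = Path(d)
    publish(store, round_id=0, replica_id=0, pseudo_grad=[1.0, 0.0])
    partial = try_reduce(store, round_id=0, world_size=2)   # still waiting on replica 1
    publish(store, round_id=0, replica_id=1, pseudo_grad=[0.0, -2.0])
    reduced = try_reduce(store, round_id=0, world_size=2)   # both objects present
```

Because replicas only read and write objects, no job needs a network path to any other job, which is the property that removes the cross-job NCCL requirement.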