V3 Substrate Coverage — Monarch / TorchForge / OpenEnv / VeRL / TRL / DiLoCo
The brief's V3 clause asks the framework to cover six substrates. This doc maps each substrate to what we have, what we don't, and why that's the right shape given the substrate's status and the framework's scope.
TRL — huggingface/trl
Status: ✅ Production target for v0.1. Working code.
What we have:
- Research deep-dive: `research/04-verl-trl.md` § 3 (algorithm coverage: GRPO / DAPO / DPO / PRM, extension points, `_compute_loss` vs `compute_advantages`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe A
- Working code: `composer_replication.trainer.ComposerReplicationTrainer` subclasses `GRPOTrainer`, overrides `_compute_loss(model, inputs)` to compose 3 channels (grpo + α·sdpo + β·trace_replay_dpo)
- Data collator: `composer_replication.trainer.data_collator.ComposerDataCollator` builds the `inputs` dict the trainer expects
- DeepWiki audit: extension surface verified against TRL HEAD as of 2026-05-25
What we don't:
- A full end-to-end training run (gated on real GPU rollouts + reward calculations — out of scope for CPU-budget deep-work-loop)
Why this shape: TRL is the most-supported substrate for GRPO post-training.
Subclassing `GRPOTrainer` and overriding `_compute_loss` is the
cleanest extension path. Production v0.1 lives here.
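The three-channel composition named above (grpo + α·sdpo + β·trace_replay_dpo) can be sketched in isolation. This is a minimal illustration, not the code in `ComposerReplicationTrainer`: the `ChannelWeights` container, the `compose_loss` helper, and the default coefficient values are all hypothetical, and the per-channel losses are assumed to be already-computed scalars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelWeights:
    """Hypothetical container for the two mixing coefficients."""

    alpha: float = 0.1   # weight on the SDPO channel (illustrative value)
    beta: float = 0.05   # weight on the trace-replay DPO channel (illustrative value)


def compose_loss(grpo_loss, sdpo_loss, trace_replay_dpo_loss, weights):
    """Return grpo + alpha * sdpo + beta * trace_replay_dpo.

    Works on plain floats, or on any type that supports + and *
    (for example scalar tensors), so gradients would flow through
    every channel if tensors were passed in.
    """
    return (
        grpo_loss
        + weights.alpha * sdpo_loss
        + weights.beta * trace_replay_dpo_loss
    )


if __name__ == "__main__":
    total = compose_loss(1.0, 2.0, 4.0, ChannelWeights(alpha=0.5, beta=0.25))
    print(total)
```

Keeping the weighted sum in one small function means each channel can be unit-tested on its own and then switched off by setting its coefficient to zero.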
VeRL — volcengine/verl
Status: 🟡 Production target for v0.2 (multi-node scale). Skeleton, not yet runnable.
What we have:
- Research deep-dive: `research/04-verl-trl.md` § 4 (3D-HybridEngine, resharding pattern, advantage estimator registry)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe B
- Skeleton code: `spikes/005-integrated-trainer-skeleton/verl_path/`
  - `composer_adv.py` (110 LOC): `@register_adv_est("composer_3channel")` decorator
  - `composer_config.yaml` (89 LOC): full PPO trainer config with our advantage estimator wired in
- DeepWiki audit: extension surface verified against VeRL HEAD as of 2026-05-25
What we don't:
- A working VeRL run on real hardware (VeRL itself has steep setup; v0.1 prioritizes TRL because it's faster to iterate on)
Why this shape: VeRL's 3D-HybridEngine and decentralized scheduler are better than TRL's at >32 GPU scale. We build the recipe but don't make it the default. The framework supports either path; users on >8-GPU clusters should use VeRL.
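The registry pattern behind `@register_adv_est("composer_3channel")` can be illustrated with a self-contained stand-in. The sketch below defines its own local registry rather than importing VeRL's, and the estimator body is a placeholder (GRPO-style group normalization of rewards), not the logic in `composer_adv.py`; the function signature is likewise an assumption.

```python
from typing import Callable, Dict, List

# Local stand-in registry. VeRL maintains its own; this only mirrors the idea.
ADV_ESTIMATORS: Dict[str, Callable[..., List[float]]] = {}


def register_adv_est(name: str):
    """Decorator that stores an advantage estimator under a string key."""

    def decorator(fn):
        if name in ADV_ESTIMATORS:
            raise ValueError(f"estimator {name!r} is already registered")
        ADV_ESTIMATORS[name] = fn
        return fn

    return decorator


@register_adv_est("composer_3channel")
def composer_3channel(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Placeholder body: normalize rewards within one rollout group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # A config would select the estimator by name, as the YAML does.
    estimator = ADV_ESTIMATORS["composer_3channel"]
    print(estimator([1.0, 0.0, 0.5]))
```

The point of the pattern is that the trainer config only needs the string key, so swapping estimators does not require touching trainer code.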
DiLoCo — meta-pytorch/torchft
Status: 🟡 Outer-loop wrapper integrated. Multi-replica convergence GPU-gated.
What we have:
- Research deep-dive: `research/02-diloco-family.md` (DiLoCo / OpenDiLoCo / Streaming DiLoCo / PRIME-RL / INTELLECT-1+2; full audit with primary source links and license/maturity assessment)
- ADR: `docs/adrs/ADR-003-diloco-impl.md`, which chose `torchft.local_sgd.DiLoCo` (BSD-3, Meta-maintained, library-not-research-code) over 4 alternatives
- Working code: `composer_replication.diloco.make_diloco_outer_loop` wrapper. Documents the sign convention (pseudo-grad = θ_initial - θ_local).
- Spike 008: 5/5 single-process tests. Sign-convention test is the single best test in the framework (per cross-model review).
- Reconnaissance: `docs/research/DILOCO_RECONNAISSANCE.md`
What we don't:
- True multi-replica convergence test. Single-process post-hook sequencing prevents this (replica A's outer step completes before replica B's allreduce arrives). Real-multi-process test deferred to GPU phase.
- Trainer integration. The wrapper is a context manager; wiring it into the `ComposerReplicationTrainer.train()` lifecycle is a separate spike.
Why this shape: DiLoCo's value proposition (decentralized inner training with sparse outer sync) only matters at multi-cluster scale. Our v0.1 target is single-cluster training with TRL. The DiLoCo wrapper is wired up so v0.2 multi-cluster training can switch it on with one config change.
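The sign convention noted above (pseudo-grad = θ_initial - θ_local) is easy to get backwards, so a toy outer step may help. This sketch uses plain Python lists and a plain SGD outer update as simplifying assumptions; it is not the `torchft.local_sgd.DiLoCo` implementation, and the helper names are hypothetical. DiLoCo as published uses a momentum-based outer optimizer, which is omitted here.

```python
from typing import List

Vector = List[float]


def pseudo_gradient(theta_initial: Vector, theta_local: Vector) -> Vector:
    """pseudo-grad = theta_initial - theta_local (the documented sign)."""
    return [g - l for g, l in zip(theta_initial, theta_local)]


def average(vectors: List[Vector]) -> Vector:
    """Element-wise mean, standing in for the cross-replica allreduce."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def outer_step(
    theta_initial: Vector, replica_thetas: List[Vector], outer_lr: float = 1.0
) -> Vector:
    """Apply one SGD outer update using the averaged pseudo-gradient."""
    grads = [pseudo_gradient(theta_initial, t) for t in replica_thetas]
    avg = average(grads)
    # Descending along (initial - local) moves the global params TOWARD
    # the replicas' locally trained params; flipping the sign moves away.
    return [p - outer_lr * g for p, g in zip(theta_initial, avg)]


if __name__ == "__main__":
    theta0 = [1.0, 2.0]
    replicas = [[0.0, 2.0], [1.0, 4.0]]
    print(outer_step(theta0, replicas))
```

With `outer_lr=1.0` and plain SGD, the result equals the mean of the replica parameters, which gives a quick sanity check on the sign.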
OpenEnv
Status: 📋 Reference pattern (substrate, not a choice).
What we have:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § OpenEnv (the env-format standard, how it interacts with TRL's `environment_factory=`)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe D: "OpenEnv is a substrate, not a choice"
What we don't:
- Direct OpenEnv code dependency. The framework's data path is
OpenEnv-compatible by virtue of using TRL's API, which accepts
`environment_factory=` kwargs that OpenEnv environments satisfy.
Why this shape: OpenEnv is a protocol (how an env exposes itself
to a trainer), not a library you depend on. You either implement an
OpenEnv-compatible environment or you don't. Composer 2.5's "Feature
Deletion" environment is OpenEnv-shaped; if a user provides one, our
TRL trainer accepts it via environment_factory=.
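The "protocol, not a library" idea can be sketched with structural typing. Everything below is hypothetical: the `EnvLike` interface (a `reset`/`step` pair), the toy environment, and the `run_episode` driver are illustrations only, and the real OpenEnv interface and TRL's handling of `environment_factory=` may differ.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Tuple


class EnvLike(Protocol):
    """Hypothetical minimal environment interface (assumed, not OpenEnv's)."""

    def reset(self) -> str: ...

    def step(self, action: str) -> Tuple[str, float, bool]: ...


@dataclass
class FeatureDeletionToyEnv:
    """Toy stand-in: reward 1.0 if the action mentions the target feature."""

    target: str = "legacy_flag"

    def reset(self) -> str:
        return f"Remove the feature '{self.target}' from the codebase."

    def step(self, action: str) -> Tuple[str, float, bool]:
        reward = 1.0 if self.target in action else 0.0
        return "episode finished", reward, True  # single-step episode


def make_env() -> EnvLike:
    """Zero-argument factory, the shape a trainer would be handed."""
    return FeatureDeletionToyEnv()


def run_episode(
    environment_factory: Callable[[], EnvLike], policy: Callable[[str], str]
) -> float:
    """Drive one episode; any object with reset/step satisfies EnvLike."""
    env = environment_factory()
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    print(run_episode(make_env, lambda obs: "delete legacy_flag"))
```

Note that `FeatureDeletionToyEnv` never imports or inherits from anything; it conforms by shape alone, which is the sense in which the framework has no direct OpenEnv dependency.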
Monarch (Meta)
Status: 📋 Reference pattern (alternative coordination model).
What we have:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § Monarch (actor mesh, hardware abstractions, comparison to Ray)
- Integration recipe: `docs/INTEGRATION_ARCHITECTURE.md` Recipe C: "TorchForge + Monarch (reference patterns only, not a production target)"
What we don't:
- Direct Monarch code dependency. We use DiLoCo's pseudo-gradient sync as our coordination model; Monarch's actor mesh is an alternative.
Why this shape: Monarch is alive (Meta is shipping it) but it's a coordination layer, not an algorithm. Our framework integrates with PyTorch + TRL + torchft directly; Monarch would replace the coordination layer underneath. Documented as a future option; not a v0.1 dependency.
TorchForge (Meta, paused)
Status: 📋 Reference only (upstream paused).
What we have:
- Research deep-dive: `research/03-monarch-torchforge-openenv.md` § TorchForge (design lessons captured)
What we don't:
- Code dependency. TorchForge as a project was paused by Meta.
Why this shape: The brief asked us to research TorchForge. We did. The headline finding is "Meta paused this." That's a real research output even if it doesn't translate to code.
Summary
| Substrate | Research | Recipe | Code | Tests | v0.1 production? |
|---|---|---|---|---|---|
| TRL | ✅ | ✅ | ✅ | 38 + 9 + 3 = 50 | ✅ |
| VeRL | ✅ | ✅ | 🟡 (skeleton) | — | v0.2 |
| PRIME-RL (Wave 13) | ✅ | ✅ | 🟡 (loss adapter + config) | — | v0.2 (cleanest hook) |
| DiLoCo (single-process) | ✅ | ✅ | ✅ | 5 (single-replica) | optional |
| DiLoCo over serverless (Wave 13) | ✅ | ✅ ADR-005 | ✅ Local + 🟡 Modal/HFJobs | 9 multi-process | ✅ (local) / future (cloud) |
| OpenEnv | ✅ | ✅ | n/a (protocol) | — | substrate |
| Monarch (Wave 13) | ✅ | ✅ (actor layout) | 🟡 (skeleton) | — | v0.2+ |
| TorchForge | ✅ | n/a (paused) | n/a | — | n/a |
8/8 substrates covered (was 6/6 pre-Wave-13). New since Wave 13: PRIME-RL (the cleanest custom-loss hook), Monarch (Meta's actively-shipped agentic-stack component), and serverless DiLoCo (Modal/HF Jobs adapters + object-store rendezvous). The framework can now realize Decoupled DiLoCo across cloud executors without any cross-job NCCL; see ADR-005 for the design rationale.