ADR-006 — RL framework strategy: TRL + VeRL + PRIME-RL
Status: Accepted
Date: 2026-05-26
Wave: 13
Context
The brief's V3 clause names six substrates: Monarch, TorchForge, OpenEnv, VeRL, TRL (plus DiLoCo). Cross-model review (Wave 11) flagged that V3 was thin on the RL-framework side: TRL has working code, VeRL has only a config skeleton, and Monarch/TorchForge/OpenEnv are research-only.
User's 2026-05-26 expansion: "see if there are other frameworks that are more popular that we could try to use. meta's pytorch agentic stack components are something that I'd like to explore."
docs/research/RL_FRAMEWORKS_LANDSCAPE.md audited:
- 6 RL frameworks: OpenRLHF, PRIME-RL, NeMo-Aligner, Unsloth, LLaMA-Factory, DeepSpeed-Chat
- 4 Meta PyTorch stack components: Monarch, TorchTitan, TorchForge, torchchat
Options considered
| Framework | License | GRPO/DAPO? | Custom-loss extension | Verdict |
|---|---|---|---|---|
| OpenRLHF | Apache-2 | ✅ DAPO | Fork openrlhf/models/loss.py + Trainer subclass (~400-600 LOC) | Strong but heavyweight |
| PRIME-RL | Apache-2 | ✅ GRPO + DAPO | First-class CustomLossConfig with LossInputs struct (~200-300 LOC) | Chosen |
| NeMo-Aligner | Apache-2 | ❌ no GRPO/DAPO | n/a | Reject |
| Unsloth | Apache-2 | TRL patcher | Closed unsloth_zoo loss kernels, unhookable | Reject |
| LLaMA-Factory | Apache-2 | ❌ delegates to EasyR1 | n/a | Reject |
| DeepSpeed-Chat | Apache-2 | ❌ PPO+DPO only | feature-stale since 2023 | Reject |
| Meta stack | License | Active? | Role |
|---|---|---|---|
| Monarch | BSD-3 | ✅ v0.4.1 stable, v0.5 dev | Actor mesh — coordination layer for any SPMD trainer |
| TorchTitan | BSD-3 | ✅ active | Distributed-training stack (already a transitive dep of PRIME-RL) |
| TorchForge | BSD-3 | ❌ paused | Patterns only, per repo banner |
| torchchat | BSD-3 | active | Inference only — out of scope |
Decision
Add PRIME-RL as the third RL framework after TRL+VeRL, and Monarch as the agentic-stack coordination layer.
Why PRIME-RL
PRIME-RL ships a first-class CustomLossConfig with an import_path
that lets us drop in a Python function returning a tensor. The config
exposes a LossInputs struct with exactly the tensors we need:
trainer_logprobs, inference_logprobs, teacher_logprobs,
advantages, loss_mask. This is the cleanest possible extension
point for a 3-channel loss — no fork, no Trainer subclass, no monkey-
patching.
It also uses the verifiers env protocol (OpenEnv-compatible by design),
so it slots into the framework's existing data path without translation.
PRIME-RL was used to train INTELLECT-1 (10B base, 30 nodes) and INTELLECT-2 (32B QwQ), so it has been production-tested on real distributed runs.
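To make the extension point concrete, here is a minimal sketch of a function consuming the five fields named above and returning a scalar. Everything in it is an assumption for illustration: the `LossInputs` dataclass is a hypothetical stand-in (plain Python lists rather than tensors), the three per-token terms and their weights are placeholders rather than the framework's actual channels, and the real PRIME-RL signature may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class LossInputs:
    """Hypothetical stand-in with the five fields named above (lists, not tensors)."""
    trainer_logprobs: List[float]
    inference_logprobs: List[float]
    teacher_logprobs: List[float]
    advantages: List[float]
    loss_mask: List[bool]


def three_term_loss(inputs: LossInputs, w_pg: float = 1.0,
                    w_teacher: float = 0.1, w_drift: float = 0.01) -> float:
    """Masked mean of three illustrative per-token terms (weights are assumptions)."""
    total, kept = 0.0, 0
    for lp, lp_inf, lp_teacher, adv, keep in zip(
        inputs.trainer_logprobs, inputs.inference_logprobs,
        inputs.teacher_logprobs, inputs.advantages, inputs.loss_mask,
    ):
        if not keep:
            continue  # masked-out tokens contribute nothing
        ratio = math.exp(lp - lp_inf)      # importance ratio, trainer vs. inference
        pg_term = -ratio * adv             # policy-gradient surrogate
        teacher_term = lp - lp_teacher     # log-prob gap to the teacher
        drift_term = (lp - lp_inf) ** 2    # penalty on trainer/inference mismatch
        total += w_pg * pg_term + w_teacher * teacher_term + w_drift * drift_term
        kept += 1
    return total / kept if kept else 0.0
```

In a real adapter the same arithmetic would be written with tensor operations, and the function would be referenced from the config through import_path.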
Why Monarch (not TorchForge or TorchTitan as a top-level)
- Monarch is what's actually shipping from Meta's agentic stack. v0.4.1 is stable, v0.5 dev daily. BSD-3.
- TorchForge is paused per its own repo banner. We document it (research/03) but don't depend on it.
- TorchTitan is a transitive dep of PRIME-RL already, so we get its benefits without needing to build a direct integration. If we wanted a TorchTitan-only path, it would be redundant with PRIME-RL.
- torchchat is inference-only and doesn't fit the training-framework conversation.
Monarch's role in our stack: the actor mesh that hosts trainer/generator/rewarder/judge actors. PRIME-RL's three-actor split (trainer, generator, rewarder) maps naturally onto Monarch primitives.
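The three-role split can be pictured with plain Python objects. This sketch deliberately uses no Monarch API: the class names, method names, and the toy reward are all hypothetical, and the judge role is omitted. It only shows the data flow (generator, then rewarder, then trainer) that an actor mesh would host.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    prompt: str
    completion: str
    reward: float = 0.0


class Generator:
    def generate(self, prompts: List[str]) -> List[Rollout]:
        # Toy "policy": echo the prompt in upper case.
        return [Rollout(p, p.upper()) for p in prompts]


class Rewarder:
    def score(self, rollouts: List[Rollout]) -> List[Rollout]:
        # Toy reward: completion length.
        for r in rollouts:
            r.reward = float(len(r.completion))
        return rollouts


class Trainer:
    def __init__(self) -> None:
        self.steps = 0
        self.seen_rewards: List[float] = []

    def update(self, rollouts: List[Rollout]) -> None:
        # Stand-in for an optimizer step: record rewards and count the step.
        self.seen_rewards.extend(r.reward for r in rollouts)
        self.steps += 1


def run_round(generator: Generator, rewarder: Rewarder,
              trainer: Trainer, prompts: List[str]) -> List[Rollout]:
    """One generate -> score -> update round across the three roles."""
    rollouts = rewarder.score(generator.generate(prompts))
    trainer.update(rollouts)
    return rollouts
```

In an actor-based layout each class would become a separately hosted actor and the direct method calls would become messages between them.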
Consequences
Accepted
- composer_replication/recipes/prime_rl/ directory:
  - prime_rl_recipe.md: integration recipe (parallel to TRL Recipe A, VeRL Recipe B)
  - composer_loss.py: the 3-channel loss adapted to PRIME-RL's LossInputs struct (~200-300 LOC)
  - prime_rl_config.yaml: example PRIME-RL config wiring our loss in
- composer_replication/recipes/monarch/ directory:
  - monarch_actor_layout.md: design doc for the actor mesh
  - actors.py: placeholder Monarch actor definitions (skeleton only; full integration is post-replication)
- New optional dependencies in pyproject.toml:
  - [prime-rl] extra: prime-rl>=0.5
  - [monarch] extra: monarch>=0.4.1
- docs/V3_SUBSTRATE_COVERAGE.md updated to reflect the new additions.
Three-recipe production matrix
| User scenario | Recommended recipe |
|---|---|
| Quick start, single-cluster, ≤7B | TRL Recipe A |
| Production multi-node, ≤32B | VeRL Recipe B |
| Decentralized / DiLoCo-shape, any size | PRIME-RL recipe (NEW) |
| Coordination-heavy multi-actor RL | Monarch + any of the above |
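The first three rows of the matrix can be read as a simple lookup. The helper below is a hypothetical illustration, not part of the framework: the function name and parameters are assumptions, and it returns None for combinations the table does not cover rather than guessing. The Monarch row is left out because it composes with any of the other recipes.

```python
from typing import Optional


def recommend_recipe(params_b: float, multi_node: bool = False,
                     decentralized: bool = False) -> Optional[str]:
    """Map a scenario onto the matrix above; params_b is model size in billions."""
    if decentralized:
        return "PRIME-RL recipe"   # decentralized / DiLoCo-shape, any size
    if multi_node and params_b <= 32:
        return "VeRL Recipe B"     # production multi-node, up to 32B
    if not multi_node and params_b <= 7:
        return "TRL Recipe A"      # quick start, single cluster, up to 7B
    return None                    # scenario not covered by the table
```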
Trade-offs explicitly accepted
- Three RL frameworks is a maintenance burden. We accept this because no single one covers all the user scenarios above. The framework's contribution is the 3-channel loss + the trace-replay channel, expressed in three different framework idioms. Each recipe is ~200-300 LOC; total triplication tax ~700 LOC vs. picking one framework.
- Monarch is BSD-3 not MIT. The framework is MIT; users opting in to Monarch take on its license. Documented in pyproject.toml's optional extras.
- PRIME-RL's API may evolve. The LossInputs struct is currently the contract; if PRIME-RL stabilizes a different shape, we'd need to update the loss adapter and bump the pin. Pin to v0.5.x in our optional extras.
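A runtime guard is one way to enforce the 0.5.x pin. Note that a `>=0.5` specifier on its own would also admit later minor series. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the distribution name `prime-rl` is taken from the extras listed above rather than verified against PyPI.

```python
import re
from importlib import metadata
from typing import Optional


def in_pinned_series(version: str, major: int = 0, minor: int = 5) -> bool:
    """True if the version string starts with the pinned major.minor series."""
    m = re.match(r"(\d+)\.(\d+)", version)
    return bool(m) and (int(m.group(1)), int(m.group(2))) == (major, minor)


def installed_prime_rl_ok() -> Optional[bool]:
    """Check the installed distribution; None means it is not installed."""
    try:
        return in_pinned_series(metadata.version("prime-rl"))
    except metadata.PackageNotFoundError:
        return None
```

A recipe entry point could call `installed_prime_rl_ok()` at import time and fail early with a clear message when the installed series differs from the pinned one.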
Source
docs/research/RL_FRAMEWORKS_LANDSCAPE.md (2026-05-26 subagent recon,
primary-sourced from DeepWiki audits + GitHub repo READMEs + PyPI release
metadata).