Examples Index
Five CPU-runnable examples demonstrating the framework end-to-end on real HF causal LMs. They form a progression from simplest to most methodologically complete:
| # | Example | Trace source | Channels | Wall-clock | Closes |
|---|---|---|---|---|---|
| 1 | `qwen_05b_quickstart/` | minimal toy | LM-CE only | ~30s | "does the package import + run at all" |
| 2 | `gsm8k_grpo/` | hand-written GSM8K (100 rows) | GRPO with `alpha=beta=0` | ~60s | Plain-GRPO baseline reference |
| 3 | `gsm8k_grpo_with_sdpo/` | hand-written GSM8K (B=2) | GRPO + SDPO column | ~25s | SDPO column wiring on synthetic prompts |
| 4 | `sdpo_with_real_traces/` | `ClaudeCodeIngester` reading a hand-authored session JSONL | GRPO + SDPO column | ~30s | Partial V5: ingestion path validated; wiring smoke (misaligned) |
| 5 | `sdpo_with_real_traces_production/` | `ClaudeCodeIngester` → adapter → `ComposerDataCollator` (with-error fixture) | GRPO + SDPO (production-aligned) | ~2min | V5 closure: full production pipeline with error-site detection + properly-aligned SDPO mask |

Recommended walk-through order: 1 → 2 → 3 → 4 → 5. Each builds on the previous in scope.
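
The walk-through order can be scripted. The sketch below is a minimal runner that assumes each example lives at `examples/<name>/run.py` relative to the repository root; that `examples/` root and the helper names `build_commands` and `run_all` are assumptions for illustration and not part of the framework. It relies only on each `run.py` returning a non-zero exit code on failure.

```python
import subprocess
import sys
from pathlib import Path

# Directory names are taken from the table above.
# The "examples" root directory is an assumption; adjust it to your checkout.
EXAMPLE_DIRS = [
    "qwen_05b_quickstart",
    "gsm8k_grpo",
    "gsm8k_grpo_with_sdpo",
    "sdpo_with_real_traces",
    "sdpo_with_real_traces_production",
]


def build_commands(root="examples"):
    """Return one [python, <root>/<name>/run.py] command per example, in order."""
    return [
        [sys.executable, str(Path(root) / name / "run.py")]
        for name in EXAMPLE_DIRS
    ]


def run_all(root="examples"):
    """Run the examples in order and stop at the first non-zero exit code."""
    for cmd in build_commands(root):
        print("running:", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `sys.exit(run_all())` from the repository root would run 1 → 5 and propagate the first failure as the process exit code.
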
Why five?
- #1 verifies the package is installable and the loss composition works at all (no SDPO, no DPO; pure LM-CE on a toy model).
- #2 uses the production `ComposerReplicationTrainer` (TRL `GRPOTrainer` subclass) on a real GSM8K dataset with a regex-extract reward. This is the recipe a new user copy-pastes to start.
- #3 drops the TRL trainer wrapper and calls `compose_loss` directly on hand-crafted hint contexts. The simplest place to see "alpha_sdpo=0.5 changes the loss" with all the wiring visible.
- #4 uses real ingested Claude Code session JSONL (via `ClaudeCodeIngester`) but builds the SDPO batch by hand. It demonstrates that the ingester works, but the SDPO mask covers misaligned content. Wiring smoke, not production-grade.
- #5 is the production-grade sibling to #4: it adds the `claude_states_to_trace_examples` adapter and uses `ComposerDataCollator` to build properly-aligned SDPO batches with hint injection at actual error sites. This is what you should copy for real training.

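
To make "alpha_sdpo=0.5 changes the loss" concrete, here is a toy, pure-Python sketch. It assumes the composed loss is a weighted sum `lm_ce + alpha_sdpo * sdpo_jsd` and that `sdpo_jsd` is a Jensen-Shannon divergence between two token distributions (for instance, with and without the hint context). Both are assumptions inferred from the names used here; `compose_loss_sketch` is a hypothetical stand-in and does not reproduce the real `compose_loss` signature.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: 0.5*KL(p||m) + 0.5*KL(q||m), with m = (p+q)/2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def compose_loss_sketch(lm_ce, sdpo_jsd, alpha_sdpo=0.0):
    """Hypothetical composition: weighted sum of the LM-CE and SDPO channels."""
    return lm_ce + alpha_sdpo * sdpo_jsd


# Toy distributions over a 3-token vocabulary (made-up numbers).
without_hint = [0.7, 0.2, 0.1]
with_hint = [0.1, 0.2, 0.7]

sdpo_term = jsd(without_hint, with_hint)
baseline = compose_loss_sketch(lm_ce=2.0, sdpo_jsd=sdpo_term, alpha_sdpo=0.0)
with_sdpo = compose_loss_sketch(lm_ce=2.0, sdpo_jsd=sdpo_term, alpha_sdpo=0.5)
print(f"alpha_sdpo=0.0 -> {baseline:.4f}, alpha_sdpo=0.5 -> {with_sdpo:.4f}")
```

With `alpha_sdpo=0` the SDPO term drops out and the result equals `lm_ce`; any positive weight on a non-zero divergence moves the total, which is the effect example #3 makes visible.
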
What every example asserts
Each `run.py` ends with a verification block that asserts:
- The targeted channel(s) actually fired (`sdpo_jsd > 0` when `alpha_sdpo > 0`)
- The composed loss isn't trivially equal to `lm_ce` alone
- Gradient norms are finite and non-zero at every step

Failure of any assertion exits non-zero and the script prints which channel didn't fire. This is the user's smoke test, not just a demo.
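
A verification block of this kind might look like the sketch below. The per-step metric keys (`loss`, `lm_ce`, `sdpo_jsd`, `grad_norm`) and the `verify_run` helper are hypothetical, chosen to mirror the three checks listed above; the actual `run.py` scripts may structure this differently. The sketch also assumes the "not equal to `lm_ce`" check only applies when `alpha_sdpo > 0`.

```python
import math


def verify_run(steps, alpha_sdpo):
    """Return a list of failure messages; an empty list means every check passed.

    `steps` is a list of dicts with hypothetical keys:
    'loss', 'lm_ce', 'sdpo_jsd', 'grad_norm'.
    """
    failures = []
    for i, s in enumerate(steps):
        if alpha_sdpo > 0:
            # Channel must fire when it is weighted in.
            if not s["sdpo_jsd"] > 0:
                failures.append(f"step {i}: sdpo channel did not fire")
            # Composed loss must differ from the LM-CE term alone.
            if math.isclose(s["loss"], s["lm_ce"], rel_tol=0.0, abs_tol=1e-12):
                failures.append(f"step {i}: composed loss equals lm_ce")
        # Gradient norm must be finite and non-zero.
        g = s["grad_norm"]
        if not (math.isfinite(g) and g != 0.0):
            failures.append(f"step {i}: grad_norm is {g}")
    return failures
```

A script could finish with `failures = verify_run(steps, alpha_sdpo)`, print each message, and call `sys.exit(1)` when the list is non-empty, which gives the non-zero exit behaviour described above.
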
Production training
For real training (GPU, larger models, longer rollouts), use
ComposerReplicationTrainer directly with a ComposerDataCollator
that emits SDPO + DPO columns — exactly the path example #5
demonstrates. See docs/INTEGRATION_RECIPES.md for the production
wiring patterns.