Reinforcement Learning
Transformers
English
post-training
distillation
agentic-coding
composer-2.5
cursor
kimi-k2
grpo
dapo
diloco
openenv
trl
verl
research
methodology
Instructions to use Codeseys/composer-replication-framework with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Codeseys/composer-replication-framework with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Codeseys/composer-replication-framework", dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Commit History
docs(wave1): correctness pass — channel-3 provenance, gap honesty, dead links 20e3bd9
feat(trainer): policy-optimization objective MENU (ADR-014) aae66fa
docs(adr-013): close acceptance-gate boxes 1-5; only user-gated spend remains 7b34ebf
fix(phase-8): close all 5 cross-family final-verify findings + regression tests 678d10b
feat(b4-gpu+b6): GPU train-proof on A10G + docker-gated substrate E2E test c009712
feat(wave-b): ADR-013 LMA integration + B4 end-to-end SDPO-fires proof + doc refresh 21647a4
feat(wave-a): close ADR-011 (SDPO alignment indices) + ADR-012 (review findings) d02d724
architect: ADR-011/012/013 + research (alignment-index fix, review-findings closure, LMA channel-ladder) b4a584a
review+fix: cross-family adversarial ADR review (owed item) + remediation 185cce2
docs: wave summary — Composer 2.5 data-gen + targeted-RL textual-feedback 8498e8f
feat(datagen): ADR-010 FeatureDeletionEnv synthetic-data subsystem; accepted 9336af3
feat(hints): ADR-009 layered HintGenerator; accepted 84740d4
feat(trainer): ADR-008 gate-3 live GRPO+SDPO smoke PASS; ADR-008 accepted 2a34df4
docs(adr): add ADR-008/009/010 (Dr.GRPO+SDPO, layered hints, FeatureDeletionEnv) 36ab61e
Wave 18: 14 backlog items closed + 3-reviewer cross-family review 54efac8
Wave 17: close all 5 audit FLAGs + SDPO context alignment + serverless re-exports a84c060
Wave 16: install ergonomics + gradient evidence + SDPO end-to-end example c0a5ab7
Wave 15: 4-angle multi-model self-critique caught 2 math BLOCKERs in primary loss kernels; fixed against upstream byte-for-byte + GSM8K example + ergonomics e5add15
Wave 14: close every Wave 13 review finding + 4 documentation files; Wave 14b: real PRIME-RL parity + multi-process DiLoCo convergence d9dd3a5
Wave 13: serverless DiLoCo + replaysim normalization + 3 distillation losses + PRIME-RL + Monarch b266c31
Wave 12: close V1-V8 brief — GPU smoke, SDPO firing, real-trace e2e d88715c
Wave 11: cross-model adversarial review + honest down-revision f16fa23
Wave 7: Phase 2-4 of deep work loop — backlog, parallel research, three ADRs ac4bfb4
Wave 6: vision validation self-audit (5/10 to 9/10 in 5 days, no GPU) 040eff8
baladithyab committed on
Wave 3: integration architecture + spike-005 trainer skeleton (16 tests pass) fd77f74
baladithyab committed on
Integrate Cursor blog directly + audit research note + add SDPO/OPSD link 1cede23
baladithyab committed on