# Deep Work Loop Log: Composer 2.5 Replication Framework

Started: 2026-05-26
Operator: Codeseys (Hermes Agent autonomous loop)
Skill: `deep-work-loop` v1.0.0
## Vision

> Take any Hugging Face model and train it further with RL using:
> 1. RLVR (tests-pass reward),
> 2. SDPO/hint-distillation (Composer 2.5's "targeted RL with textual feedback"),
> 3. multi-teacher trace-replay DPO,
>
> integrated against TRL/VeRL/OpenEnv with DiLoCo-style outer loop sync.
>
> Output: a published, reproducible framework, the "Composer 2.5 replication" the open ecosystem is missing.
## Starting state

- HEAD: `040eff8` (Wave 6: vision validation self-audit, 5/10 scorecard)
- Tests: 38/38 green in `spikes/005-integrated-trainer-skeleton/`
- Working tree: clean
## Phase ledger

| Phase | Description | Status | Started | Done |
|---|---|---|---|---|
| 1 | commit-state | ✅ | 2026-05-26 | 2026-05-26 |
| 2 | backlog-audit (BACKLOG.md from VISION_VALIDATION) | ✅ | 2026-05-26 | 2026-05-26 |
| 3 | parallel-research (3 subagents) | 🟡 | 2026-05-26 | |
| 4 | architect with ADRs (ADR-001..003) | ⏳ | | |
| 5 | plan in waves (W7–W10) | ⏳ | | |
| 6 | execute W7: Spike 006 (real HF model smoke) | ⏳ | | |
| 7 | execute W8: Spike 007 (real trace ingestion) | ⏳ | | |
| 8 | execute W9: Spike 008 (DiLoCo smoke) | ⏳ | | |
| 9 | execute W10: packaging | ⏳ | | |
| 10 | (Modal-gated) Spike 002a-mini real GPU smoke | ⏳ | | |
| 11 | cross-model-final-review | ⏳ | | |
| 12 | update scorecard + push | ⏳ | | |
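For orientation on what the DiLoCo smoke in phase 8 would exercise, here is a minimal sketch of a DiLoCo-style outer step. It assumes parameters are flat lists of floats and that each worker has already finished its local inner steps. The function name `diloco_outer_step` and the default `outer_lr` and `momentum` values are illustrative assumptions, not settings from this repository. The outer update treats the gap between the shared parameters and the worker average as a pseudo-gradient and applies SGD with Nesterov momentum.

```python
from typing import List, Sequence, Tuple


def diloco_outer_step(
    global_params: Sequence[float],
    worker_params: Sequence[Sequence[float]],
    velocity: Sequence[float],
    outer_lr: float = 0.7,   # illustrative default, not from the source
    momentum: float = 0.9,   # illustrative default, not from the source
) -> Tuple[List[float], List[float]]:
    """One outer sync: average the workers, then take a Nesterov SGD step."""
    n_workers = len(worker_params)
    new_params: List[float] = []
    new_velocity: List[float] = []
    for i, theta in enumerate(global_params):
        worker_avg = sum(w[i] for w in worker_params) / n_workers
        # Pseudo-gradient: how far the workers moved away from the shared point.
        delta = theta - worker_avg
        # Momentum buffer update followed by the Nesterov look-ahead.
        v = momentum * velocity[i] + delta
        step = delta + momentum * v
        new_params.append(theta - outer_lr * step)
        new_velocity.append(v)
    return new_params, new_velocity


# With outer_lr=1.0 and momentum=0.0 the step reduces to plain parameter averaging.
shared = [1.0, 2.0]
workers = [[0.0, 2.0], [1.0, 4.0]]
averaged, vel = diloco_outer_step(shared, workers, [0.0, 0.0], outer_lr=1.0, momentum=0.0)
```

Because workers only exchange parameters at the outer step, communication happens once per round of inner steps rather than once per gradient update.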
## Constraints

- Verify ALL claims against primary sources (Wave 2 lesson: subagent synthesis is not evidence).
- Tests must pass before commit.
- Memory L1 is at 99%, so write to the L2 wiki and L3 fact_store, not L1.
- Modal budget: $20 hard cap for this loop. Anything above that goes to the user for approval.
- Do not mix `upload_file` with `git push`; use `git push hf master:main` only.
- Write each commit message to a file and pass it via `-F /tmp/<wave>-commit-msg.txt`.
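The test, commit, and push constraints above can be chained into one gate, sketched below. The helper name `gated_commit_and_push`, the use of `python -m pytest -q` as the test command, and the wave label `w7` are assumptions for illustration. Only `git commit -F` with the `/tmp/<wave>-commit-msg.txt` path and `git push hf master:main` come from the list above. The command runner is injectable so the gate can be dry-run without touching a repository.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def gated_commit_and_push(wave: str, runner: Runner = run) -> bool:
    """Run tests, then commit from a message file, then push; stop at the first failure."""
    msg_file = f"/tmp/{wave}-commit-msg.txt"
    steps: List[List[str]] = [
        ["python", "-m", "pytest", "-q"],      # assumed test command
        ["git", "commit", "-F", msg_file],     # message read from file
        ["git", "push", "hf", "master:main"],  # the only allowed publish path
    ]
    for cmd in steps:
        if runner(cmd) != 0:
            return False
    return True


# Dry run with a fake runner that simulates failing tests: nothing is committed.
calls: List[List[str]] = []


def failing_tests(cmd: Sequence[str]) -> int:
    calls.append(list(cmd))
    return 1


ok = gated_commit_and_push("w7", runner=failing_tests)
```

Keeping the commit and push behind the test step means a red test run can never reach the remote.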