# Publications

> **Pre-experimental release materials, drafted 2026-05-25, not yet posted publicly.**
> Use [`RELEASE_CHECKLIST.md`](RELEASE_CHECKLIST.md) to coordinate the publication wave when ready to ship.

| Artifact | What | Where it goes |
|---|---|---|
| [`PAPER_v0.md`](PAPER_v0.md) | Longform methodology paper (~6,500 words); central document | arXiv (eventually) or just as the canonical writeup on the repo |
| [`BLOG_POST.md`](BLOG_POST.md) | Blog post (~2,400 words) in HuggingFace Blog markdown format | HuggingFace blog PR + personal blog / Substack / Medium |
| [`HF_DISCUSSION_POST.md`](HF_DISCUSSION_POST.md) | Repo Community-tab discussion announcing the release | This repo's [Discussions tab](https://huggingface.co/Codeseys/composer-replication-framework/discussions) |
| [`TWITTER_THREAD.md`](TWITTER_THREAD.md) | 13-tweet thread, 5-tweet short version, LinkedIn variant | X / Twitter / LinkedIn |
| [`RELEASE_CHECKLIST.md`](RELEASE_CHECKLIST.md) | Pre-flight checklist + sequencing recommendation + risk register | Internal coordination |
| [`/CITATION.cff`](../CITATION.cff) | Citation File Format; HF/GitHub renders a "Cite this repository" UI from this | Repo root |
| [`/CITATION.bib`](../CITATION.bib) | BibTeX equivalent | Repo root |
## What this collection is and isn't

**It is:** a complete, self-consistent draft of a pre-experimental release announcing the methodology, the integration architecture, the OPSD/SDPO framing, the novel TR-DPO channel, and the spike-001/spike-005 results. Every claim is either backed by an upstream citation or empirically validated by the spikes.

**It isn't:** post-experimental. There are no training results yet. Spikes 002–004 (about $500 of GPU time plus a few weeks of wall-clock time) gate a v0.1 release that adds empirical training validation.
## Honest framing reused throughout

All four publication-facing documents (`PAPER_v0.md`, `BLOG_POST.md`, `HF_DISCUSSION_POST.md`, `TWITTER_THREAD.md`) include explicit "what I'm NOT claiming" sections. That framing is the publication's defense against overclaiming: the work being released is methodology, integration architecture, and economic feasibility for the novel channel, not "this method works."

If anything in those documents reads as claiming more than that, edit it before posting.
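
A quick pre-posting check for the framing described above could look like the sketch below. It only confirms that each of the four publication-facing drafts mentions the phrase "NOT claiming" somewhere; it cannot judge whether the surrounding text overclaims, so it complements a manual read rather than replacing one. The function names, the case-insensitive phrase match, and the assumption that all four files sit in one directory are illustrative choices, not part of the repo.

```python
from pathlib import Path

# The four publication-facing drafts named in this README.
# Assumption: they all live in the same directory.
PUBLICATION_DOCS = (
    "PAPER_v0.md",
    "BLOG_POST.md",
    "HF_DISCUSSION_POST.md",
    "TWITTER_THREAD.md",
)


def has_not_claiming_section(text: str) -> bool:
    """Return True if the text mentions "not claiming" (case-insensitive).

    Assumption: the section can be recognized by this phrase alone.
    """
    return "not claiming" in text.lower()


def docs_missing_section(directory: Path) -> list[str]:
    """List drafts that are absent or lack the "not claiming" phrase."""
    missing = []
    for name in PUBLICATION_DOCS:
        path = directory / name
        if not path.is_file():
            missing.append(name)
            continue
        if not has_not_claiming_section(path.read_text(encoding="utf-8")):
            missing.append(name)
    return missing


if __name__ == "__main__":
    flagged = docs_missing_section(Path.cwd())
    if flagged:
        print("Review before posting:", ", ".join(flagged))
    else:
        print("All four drafts mention what is NOT being claimed.")
```

Run from the directory holding the drafts, it prints the files that still need a look.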
## Sequencing TL;DR

1. HF Discussion post (lowest stakes; pre-announces the methodology)
2. Blog post (anchor narrative)
3. X / LinkedIn (after blog URL exists)
4. arXiv (defer until v0.1 with empirical results; see `RELEASE_CHECKLIST.md`)

A three-day gap between (1) and (2) lets early feedback be incorporated before the bigger announcement.