# Publication Release Checklist

> **Last updated:** 2026-05-25
> **Current state:** all materials drafted; nothing posted publicly yet.
> Use this checklist to coordinate the publication wave when ready to ship.

## What's drafted

| Artifact | Path | Status | Word count (approx) |
|---|---|---|---|
| Longform methodology paper | [`publications/PAPER_v0.md`](PAPER_v0.md) | ✅ DRAFTED | ~6,500 |
| Blog post (HF Blog format) | [`publications/BLOG_POST.md`](BLOG_POST.md) | ✅ DRAFTED | ~2,400 |
| HF Discussion thread (repo Community tab) | [`publications/HF_DISCUSSION_POST.md`](HF_DISCUSSION_POST.md) | ✅ DRAFTED | ~700 |
| Twitter / X thread (13-tweet + 5-tweet + LinkedIn variants) | [`publications/TWITTER_THREAD.md`](TWITTER_THREAD.md) | ✅ DRAFTED | ~1,200 |
| `CITATION.cff` (Citation File Format, for HF/GitHub) | [`/CITATION.cff`](../CITATION.cff) | ✅ DRAFTED | n/a |
| `CITATION.bib` (BibTeX) | [`/CITATION.bib`](../CITATION.bib) | ✅ DRAFTED | n/a |
| Repo README (model card with frontmatter) | [`/README.md`](../README.md) | ✅ Already published (v3 with wave 4 status) | ~1,000 |

All draft materials are in `publications/` and **not yet posted**. Nothing is gated by review; everything is a self-publish decision. Ready to ship.

## Pre-flight check before shipping any of these

Confirm these items before posting any of the public-facing materials. Most were completed in earlier waves but are listed here for completeness:

- [x] HF repo is public (`Codeseys/composer-replication-framework`)
- [x] All linked URLs resolve (cross-checked during drafts)
- [x] Test suite passes (`38/38` as of wave 4)
- [x] Spike 001 is reproducible (deterministic states + recorded results)
- [x] Cursor blog is correctly summarized (audit notice in `research/01-composer-2.5.md`)
- [x] Upstream sources cited correctly (OPSD, SDPO, and the Cursor blog; arXiv IDs verified)
- [x] License is MIT and consistent across `LICENSE` + `README.md` frontmatter + `CITATION.cff`
- [ ] **`CITATION.cff` author block updated with real name/ORCID** if desired (currently just "Codeseys")
- [ ] **Choose final author identity** for the byline (Codeseys handle? real name? affiliation?)
- [ ] **HF Discussion title / tags chosen** (suggestions are in `HF_DISCUSSION_POST.md`)
- [ ] **Blog thumbnail prepared**: the `BLOG_POST.md` frontmatter has a placeholder path (`/blog/assets/composer-replication-framework/thumbnail.png`); it needs a real image
- [ ] **arXiv submission decided** (see § "arXiv submission" below)
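
The license-consistency item above can be re-checked mechanically before each release. The following is a minimal sketch, not part of the repo: the function names are hypothetical, it assumes the README frontmatter and `CITATION.cff` each carry a top-level `license:` key, and it recognizes only the standard "MIT License" header in `LICENSE`.

```python
import re


def _license_field(yaml_text):
    """Return the lower-cased value of a top-level `license:` key, or None."""
    m = re.search(r"^license:[ \t]*[\"']?([^\"'\s#]+)", yaml_text, flags=re.MULTILINE)
    return m.group(1).lower() if m else None


def license_from_frontmatter(readme_text):
    """Read `license:` from the YAML frontmatter at the top of a README."""
    m = re.match(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", readme_text, flags=re.DOTALL)
    return _license_field(m.group(1)) if m else None


def license_from_license_file(license_text):
    """Rough heuristic (assumption): only the standard MIT header is recognized."""
    first = next((ln.strip() for ln in license_text.splitlines() if ln.strip()), "")
    return "mit" if first.lower().startswith("mit license") else None


def check_license_consistency(license_text, readme_text, cff_text):
    """Return (consistent, found), where found maps each file to its detected license."""
    found = {
        "LICENSE": license_from_license_file(license_text),
        "README.md": license_from_frontmatter(readme_text),
        "CITATION.cff": _license_field(cff_text),
    }
    values = set(found.values())
    consistent = len(values) == 1 and None not in values
    return consistent, found
```

To use it, read the three files from the repo root (for example with `pathlib.Path("LICENSE").read_text()`) and pass their contents in; a `False` result means the checkbox should be reopened.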

## Sequencing recommendation

If publishing all materials, this order minimizes risk and maximizes signal:

1. **HF Discussion post first** (lowest stakes: it lives on the repo Community tab, anyone landing on the repo will see it, and it pre-announces the methodology paper).
2. **Blog post / personal site second** (anchor narrative, ~2,400 words, easy to share).
3. **X / LinkedIn third** (after the blog post URL exists to anchor the thread).
4. **arXiv submission last** (if doing this at all; it needs more polish, see below).

A three-day gap between (1) and (2) is reasonable: it lets the discussion post collect early feedback that can be incorporated into the blog.

## Distribution / amplification ideas

- Cross-post the blog to:
  - Hugging Face blog (PR against the `huggingface/blog` repo). Their submission process is documented at https://huggingface.co/docs/hub/en/blog
  - Personal blog / Substack / Medium
- Post the discussion in:
  - r/LocalLLaMA (will be eaten by their algorithm but worth one shot)
  - r/MachineLearning if you tag `[R]` and frame as "novel methodology, no results yet, looking for feedback"
  - Hacker News "Show HN: …" (the pre-experimental disclosure should be in the title)
  - LessWrong / Alignment Forum if you frame the reward-hacking section as the lead
- Tag in the Twitter thread:
  - `@cursor_ai` (Cursor team)
  - `@huggingface` (TRL team)
  - `@volcanoengine` (VeRL team)
  - `@MoonshotAI` (Kimi K2.5)
  - `@PrimeIntellect`

## arXiv submission (decide later)

The methodology paper is currently in markdown. Pros and cons of a formal arXiv release:

**Pros**

- Citable DOI; appears in Google Scholar / Semantic Scholar
- Reaches a non-HF research audience
- Forces a higher polish bar, which catches errors

**Cons**

- Needs LaTeX conversion (~1 day of formatting work)
- The "no experimental results yet" framing is unusual for arXiv; moderators or readers may dismiss it
- Once posted, it's permanent: corrections exist only as new versions (v2, v3)

**Recommendation:** post the HF blog and discussion first; decide on arXiv only after spikes 002–004 produce results. Then make it a v0.1 paper *with* experimental backing. The current methodology paper becomes Sections 2–4 of that future paper, with new sections 5+ for the empirical results.

If you do submit to arXiv now anyway: cs.LG primary, cs.AI cross-list. Use the same title as `PAPER_v0.md` and take the abstract from the paper. Frame it in the comments field as "pre-experimental methodology release; experimental validation in follow-up."

## Embargo / coordination notes

- **Cursor team coordination:** not strictly required (their blog is public, their cited papers are public, no proprietary info), but a polite heads-up tweet on release day is reasonable since the post heavily engages their work. Tag `@cursor_ai` on tweet 1 of the X thread.
- **OPSD authors coordination:** Siyan Zhao et al. Also not required (MIT code, public paper), but tagging the lead author on the X thread is a polite signal of citation. Handle to try: `@siyan_zhao` (verify before tagging).
- **SDPO authors coordination:** same as above. Handles for Hübotter et al. are unverified; skip tagging if they cannot be found.

## Risk register

| Risk | Likelihood | Mitigation |
|---|---|---|
| Someone runs spike 004 first and beats us to publication | Medium | Acknowledged. Trade-off accepted. The integration architecture is independently citable. |
| Methodology error caught after publication | Medium | Drafts have been audited (DeepWiki for code, primary-source-read for Cursor blog). 38 unit tests catch wiring bugs. The "what's NOT proven" section in the paper is explicit about open claims. |
| Hostile read claiming we overclaim novelty | Low | The paper explicitly compares to rStar / Math-Shepherd / Magpie / MoA and concedes "absence of evidence is not evidence of absence" in §9. |
| Cursor team objects to characterization | Low | Everything cited from their public blog with explicit `[BLOG-VERIFIED]` tags. SDPO/OPSD framing is supported by their own footnote. |
| Repo gets a flood of PRs / discussion noise | Low | Welcome the noise. Maintain `CONTRIBUTING.md` (TBD) when traffic justifies. |

## Post-publication tracking (if you ship)

Things to monitor in the first 2 weeks after publication:

- HF repo: likes and downloads (reachable via the Hub API)
- HF Discussions tab: new threads, especially anything flagging methodology errors
- X thread: replies from people working on TRL / VeRL / OpenEnv (especially extension-point critiques)
- Citations / mentions in adjacent posts (set up a Google Scholar alert)
- arXiv mentions (if any related work cites the preprint or blog)
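
For the repo counters in the list above, a small script can log a snapshot on a schedule. This is a hedged sketch: `summarize_repo_stats`, `fetch_repo_stats`, and `diff_stats` are hypothetical helper names, and it assumes the `huggingface_hub` package is installed and that the repo is a model repo, so `HfApi().model_info` applies. The Hub reports likes and a recent download count for model repos; the exact download window is defined by the Hub, not by this script.

```python
def summarize_repo_stats(info):
    """Pick the counters worth logging from a model-info-like object.

    `info` is anything exposing `id`, `likes`, and `downloads` attributes,
    such as the object returned by `HfApi().model_info(...)`.
    """
    return {
        "repo_id": getattr(info, "id", None),
        "likes": getattr(info, "likes", None) or 0,
        "downloads": getattr(info, "downloads", None) or 0,
    }


def fetch_repo_stats(repo_id):
    """Query the Hub for one model repo (needs network access)."""
    from huggingface_hub import HfApi  # imported lazily so the helpers work offline

    return summarize_repo_stats(HfApi().model_info(repo_id))


def diff_stats(previous, current):
    """Change in each counter between two snapshots."""
    return {key: current[key] - previous[key] for key in ("likes", "downloads")}


if __name__ == "__main__":
    print(fetch_repo_stats("Codeseys/composer-replication-framework"))
```

Storing one snapshot per day and comparing with `diff_stats` is enough to see whether a given post moved the numbers during the two-week window.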

If a methodology error surfaces, the response protocol:

1. Acknowledge in the Discussion thread within 24 hours.
2. Patch the affected file in the repo with a clear commit message.
3. Add an "Errata" section to `PAPER_v0.md` documenting what was wrong and what changed.
4. Don't try to silently rewrite history.

---

*Drafts ready. Ship when you decide. The repo is in a clean state to support any subset of the publication wave above.*