# Publication Release Checklist
> **Last updated:** 2026-05-25
> **Current state:** all materials drafted; nothing posted publicly yet.
> Use this checklist to coordinate the publication wave when ready to ship.
## What's drafted
| Artifact | Path | Status | Word count (approx) |
|---|---|---|---|
| Longform methodology paper | [`publications/PAPER_v0.md`](PAPER_v0.md) | ✅ DRAFTED | ~6,500 |
| Blog post (HF Blog format) | [`publications/BLOG_POST.md`](BLOG_POST.md) | ✅ DRAFTED | ~2,400 |
| HF Discussion thread (repo Community tab) | [`publications/HF_DISCUSSION_POST.md`](HF_DISCUSSION_POST.md) | ✅ DRAFTED | ~700 |
| Twitter / X thread (13-tweet + 5-tweet + LinkedIn variants) | [`publications/TWITTER_THREAD.md`](TWITTER_THREAD.md) | ✅ DRAFTED | ~1,200 |
| `CITATION.cff` (HF/GitHub Citation Format) | [`/CITATION.cff`](../CITATION.cff) | ✅ DRAFTED | n/a |
| `CITATION.bib` (BibTeX) | [`/CITATION.bib`](../CITATION.bib) | ✅ DRAFTED | n/a |
| Repo README (model card with frontmatter) | [`/README.md`](../README.md) | ✅ Already published (v3 with wave 4 status) | ~1,000 |
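All draft materials live in `publications/` (the two citation files sit at the repo root) and are **not yet posted**; only the README is already public. Nothing is gated by external review, so publishing is a self-publish decision once the open pre-flight items are resolved.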
## Pre-flight check before shipping any of these
Confirm these items before posting any of the public-facing materials. Most were completed in earlier waves and are listed here for completeness:
- [x] HF repo is public (`Codeseys/composer-replication-framework`)
- [x] All linked URLs resolve (cross-checked during drafts)
- [x] Test suite passes (`38/38` as of wave 4)
- [x] Spike 001 is reproducible (deterministic states + recorded results)
- [x] Cursor blog is correctly summarized (audit notice in `research/01-composer-2.5.md`)
- [x] Upstream papers cited correctly (OPSD, SDPO, Cursor blog with arXiv IDs verified)
- [x] License is MIT and consistent across `LICENSE` + `README.md` frontmatter + `CITATION.cff`
- [ ] **`CITATION.cff` author block updated with real name/ORCID** if desired (currently just "Codeseys")
- [ ] **Choose final author identity** for the byline (Codeseys handle? real name? affiliation?)
- [ ] **HF Discussion title / tags chosen** — suggested in `HF_DISCUSSION_POST.md`
- [ ] **Blog thumbnail prepared** — placeholder path in `BLOG_POST.md` frontmatter (`/blog/assets/composer-replication-framework/thumbnail.png`); needs a real image
- [ ] **arXiv submission decided** — see § "arXiv submission" below
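The "all linked URLs resolve" item can be partly re-checked mechanically before each release. The following is a minimal sketch, not the procedure used during drafting: it assumes the drafts use inline markdown links of the form `[text](target)` and only verifies that relative targets exist on disk. The function names `extract_links` and `missing_local_targets` are hypothetical, and external URLs are deliberately skipped rather than fetched.

```python
import re
from pathlib import Path

# Matches inline markdown links of the form [text](target).
# Image links such as ![alt](src) are matched as well.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# A target that starts with a URI scheme (https:, mailto:, ...) is external.
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def extract_links(markdown: str) -> list[str]:
    """Return every inline link target, in document order."""
    return LINK_RE.findall(markdown)


def missing_local_targets(md_file: Path) -> list[str]:
    """List relative link targets in md_file that do not exist on disk."""
    missing = []
    for target in extract_links(md_file.read_text(encoding="utf-8")):
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue  # external URL or in-page anchor: not checked here
        path = target.split("#", 1)[0]  # drop any trailing #fragment
        if not (md_file.parent / path).exists():
            missing.append(target)
    return missing
```

Running `missing_local_targets(Path("publications/BLOG_POST.md"))` from the repo root would return an empty list when every relative link in that draft points at an existing file. Relative targets are resolved against the directory of the markdown file, which matches how links such as `../CITATION.cff` are written in the table above.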
## Sequencing recommendation
If publishing all materials, this order minimizes risk and maximizes signal:
1. **HF Discussion post first** (lowest-stakes — repo Community tab; anyone landing on the repo will see it; it pre-announces the methodology paper).
2. **Blog post / personal site second** (anchor narrative, ~2,400 words, easy to share).
3. **X / LinkedIn third** (after the blog post URL exists to anchor the thread).
4. **arXiv submission last** (if doing this — needs more polish; see below).
A three-day gap between (1) and (2) is reasonable: it gives the discussion post time to collect early feedback that should be incorporated into the blog.
## Distribution / amplification ideas
- Cross-post the blog to:
  - HuggingFace blog (PR against `huggingface/blog` repo). Their submission process is documented at https://huggingface.co/docs/hub/en/blog
  - Personal blog / Substack / Medium
- Post the discussion in:
  - r/LocalLLaMA (will be eaten by their algorithm but worth one shot)
  - r/MachineLearning if you tag `[R]` and frame as "novel methodology, no results yet; looking for feedback"
  - Hacker News "Show HN: …" (the pre-experimental disclosure should be in the title)
  - LessWrong / Alignment Forum if you frame the reward-hacking section as the lead
- Tag in the Twitter thread:
  - `@cursor_ai` (Cursor team)
  - `@huggingface` (TRL team)
  - `@volcanoengine` (VeRL team)
  - `@MoonshotAI` (Kimi K2.5)
  - `@PrimeIntellect`
## arXiv submission (decide later)
The methodology paper is currently in markdown. Pros and cons of a formal arXiv release:
**Pros**
- Citable DOI; appears in Google Scholar / Semantic Scholar
- Reaches a non-HF research audience
- Forces a higher polish bar, which catches errors
**Cons**
- Needs LaTeX conversion (~1 day of formatting work)
- The "no experimental results yet" framing is unusual for arXiv; readers may dismiss it (arXiv moderates submissions but does not peer-review them)
- Once posted, it's permanent — corrections live as v2/v3 markers
**Recommendation:** post the HF blog and discussion first; decide on arXiv only after spikes 002–004 produce results. Then make it a v0.1 paper *with* experimental backing. The current methodology paper becomes Sections 2–4 of that future paper, with new Sections 5+ for the empirical results.
If you do submit to arXiv now anyway: use cs.LG as the primary category with a cs.AI cross-list, the same title as `PAPER_v0.md`, and the abstract from the paper. In the Comments field, frame it as "pre-experimental methodology release; experimental validation in follow-up."
## Embargo / coordination notes
- **Cursor team coordination:** not strictly required (their blog is public, their cited papers are public, no proprietary info), but a polite heads-up tweet on release day is reasonable since the post engages heavily with their work. Tag `@cursor_ai` on tweet 1 of the X thread.
- **OPSD authors coordination:** Siyan Zhao et al. Also not required (MIT code, public paper), but tagging the lead author on the X thread is a polite signal of citation. Possible handle: `@siyan_zhao` (verify before tagging).
- **SDPO authors coordination:** same as OPSD. Handles for Hübotter et al. are unverified; skip tagging if they cannot be found.
## Risk register
| Risk | Likelihood | Mitigation |
|---|---|---|
| Someone runs spike 004 first and beats us to publication | Medium | Acknowledged. Trade-off accepted. The integration architecture is independently citable. |
| Methodology error caught after publication | Medium | Drafts have been audited (DeepWiki for code, primary-source-read for Cursor blog). 38 unit tests catch wiring bugs. The "what's NOT proven" section in the paper is explicit about open claims. |
| Hostile read claiming we overclaim novelty | Low | The paper explicitly compares to rStar / Math-Shepherd / Magpie / MoA and concedes "absence of evidence is not evidence of absence" in §9. |
| Cursor team objects to characterization | Low | Everything cited from their public blog with explicit `[BLOG-VERIFIED]` tags. SDPO/OPSD framing is supported by their own footnote. |
| Repo gets a flood of PRs / discussion noise | Low | Welcome the noise. Maintain `CONTRIBUTING.md` (TBD) when traffic justifies. |
## Post-publication tracking (if you ship)
Things to monitor in the first 2 weeks after publication:
- HF repo: stars, forks, downloads (reachable via API)
- HF Discussions tab: new threads, especially anything flagging methodology errors
- X thread: replies from people working on TRL / VeRL / OpenEnv (especially extension-point critiques)
- Citations / mentions in adjacent posts (set up Google Scholar Alert)
- arXiv mentions (if any related work cites the preprint or blog)
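For the repo metrics, a small polling script is enough. The sketch below assumes the public model-info endpoint `https://huggingface.co/api/models/<repo_id>` and assumes its JSON payload carries `likes`, `downloads`, and `lastModified` keys; those field names are assumptions about the response shape, so missing keys are mapped to `None` rather than raising. The Hub's analogue of stars is likes, and no fork count is assumed to be available here. The helper names `summarize` and `fetch_model_info` are hypothetical.

```python
import json
import urllib.request

REPO_ID = "Codeseys/composer-replication-framework"

# Fields worth logging once a day during the tracking window.
# These key names are assumptions about the payload, hence the .get() below.
TRACKED_KEYS = ("id", "likes", "downloads", "lastModified")


def summarize(payload: dict) -> dict:
    """Pick the tracked fields out of a model-info payload; absent keys become None."""
    return {key: payload.get(key) for key in TRACKED_KEYS}


def fetch_model_info(repo_id: str = REPO_ID, timeout: float = 10.0) -> dict:
    """Fetch the public model-info JSON for repo_id (requires network access)."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `summarize(fetch_model_info())` once a day and appending the result to a log file gives a simple time series for the two-week window. Keeping the parsing in `summarize` separate from the network call makes the field selection easy to adjust if the payload differs from what is assumed here.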
If a methodology error surfaces, the response protocol:
1. Acknowledge in the Discussion thread within 24 hours.
2. Patch the affected file in the repo with a clear commit message.
3. Add an "Errata" section to `PAPER_v0.md` documenting what was wrong and what changed.
4. Don't try to silently rewrite history.
---
*Drafts ready. Ship when you decide. The repo is in a clean state to support any subset of the publication wave above.*