Tags: Reinforcement Learning, Transformers, English, post-training, distillation, agentic-coding, composer-2.5, cursor, kimi-k2, grpo, dapo, diloco, openenv, trl, verl, research, methodology
docs(reconcile): main-lags-master foot-gun is RESOLVED, not live
The refine brief predated the 2026-06-09 branch fix. main==master now and
main is canonical, so "git checkout master after cloning" is stale/backwards.
Reframe README + USER_GUIDE + OVERVIEW to "main is canonical & synced; fresh
clone works as-is; ImportError = stale clone, git checkout main".
- README.md +1 -3
- docs/OVERVIEW.md +4 -3
- docs/USER_GUIDE.md +6 -7
README.md

````diff
@@ -35,9 +35,7 @@ This repository is the **"paper of the project"** — it is the methodology / re
 ## Install
 
 ```bash
-#
-# branch LAGS `master`, and installing from `main` ImportErrors on
-# make_dr_grpo_config. (See docs/HF_REPO_LAYOUT.md + docs/TROUBLESHOOTING.md.)
+# `main` is canonical and synced with `master` — a fresh clone works as-is.
 pip install -e .
 python examples/qwen_05b_quickstart/run.py
 ```
````
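The README change rests on `main` and `master` pointing at the same commit. One way to confirm that against the Hub before cloning is to compare the two refs reported by `git ls-remote`. What follows is a minimal sketch, assuming the standard `git ls-remote` output format (`<sha>` TAB `<ref>` per line). The helper names are hypothetical and are not part of this repository.

```python
from __future__ import annotations

import subprocess


def parse_ls_remote(output: str) -> dict[str, str]:
    """Map each ref name (e.g. refs/heads/main) to its commit SHA."""
    refs: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sha, ref = line.split("\t", 1)
        refs[ref.strip()] = sha.strip()
    return refs


def branches_in_sync(output: str, a: str = "main", b: str = "master") -> bool:
    """True only if both branch refs exist and point at the same commit."""
    refs = parse_ls_remote(output)
    sha_a = refs.get(f"refs/heads/{a}")
    sha_b = refs.get(f"refs/heads/{b}")
    return sha_a is not None and sha_a == sha_b


def fetch_ls_remote(url: str) -> str:
    """Ask the remote for its main/master refs (needs git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", url, "main", "master"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `branches_in_sync(fetch_ls_remote("https://huggingface.co/Codeseys/composer-replication-framework"))` should return `True` while the two branches are synced. A missing branch counts as "not in sync" rather than raising an error.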
docs/OVERVIEW.md

````diff
@@ -72,9 +72,10 @@ for known foot-guns.
 
 ## Foot-guns worth knowing on day one
 
-- **
-
-
+- **Branch sync (resolved 2026-06-09).** `main` is canonical and kept in sync with `master`,
+so a fresh Hub clone of `main` installs the complete tree. If you ever `ImportError` on
+`make_dr_grpo_config`, your clone is stale (`git fetch && git checkout main`). Historically
+`main` lagged `master`; that's fixed as long as both stay synced.
 - **`strip_thinking` × SDPO.** On real agent traces, SDPO requires `strip_thinking=False`:
 ~67% of error-recovery turns are pure thinking, so stripping them yields empty SDPO masks.
 - **KL estimator delta.** TRL uses the **k3** estimator; Composer's report describes **k1**.
````
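The KL foot-gun is easier to see with numbers. What follows is a minimal sketch, assuming the standard per-sample estimators with x drawn from the policy pi_theta and r = pi_ref(x) / pi_theta(x): k1 = -log r and k3 = (r - 1) - log r. Both are unbiased for KL(pi_theta || pi_ref). However, k1 can be negative on individual tokens, while k3 never is. The log-probabilities are made-up illustrative values. This sketch does not cover how each codebase reduces per-token values (mean, sum, masking).

```python
import math


def kl_k1(logp_policy: float, logp_ref: float) -> float:
    """k1 = -log r = log pi_theta(x) - log pi_ref(x); unbiased, can be negative."""
    return logp_policy - logp_ref


def kl_k3(logp_policy: float, logp_ref: float) -> float:
    """k3 = (r - 1) - log r with r = pi_ref(x) / pi_theta(x); unbiased and >= 0."""
    log_r = logp_ref - logp_policy
    return math.exp(log_r) - 1.0 - log_r


# Toy per-token log-probs (illustrative numbers, not from any real run).
policy = [-0.5, -1.2, -2.0]
ref = [-0.7, -1.0, -2.0]

for lp, lr in zip(policy, ref):
    print(f"k1={kl_k1(lp, lr):+.4f}  k3={kl_k3(lp, lr):.4f}")
```

The two estimators can disagree in both sign and size on a single token. As a result, a KL value logged from a TRL run is not directly comparable with a k1 figure from the report, even though both target the same divergence in expectation.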
docs/USER_GUIDE.md

````diff
@@ -59,16 +59,15 @@ Always start with the core install:
 ```bash
 git clone https://huggingface.co/Codeseys/composer-replication-framework
 cd composer-replication-framework
-git checkout master # HF 'main' LAGS 'master'; without this you ImportError on make_dr_grpo_config
 pip install -e .
 ```
 
-> **Branch
->
->
->
->
->
+> **Branch note (resolved 2026-06-09).** `main` is the canonical branch and is kept in
+> sync with `master` (`main == master`). A fresh clone of `main` has the complete tree
+> (incl. `make_dr_grpo_config` / `make_po_config`), so no branch switch is needed.
+> Historically `main` lagged `master` — if you ever see an `ImportError` on those symbols,
+> the clone is stale; `git fetch && git checkout main` (or pin a current SHA) fixes it.
+> See [`docs/HF_REPO_LAYOUT.md`](HF_REPO_LAYOUT.md).
 
 That gets you `torch>=2.0` + `transformers>=4.46` and is enough for the
 verification harness on CPU (sections 3, 5, 6).
````
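The updated docs treat an `ImportError` on `make_dr_grpo_config` or `make_po_config` as the signature of a stale clone. To tell that apart from other install problems, you can probe for the symbols before launching a run. What follows is a minimal standard-library sketch. This commit does not say which module exports these functions, so `MODULE` is a hypothetical placeholder you would need to replace. The helper name `missing_symbols` is also illustrative.

```python
from __future__ import annotations

import importlib


def missing_symbols(module_name: str, names: list[str]) -> list[str]:
    """Return the names that cannot be found on module_name.

    If the module itself fails to import, every requested name is reported missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError (a subclass)
        return list(names)
    return [name for name in names if not hasattr(module, name)]


# Hypothetical placeholder: replace with the module that actually exports these helpers.
MODULE = "your_package.config"
SYMBOLS = ["make_dr_grpo_config", "make_po_config"]

if __name__ == "__main__":
    missing = missing_symbols(MODULE, SYMBOLS)
    if missing:
        print(f"Missing {missing}: clone may be stale; try `git fetch && git checkout main`.")
    else:
        print("All expected symbols present.")
```

Failures surface as a list instead of a traceback. This covers both an unimportable module and a module that imports but lacks the newer helpers.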