docs: add refine-2026-06 engagement summary

Top-level summary of the docs/refine-2026-06 engagement: every file touched
across waves 1-3, what changed and why, what was deliberately left alone, two
maintainer notes for issues outside docs-only scope (the off-limits
ADR-002-channel2-sdpo dead link in examples/, and the API_REFERENCE
config-factory gap), and the verification proofs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs/_refine-2026-06-SUMMARY.md (added, +113 -0)
# Docs Refine 2026-06: Change Summary

> Branch: `docs/refine-2026-06` (off `master` HEAD `aae66fa`). **Docs-only.** Not merged and
> no PR opened; left for human review. Commit range: `aae66fa..e130879` (3 commits).

This engagement refined the documentation corpus to (1) enforce the ground-truth provenance
correction recorded in [ADR-014](adrs/ADR-014-policy-optimization-objective-menu.md), (2)
archive point-in-time historical artifacts without breaking references, and (3) add a single
honest newcomer overview. **No `.py` file, no `pyproject.toml`, and no file under
`composer_replication/`, `examples/`, `spikes/`, or `tests/` was touched.** This is proven by
`git diff --name-only aae66fa..HEAD` listing only `.md` paths (see Verification at the end).
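
The docs-only invariant lends itself to a mechanical check. The sketch below is a minimal illustration, not the engagement's actual script (this summary does not show it): the helper names `changed_paths` and `docs_only_violations` are hypothetical, and the off-limits prefixes are copied from the paragraph above.

```python
import subprocess

# Off-limits trees for this engagement, copied from the summary above.
OFF_LIMITS = ("composer_replication/", "examples/", "spikes/", "tests/")


def changed_paths(rev_range: str) -> list[str]:
    """Return the paths listed by `git diff --name-only <rev_range>`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", rev_range],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails (e.g. not inside a repository)
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def docs_only_violations(paths: list[str]) -> list[str]:
    """Return every path that breaks the docs-only invariant.

    A path violates it if it is not Markdown, or if it sits under an
    off-limits tree (even a `.md` file there counts as a violation).
    """
    return [
        path
        for path in paths
        if not path.endswith(".md") or path.startswith(OFF_LIMITS)
    ]
```

Run from the repository root, `docs_only_violations(changed_paths("aae66fa..HEAD"))` should return an empty list when the invariant holds.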

## Method

Plan → parallel read-only audit (4 agents over the living docs) → apply fixes in the main
thread → two independent adversarial review passes (one over the post-wave-1 commit, one over
the whole changeset) → iterate to convergence. Both adversarial reviewers and a deterministic
link/invariant script signed off with zero blockers.

## Commits

| SHA | Wave | Theme |
|---|---|---|
| `20e3bd9` | Wave 1 | Correctness: channel-3 provenance, gap honesty, dead links |
| `f00833d` | Wave 2 | Archive point-in-time wave reviews + dated review bundles (move + redirect stub) |
| `e130879` | Wave 3 | Add `docs/OVERVIEW.md`, index ADR-014, fold in adversarial-review corrections |

## Files touched: what changed and why

### Wave 1: correctness (commit `20e3bd9`)

| File | Change | Fact |
|---|---|---|
| `README.md` | v0.1 roadmap cell reframed: "Full Composer recipe" = channels 1 (Dr.GRPO) + 2 (SDPO); trace-replay-DPO labelled as the framework's own addition, with an ADR-014 link. | A |
| `docs/HF_REPO_LAYOUT.md` | v0 and v1 trained-variant rows: stopped bundling trace-replay-DPO into "Composer recipe"; marked it additive. | A |
| `docs/VISION_VALIDATION.md` | Status banner: stale "210 passing tests" → "115 + 1 skip-marked", pointing to the canonical `V1_V8_COVERAGE.md`; noted the PO-objective menu (default Dr.GRPO, ADR-014); kept both honest gaps OPEN (Docker e2e; A1-done / A2–A4-scaffold). | B, D |
| `docs/ALTERED_MINDS_TIE_IN.md` | Phase 3: only A1 has a real Modal runner; A2/A3/A4 are scaffold + plan-builder only, blocked on dataset construction; a real 8B run is additionally user-gated. Added a foot-gun note on setting `strip_thinking=False` for SDPO. | D, E |
| `docs/USER_GUIDE.md` | Clone+install block: added `git checkout master` plus a branch foot-gun callout (HF `main` lags `master`, so importing `make_dr_grpo_config` from `main` raises an ImportError). | F |
| `BACKLOG.md` | Fixed two dead references: `examples/qwen3_05b_quickstart/` → `examples/qwen_05b_quickstart/`. | dead-link |
| `docs/INTEGRATION_RECIPES.md` | Fixed dead link `ADR-007-distillation-losses.md` → `ADR-007-self-distillation-losses.md`. | dead-link |
| `framework/composer-replication-framework.md` | Fixed 2 root-relative links that 404 when rendered from a subdirectory on the HF Hub (`docs/…`, `spikes/…` → `../…`). | dead-link |
| `publications/HF_DISCUSSION_POST.md` | Fixed 7 root-relative links (same subdirectory-resolution issue → `../` or same-directory paths). | dead-link |
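
The `framework/` and `publications/` rows share one cause: a relative link resolves against the directory of the file that contains it, not against the repository root. A minimal sketch with Python's `posixpath`, assuming the Hub renderer follows standard relative-URL resolution; `resolve_link` is a hypothetical helper, and `docs/METHODOLOGY.md` merely stands in for the elided link targets.

```python
import posixpath


def resolve_link(source_file: str, target: str) -> str:
    """Resolve a relative link against the directory of the file that contains it."""
    base_dir = posixpath.dirname(source_file)  # "" for files at the repo root
    return posixpath.normpath(posixpath.join(base_dir, target))


source = "framework/composer-replication-framework.md"

# Written as if it resolved from the repo root, so it lands inside framework/:
print(resolve_link(source, "docs/METHODOLOGY.md"))     # framework/docs/METHODOLOGY.md
# The repaired form climbs out of framework/ first:
print(resolve_link(source, "../docs/METHODOLOGY.md"))  # docs/METHODOLOGY.md
```

The same link text works from `README.md` at the root, which is why these links only broke once they were read from a subdirectory.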

### Wave 2: archive historical artifacts (commit `f00833d`)

Moved (via `git mv`, history preserved) into archives, with a **one-line redirect stub** left
at every original path so that prose references keep resolving, including those baked into
immutable accepted ADRs (ADR-007/008/012) and into off-limits spike verdicts that cannot be edited:

- Into `docs/research/_archive/`: `WAVE_7_10_FINAL_REVIEW.md`, `WAVE_13/14/15_FINAL_REVIEW.md`.
- Into `docs/_archive/`: `DEEP_WORK_LOOP_LOG.md`, `WAVE_COMPOSER_DATAGEN_RL_2026-05-29.md`.
- Into `docs/_archive/reviews/`: the two dated review bundles
  (`cross-family-adr-008-009-010-2026-05-29/`, `final-verify-deep-work-2026-05-29/`). All 8
  per-model review/verify files were moved as renames; a `SYNTHESIS.md` redirect stub remains at
  each origin (the entry point that ADRs cite by directory).
- Added `docs/_archive/README.md` and `docs/research/_archive/README.md`, indexing what was
  archived and why ("point-in-time, superseded by current METHODOLOGY / BACKLOG / V1_V8_COVERAGE
  / ADRs"), extending the existing `_archive` convention (`WAVE_16_RECON_AUDIT.md` already lived
  in `docs/research/_archive/`).
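
The move-plus-stub pattern above can be checked one file at a time. This is a minimal sketch, assuming a valid stub is a single non-blank line that mentions the archived copy's path relative to the stub's own directory; the stub format, the `check_archive_move` name and the in-memory `files` snapshot are illustrative assumptions, not the engagement's tooling.

```python
import posixpath


def check_archive_move(files: dict[str, str], origin: str, archived: str) -> list[str]:
    """Return the problems with one archive move; an empty list means it is intact.

    `files` maps repo-relative paths to file contents (a stand-in for the working tree).
    Assumption: a valid stub is exactly one non-blank line that mentions the archived
    copy's path relative to the stub's own directory.
    """
    problems = []
    stub = files.get(origin)
    if stub is None:
        problems.append(f"missing redirect stub at {origin}")
    else:
        lines = [line for line in stub.splitlines() if line.strip()]
        target = posixpath.relpath(archived, posixpath.dirname(origin) or ".")
        if len(lines) != 1:
            problems.append(f"stub at {origin} is not a single line")
        elif target not in lines[0]:
            problems.append(f"stub at {origin} does not point to {target}")
    if not files.get(archived, "").strip():
        problems.append(f"archived copy missing or empty at {archived}")
    return problems


# Illustrative only: assumes DEEP_WORK_LOOP_LOG.md previously lived directly under docs/.
snapshot = {
    "docs/DEEP_WORK_LOOP_LOG.md": "Archived: see [_archive/DEEP_WORK_LOOP_LOG.md](_archive/DEEP_WORK_LOOP_LOG.md)\n",
    "docs/_archive/DEEP_WORK_LOOP_LOG.md": "# Deep work loop log\n(full original content)\n",
}
print(check_archive_move(snapshot, "docs/DEEP_WORK_LOOP_LOG.md", "docs/_archive/DEEP_WORK_LOOP_LOG.md"))  # []
```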

### Wave 3: new overview + ADR index + review fixes (commit `e130879`)

| File | Change |
|---|---|
| `docs/OVERVIEW.md` | **New.** 5-minute newcomer tour: what it is; the three channels with honest provenance (1 + 2 = genuine Composer replication; 3 = the framework's additive channel); what's proven (CPU SDPO-fires, A1 8B Modal run, GSM8K GRPO, $0.98/trace, 115 tests); what's gapped (Docker e2e, A2–A4 ladder); day-one foot-guns (main-lag, strip_thinking, k1/k3 delta, compose_loss-is-harness). Linked from README and both `_archive` READMEs. |
| `README.md` | Added a "🧭 New here? → OVERVIEW.md" pointer and clarified in the intro that trace-replay is the framework's own addition (not Cursor's). Added the master-branch guard to the Install block (adversary finding). |
| `docs/adrs/README.md` | Added the missing ADR-014 row plus a provenance note recording the channel-3 correction. |
| `docs/ALTERED_MINDS_TIE_IN.md`, `docs/VISION_VALIDATION.md` | Adversary corrections: dropped a parent-commit SHA that was mislabelled as "HEAD"; re-attributed the A2–A4 gap claim so that it cites ADR-014 only for "the A1 run used dr_grpo" (and ADR-013 for its sole user-gated box) rather than ADR-014's acceptance gate, which does not contain the dataset-construction detail; fixed one more stale `qwen3_05b_quickstart` path. |
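
The master-branch guard exists because HF `main` lags `master`, so a checkout of `main` may lack `make_dr_grpo_config`. A minimal pre-flight check could look like the following; it assumes the factory is defined in `composer_replication/trainer/composer_trainer.py` (the module named in the maintainer notes below), and `has_symbol` is a hypothetical helper, so the import path may need adjusting.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and defines `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return hasattr(module, symbol)


# Assumed location of the factory; adjust if the package exports it elsewhere.
FACTORY_MODULE = "composer_replication.trainer.composer_trainer"

if not has_symbol(FACTORY_MODULE, "make_dr_grpo_config"):
    print("make_dr_grpo_config is not importable: check that this checkout is on "
          "`master` (HF `main` lags `master`).")
```

Failing fast with a hint is friendlier than the bare ImportError a user would otherwise hit deep inside a training script.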

## Deliberately left alone, and why

- **Accepted ADR bodies (ADR-001…014).** Immutable once `accepted` (per the ADR index's own
  rule). Only the ADR *index* (`docs/adrs/README.md`) was updated. The provenance correction
  was propagated into the *living* docs whose claims ADR-014 supersedes, not by editing older ADRs.
- **`research/01..12`, `framework/`, and `publications/PAPER_v0.md` deep-dive bodies.** Preserved
  as point-in-time research snapshots (only the 9 dead links in `framework/` and
  `publications/HF_DISCUSSION_POST.md` were repaired). `docs/COMPOSER_RECIPE_MAPPING.md` and
  `docs/METHODOLOGY.md` were audited and found **already correct** on channel-3 provenance (they
  already frame channel 3 as "NOVEL — our addition / not in Composer"), so they were not rewritten.
- **`docs/VISION_VALIDATION.md` dated update blocks (e.g. the "77 tests" Wave-12 line).** These
  are explicitly dated historical self-audit snapshots in the doc's house style; only the current
  top status banner was refreshed. Rewriting dated snapshots would falsify the audit trail.
- **The two `qwen3_7b_*` proposals in VISION_VALIDATION (lines ~80, ~138).** These accurately
  record example dirs that were *proposed* in the Wave-6 audit but never built under any name;
  "fixing" them to the 0.5B path would misrepresent history. Only the line describing the
  packaging deliverable that actually shipped (`qwen_05b_quickstart`) was corrected.

## Notes for a maintainer (issues found but NOT fixable within the docs-only scope)

1. **Off-limits dead link.** `examples/gsm8k_grpo_with_sdpo/README.md:66` links to
   `docs/adrs/ADR-002-channel2-sdpo.md`, which does not exist (ADR-002 is `ADR-002-trace-source.md`;
   the SDPO design decision is **ADR-008**). This file is under `examples/` (off-limits to this
   docs-only engagement), so it was left unchanged. **Recommended fix:** repoint that link to
   `docs/adrs/ADR-008-drgrpo-sdpo-live-channel.md`.
2. **API_REFERENCE freshness gap.** `docs/API_REFERENCE.md` documents the `composer_replication`
   surface but has **no section for the trainer config factories**: neither `make_dr_grpo_config`
   (ADR-008) nor the new `make_po_config(objective=…)` / `PO_OBJECTIVES` menu (ADR-014). This is a
   *missing* doc, not a *wrong* one. It was not added here because the public signatures could not
   be verified against source without reading the `.py` files (out of scope), and Invariant 7
   forbids fabricating an API surface. **Recommended fix:** add a config-factory section to
   API_REFERENCE based on the verified `composer_replication/trainer/composer_trainer.py` signatures.
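
Recommended fix 2 can avoid hand-typed signatures, and with them the risk of fabricating an API surface, by rendering each section from the imported objects themselves. A minimal sketch using the standard library's `inspect` module; `signature_section` and the Markdown layout it emits are assumptions, and the factory to document would come from the installed package, not from this example.

```python
import inspect
from typing import Callable

FENCE = "`" * 3  # built indirectly so this example nests cleanly inside Markdown


def signature_section(func: Callable, heading_level: int = 3) -> str:
    """Render a Markdown API stub from `func`'s real signature and docstring."""
    name = func.__name__
    signature = inspect.signature(func)  # read from the object, never hand-typed
    doc = inspect.getdoc(func) or "(no docstring)"
    return (
        f"{'#' * heading_level} `{name}`\n\n"
        f"{FENCE}python\n{name}{signature}\n{FENCE}\n\n"
        f"{doc}\n"
    )
```

For example, after importing `make_po_config` from the package (import path assumed), `print(signature_section(make_po_config))` would emit a stub whose signature line is whatever the installed code actually defines.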

## Verification (proof, not claim)

- **Docs-only invariant:** `git diff --name-only aae66fa..HEAD` → every path ends in `.md`
  (no `.py`, no `pyproject.toml`, nothing under `composer_replication/`, `examples/`, `spikes/`,
  or `tests/`).
- **Link integrity:** a scripted scan of every relative `[](…)` link across all in-scope `.md`
  files (root + `docs/` + `framework/` + `publications/`, excluding the 4 off-limits trees)
  reports **zero dead links** in the changeset. The one known dead link
  (`ADR-002-channel2-sdpo.md`) is pre-existing and lives in off-limits `examples/` (see note 1).
- **Archive integrity:** every archived file has both a redirect stub at its origin and a
  full-content copy under `_archive/`; all 8 review-dir files are preserved (renames, no content loss).
- **Two adversarial reviews + a deterministic check** all returned zero blockers.
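
The link-integrity check can be sketched as a small scanner over inline `[text](target)` links. This is an illustration under stated assumptions, not the script the engagement ran: `dead_links` is a hypothetical name, it works on an in-memory `{path: text}` snapshot, skips external URLs and same-page anchors, strips `#fragment`s, resolves each target against the linking file's directory, and mirrors the 4 off-limits trees as excluded prefixes. Reference-style links and link text containing `]` are not handled.

```python
import posixpath
import re

# Inline links and images: [text](target) or [text](target "title").
LINK_RE = re.compile(r'!?\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)')
# Anything with a URL scheme (http:, https:, mailto:, ...) is external.
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")
# Trees excluded from the scan, as described in this summary.
OFF_LIMITS = ("composer_replication/", "examples/", "spikes/", "tests/")


def _exists(path: str, files: dict[str, str]) -> bool:
    """A target exists if it is a known file or a directory containing one."""
    prefix = path.rstrip("/") + "/"
    return path in files or any(p.startswith(prefix) for p in files)


def dead_links(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source_file, target) for each relative link that does not resolve."""
    dead = []
    for source, text in files.items():
        if not source.endswith(".md") or source.startswith(OFF_LIMITS):
            continue  # only in-scope Markdown is scanned
        for target in LINK_RE.findall(text):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # external URL or same-page anchor
            path_part = target.split("#", 1)[0].split("?", 1)[0]
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(source), path_part)
            )
            if not _exists(resolved, files):
                dead.append((source, target))
    return dead
```

Because `examples/` is excluded, a pre-existing dead link there (like the one in note 1) does not show up in this scan, which matches how the summary reports it separately.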