# HF Repo Layout: composer-replication-framework

Per the [HF multi-artifact research project pattern](https://huggingface.co/docs/hub/repositories), this project will eventually span multiple HF repos. This document records the layout.

## Current state (2026-05-25)

Only the **methodology repo** exists. There are no trained variants and no datasets yet.

| Repo | Type | Status | Purpose |
|---|---|---|---|
| `Codeseys/composer-replication-framework` | model | ✅ exists (this repo) | Methodology, ADRs, framework spec, research deep-dives |

## Planned splits (post-spike)

When the v0.0 spike produces a result, the following repos will be created:

| Repo | Type | Created when | Contents |
|---|---|---|---|
| `Codeseys/composer-replication-traces-v0` | dataset | v0.0 spike data is collected | 100 frozen agentic-coding traces (JSON), used for trace-replay-distillation experiments |
| `Codeseys/composer-replication-qwen3-7b-v0` | model | v0.0 spike produces a checkpoint | LoRA adapter or full fine-tune of Qwen3-7B trained with GRPO + trace-replay-DPO |
| `Codeseys/composer-replication-qwen3-7b-v0-baseline` | model | v0.0 spike produces a baseline checkpoint | Same training setup, but with plain GRPO only (A/B comparison against the GRPO + trace-replay-DPO variant) |

After v0.1:

| Repo | Type | Contents |
|---|---|---|
| `Codeseys/composer-replication-traces-v1` | dataset | Larger trace corpus + Feature-Deletion environment seed repos |
| `Codeseys/composer-replication-feature-deletion-env-v1` | dataset | Repos with passing tests, with deletion masks for the env to apply |
| `Codeseys/composer-replication-qwen3-32b-v1` | model | Full Composer-recipe v1 trained variant |

All trained-variant repos will:

- Link back to **this repo** (`Codeseys/composer-replication-framework`) in their `README.md` as the methodology source.
- Live in an **HF Collection** (`composer-replication-*`) created when the second member repo is added.

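
The back-link requirement can be made mechanical by generating each variant's `README.md` from a small template. The following is a minimal sketch, not the project's actual tooling: the function name `variant_readme` and the exact README wording are assumptions for illustration. The `license` and `base_model` keys are standard Hub model-card metadata fields.

```python
METHODOLOGY_REPO = "Codeseys/composer-replication-framework"


def variant_readme(repo_id: str, base_model: str, license_id: str) -> str:
    """Render a minimal README.md for a trained-variant repo.

    Hypothetical helper: the layout below is illustrative only. The one
    fixed requirement taken from this document is the link back to the
    methodology repo.
    """
    repo_name = repo_id.split("/")[-1]
    lines = [
        "---",
        f"license: {license_id}",      # depends on the base model's license
        f"base_model: {base_model}",
        "---",
        f"# {repo_name}",
        "",
        # Back-link to the methodology source, as required for every variant.
        f"Methodology source: [{METHODOLOGY_REPO}]"
        f"(https://huggingface.co/{METHODOLOGY_REPO})",
        "",
    ]
    return "\n".join(lines)
```

Generating the file this way keeps the back-link from being forgotten when a new variant repo is added.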
## Why this split

Per the `huggingface-hub` skill's `references/multi-artifact-research-layout.md`:

1. **Type semantics matter**: HF dataset repos have native handling for jsonl/parquet (streaming load, dataset viewer). The model repo type used for *this* repo treats markdown research as first-class.
2. **Cite-ability**: each trained variant can carry its own DOI and citation.
3. **Variant training is unbounded**: we don't know how many variants will ship; per-variant repos keep eval results, model cards, and weights cleanly separated.
4. **Discoverability via Collection**: a single URL surfaces the whole study.

## Conventions

- **Repo prefix**: `composer-replication-` for every repo in this study.
- **Variant suffix**: `<base-model>-<size>-<scale-tag>` (e.g. `qwen3-7b-v0`, `qwen3-32b-v1`).
- **Dataset suffix**: `-traces-v<N>`, `-feature-deletion-env-v<N>`, `-bench-v<N>`.
- **Branch**: `master` locally → push to HF as `main` (refspec `master:main`).
- **License**: MIT for methodology and code; each trained variant's license depends on its base model's license.

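
The naming conventions above are simple enough to encode once and reuse. The sketch below is an illustration under assumptions: the helper names (`variant_repo_id`, `dataset_repo_id`, `follows_prefix`) are hypothetical and not part of the framework. Only the prefix, the suffix shapes, and the `Codeseys` namespace come from this document.

```python
NAMESPACE = "Codeseys"
PREFIX = "composer-replication-"

# Dataset kinds listed under "Dataset suffix" in the conventions.
DATASET_KINDS = ("traces", "feature-deletion-env", "bench")


def variant_repo_id(base_model: str, size: str, scale_tag: str) -> str:
    """Build a trained-variant repo id: <prefix><base-model>-<size>-<scale-tag>."""
    name = f"{PREFIX}{base_model}-{size}-{scale_tag}".lower()
    return f"{NAMESPACE}/{name}"


def dataset_repo_id(kind: str, version: int) -> str:
    """Build a dataset repo id: <prefix><kind>-v<N>."""
    if kind not in DATASET_KINDS:
        raise ValueError(f"unknown dataset kind: {kind!r}")
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"{NAMESPACE}/{PREFIX}{kind}-v{version}"


def follows_prefix(repo_id: str) -> bool:
    """Check that a repo id sits in the namespace and uses the study prefix."""
    namespace, _, name = repo_id.partition("/")
    return namespace == NAMESPACE and name.startswith(PREFIX)
```

For example, `variant_repo_id("qwen3", "7b", "v0")` yields the id of the planned v0.0 variant, and `dataset_repo_id("traces", 0)` yields the id of the planned trace dataset.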
## Sync pattern

When adding a new variant repo, use the shape of the `huggingface-hub` skill's `references/sync-to-hf-template.py`: `create_repo` + `upload_folder` + `add_collection_item(exists_ok=True)` in a single script, so shipping a new variant is one command.
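
As a rough illustration of that three-call shape, here is a minimal sketch. It is not a copy of `sync-to-hf-template.py`, whose contents are not reproduced in this document. The function name `sync_variant`, its parameters, and the optional `api` argument are assumptions. The three methods it calls are existing `huggingface_hub.HfApi` methods. The collection slug must be supplied by the caller, since the collection does not exist yet.

```python
def sync_variant(repo_id, folder_path, collection_slug, repo_type="model", api=None):
    """Create the repo if needed, upload a folder, and add it to the collection.

    Hypothetical sketch of the create_repo + upload_folder +
    add_collection_item shape. Every step is idempotent, so re-running
    the script for an existing variant is safe.
    """
    if api is None:
        # Imported lazily so the function can also be driven by a stub client.
        from huggingface_hub import HfApi

        api = HfApi()

    # 1. Create the repo; exist_ok=True makes this a no-op on re-runs.
    api.create_repo(repo_id=repo_id, repo_type=repo_type, exist_ok=True)

    # 2. Upload the local folder as a single commit.
    api.upload_folder(
        repo_id=repo_id,
        repo_type=repo_type,
        folder_path=folder_path,
        commit_message=f"Sync {repo_id}",
    )

    # 3. Register the repo in the study's collection; exists_ok=True
    #    avoids an error if the item was already added.
    api.add_collection_item(
        collection_slug=collection_slug,
        item_id=repo_id,
        item_type=repo_type,
        exists_ok=True,
    )
    return repo_id
```

Note the spelling difference between the two idempotency flags: `create_repo` takes `exist_ok`, while `add_collection_item` takes `exists_ok`.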