HF Repo Layout — composer-replication-framework
Per the HF multi-artifact research project pattern, this project will eventually span multiple HF repos. This document records the layout.
Current state (2026-05-25)
Only the methodology repo exists. No trained variants, no datasets yet.
| Repo | Type | Status | Purpose |
|---|---|---|---|
| `Codeseys/composer-replication-framework` | model | ✅ exists (this repo) | Methodology, ADRs, framework spec, research deep-dives |
Planned splits (post-spike)
When the v0.0 spike produces a result, the following repos will be created:
| Repo | Type | Created when | Contents |
|---|---|---|---|
| `Codeseys/composer-replication-traces-v0` | dataset | v0.0 spike data is collected | 100 frozen agentic-coding traces (JSON), used for trace-replay-distillation experiments |
| `Codeseys/composer-replication-qwen3-7b-v0` | model | v0.0 spike produces a checkpoint | LoRA adapter or full fine-tune of Qwen3-7B trained with GRPO + trace-replay-DPO |
| `Codeseys/composer-replication-qwen3-7b-v0-baseline` | model | v0.0 spike produces a baseline checkpoint | Same training, plain GRPO only (A/B comparison) |
After v0.1:
| Repo | Type | Contents |
|---|---|---|
| `Codeseys/composer-replication-traces-v1` | dataset | Larger trace corpus + Feature-Deletion environment seed repos |
| `Codeseys/composer-replication-feature-deletion-env-v1` | dataset | Repos with passing tests, with deletion masks for the env to apply |
| `Codeseys/composer-replication-qwen3-32b-v1` | model | Full Composer-recipe v1 trained variant |
All trained-variant repos will:
- Link back to this repo (`Codeseys/composer-replication-framework`) in their `README.md` as the methodology source.
- Live in an HF Collection (`composer-replication-*`) created when the second member repo is added.
Why this split
Per the huggingface-hub skill's references/multi-artifact-research-layout.md:
- Type semantics matter — HF dataset repos have native handling for jsonl/parquet (streaming load, dataset viewer). The model repo type used for this repo treats markdown research as first-class.
- Cite-ability — each trained variant gets its own DOI / citation.
- Variant training is unbounded — we don't know how many variants will ship; per-variant repos keep eval results, model cards, and weights cleanly separated.
- Discoverability via Collection — single URL surfaces the whole study.
Conventions
- Repo prefix: `composer-replication-` for every repo in this study.
- Variant suffix: `<base-model>-<size>-<scale-tag>` (e.g. `qwen3-7b-v0`, `qwen3-32b-v1`).
- Dataset suffix: `-traces-v<N>`, `-feature-deletion-env-v<N>`, `-bench-v<N>`.
- Branch: `master` locally → push to HF as `main` (refspec `master:main`).
- License: MIT for methodology and code; per-trained-variant license depends on base model's license.
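The naming conventions above can be captured in a small helper so that repo IDs are built the same way everywhere. This is a minimal sketch, not part of the framework: the function names, the `ORG` constant, and the assumption that a scale tag always looks like `v<N>` are illustrative, inferred from the examples in this document. It does not cover extra suffixes such as `-baseline`.

```python
import re

# Namespace and prefix as they appear in the repo IDs listed in this document.
ORG = "Codeseys"
PREFIX = "composer-replication-"

# Dataset kinds named under "Dataset suffix".
DATASET_KINDS = ("traces", "feature-deletion-env", "bench")

# Assumption: scale tags follow the v<N> shape seen in the examples (v0, v1).
_SCALE_TAG = re.compile(r"v\d+")


def variant_repo_id(base_model: str, size: str, scale_tag: str) -> str:
    """Build a trained-variant repo ID: <prefix><base-model>-<size>-<scale-tag>."""
    if not _SCALE_TAG.fullmatch(scale_tag):
        raise ValueError(f"scale tag must look like 'v0', got {scale_tag!r}")
    return f"{ORG}/{PREFIX}{base_model}-{size}-{scale_tag}"


def dataset_repo_id(kind: str, version: int) -> str:
    """Build a dataset repo ID: <prefix><kind>-v<N>."""
    if kind not in DATASET_KINDS:
        raise ValueError(f"unknown dataset kind {kind!r}; expected one of {DATASET_KINDS}")
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"{ORG}/{PREFIX}{kind}-v{version}"
```

For example, `variant_repo_id("qwen3", "7b", "v0")` yields the ID planned for the v0.0 spike checkpoint, and `dataset_repo_id("traces", 0)` yields the ID planned for the frozen trace set.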
Sync pattern
When adding a new variant repo, use the huggingface-hub skill's references/sync-to-hf-template.py shape — create_repo + upload_folder + add_collection_item(exists_ok=True) in a single script, so shipping a new variant is one command.
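The sync pattern can be sketched roughly as follows. This is an illustration of the three-call shape only, not the contents of `sync-to-hf-template.py`, which is not reproduced here. The function name `ship_variant` and its parameters are hypothetical. The `api` argument is expected to be a `huggingface_hub.HfApi` instance; it is passed in rather than constructed so the function can be exercised without network access.

```python
from typing import Any


def ship_variant(
    api: Any,
    repo_id: str,
    folder_path: str,
    collection_slug: str,
    repo_type: str = "model",
) -> str:
    """Create the repo if needed, upload a local folder, and add it to the Collection.

    `api` is assumed to expose the HfApi methods create_repo, upload_folder,
    and add_collection_item.
    """
    # exist_ok=True makes a re-run safe when the repo already exists.
    api.create_repo(repo_id=repo_id, repo_type=repo_type, exist_ok=True)

    # Upload the whole local folder as one commit.
    api.upload_folder(repo_id=repo_id, repo_type=repo_type, folder_path=folder_path)

    # exists_ok=True (spelled differently from create_repo's flag) avoids an
    # error when the item is already in the Collection.
    api.add_collection_item(
        collection_slug=collection_slug,
        item_id=repo_id,
        item_type=repo_type,
        exists_ok=True,
    )
    return repo_id
```

With `huggingface_hub` installed and a write token configured, a call would look like `ship_variant(HfApi(), "<repo-id>", "<local-folder>", "<collection-slug>")`, where all three strings are placeholders. Because both the create and the collection steps tolerate existing state, the script can be re-run after a partial failure.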