"""composer_replication.replaysim: N-teacher trace replay + dataset normalization.

Per ADR-004, this package consolidates the framework's
"replay an LLM trace through N teachers, get a DPO/preference dataset" flow:

    raw trace
        ↓ (existing teacher_replay.replay_trace)
    list[TeacherCallResult]
        ↓ (existing teacher_replay.extract_dpo_pairs)
    list[DPOPair]
        ↓ (NEW: composer_replication.replaysim.normalize.DJNormalizer)
    list[NormalizedDPOPair]  # length-filtered, dedup'd, chat-template-validated

The pre-normalization pipeline is unchanged. The normalizer is opt-in via
the new convenience function `replay_and_normalize_trace(...)` which wraps
the existing `replay_trace` + `extract_dpo_pairs` and pipes their output
through a `data-juicer` op-graph.

Adopting `data-juicer` (Alibaba, Apache-2.0) was the verdict from the
2026-05-26 reconnaissance; see docs/research/REPLAYSIM_NORMALIZATION_RECONNAISSANCE.md.
It's the only mature library with NATIVE multi-turn `messages` + DPO
preference-pair ops that runs CPU-only on the ops we need.

Optional dependency: `pip install -e .[replaysim]` pulls `data-juicer`.
Without it, the normalizer raises `ImportError` at use time but the
package still imports cleanly.

This module re-exports the existing `teacher_replay` API for convenience
so users can `from composer_replication.replaysim import replay_trace`.
"""
from __future__ import annotations

from composer_replication.replaysim.normalize import (
    DJNormalizer,
    NormalizedDPOPair,
    replay_and_normalize_trace,
)

# Re-exports from the pre-existing teacher_replay module (unchanged):
from composer_replication.teacher_replay import (
    DPOPair,
    TeacherCallResult,
    extract_dpo_pairs,
    replay_trace,
)

__all__ = [
    "DJNormalizer",
    "DPOPair",
    "NormalizedDPOPair",
    "TeacherCallResult",
    "extract_dpo_pairs",
    "replay_and_normalize_trace",
    "replay_trace",
]
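
The module docstring describes the final stage as producing pairs that are length-filtered, dedup'd, and chat-template-validated. The sketch below illustrates those three steps in plain Python. It does not use `data-juicer` and is not the `DJNormalizer` implementation: `SketchPair`, its `prompt`/`chosen`/`rejected` fields, `normalize_pairs`, and the character limits are all hypothetical names and values chosen for illustration, and a basic well-formedness check stands in for real chat-template validation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchPair:
    """Hypothetical stand-in for DPOPair; the real fields may differ."""
    prompt: str
    chosen: str
    rejected: str


def normalize_pairs(pairs, min_chars=1, max_chars=4000):
    """Return pairs that pass validation, length filtering, and exact dedup.

    The limits are illustrative assumptions, not the framework's defaults.
    """
    seen = set()
    kept = []
    for pair in pairs:
        texts = (pair.prompt, pair.chosen, pair.rejected)
        # Well-formedness: every field is a non-blank string, and the two
        # completions differ (otherwise there is no preference to learn).
        if not all(isinstance(t, str) and t.strip() for t in texts):
            continue
        if pair.chosen == pair.rejected:
            continue
        # Length filter: drop pairs with any field outside the bounds.
        if any(not (min_chars <= len(t) <= max_chars) for t in texts):
            continue
        # Exact dedup: hash the full triple and keep the first occurrence.
        key = hashlib.sha256("\x00".join(texts).encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(pair)
    return kept
```

Input order is preserved and the first copy of a duplicate wins, so the result is deterministic for a given input list.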
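
The docstring also says that the package imports cleanly without the optional extra and only raises `ImportError` when the normalizer is used. One common way to get that behavior is to defer the import until first use, sketched below. This is an assumption about the pattern, not a copy of `normalize.py`: `LazyNormalizer`, `_load_backend`, and the `module_name` parameter are hypothetical, and `data_juicer` is assumed to be the import name of the `data-juicer` distribution.

```python
import importlib


def _load_backend(module_name):
    """Import the optional backend, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for normalization; "
            "install it with `pip install -e .[replaysim]`."
        ) from exc


class LazyNormalizer:
    """Hypothetical normalizer that defers its optional import to first use."""

    def __init__(self, module_name="data_juicer"):
        # Nothing is imported here, so constructing the object (and importing
        # the module that defines it) works without the optional dependency.
        self._module_name = module_name
        self._backend = None

    def normalize(self, pairs):
        # The backend is imported on the first call only, then cached.
        if self._backend is None:
            self._backend = _load_backend(self._module_name)
        # Placeholder: a real implementation would run the backend's ops here.
        return list(pairs)
```

With this shape, `import composer_replication.replaysim` never touches the optional dependency; the failure surfaces at the first `normalize(...)` call, with a message that points at the `replaysim` extra.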