Codeseys's picture
Wave 13: serverless DiLoCo + replaysim normalization + 3 distillation losses + PRIME-RL + Monarch
b266c31
"""composer_replication.replaysim — N-teacher trace replay + dataset normalization.
Per ADR-004, this package consolidates the framework's
"replay an LLM trace through N teachers, get a DPO/preference dataset" flow:
raw trace
↓ (existing teacher_replay.replay_trace)
list[TeacherCallResult]
↓ (existing teacher_replay.extract_dpo_pairs)
list[DPOPair]
↓ (NEW — composer_replication.replaysim.normalize.DJNormalizer)
list[NormalizedDPOPair] # length-filtered, dedup'd, chat-template-validated
The pre-normalization pipeline is unchanged. The normalizer is opt-in via
the new convenience function `replay_and_normalize_trace(...)` which wraps
the existing `replay_trace` + `extract_dpo_pairs` and pipes their output
through a `data-juicer` op-graph.
Adopting `data-juicer` (Alibaba, Apache-2.0) was the verdict from the
2026-05-26 reconnaissance — see docs/research/REPLAYSIM_NORMALIZATION_RECONNAISSANCE.md.
It's the only mature library with NATIVE multi-turn `messages` + DPO
preference-pair ops that runs CPU-only on the ops we need.
Optional dependency: `pip install -e .[replaysim]` pulls `data-juicer`.
Without it, the normalizer raises `ImportError` at use time but the
package still imports cleanly.
This module re-exports the existing `teacher_replay` API for convenience
so users can `from composer_replication.replaysim import replay_trace`.
"""
from __future__ import annotations
from composer_replication.replaysim.normalize import (
DJNormalizer,
NormalizedDPOPair,
replay_and_normalize_trace,
)
# Re-exports from the pre-existing teacher_replay module (unchanged):
from composer_replication.teacher_replay import (
DPOPair,
TeacherCallResult,
extract_dpo_pairs,
replay_trace,
)
__all__ = [
"DJNormalizer",
"DPOPair",
"NormalizedDPOPair",
"TeacherCallResult",
"extract_dpo_pairs",
"replay_and_normalize_trace",
"replay_trace",
]