"""composer_replication.replaysim — N-teacher trace replay + dataset normalization. Per ADR-004, this package consolidates the framework's "replay an LLM trace through N teachers, get a DPO/preference dataset" flow: raw trace ↓ (existing teacher_replay.replay_trace) list[TeacherCallResult] ↓ (existing teacher_replay.extract_dpo_pairs) list[DPOPair] ↓ (NEW — composer_replication.replaysim.normalize.DJNormalizer) list[NormalizedDPOPair] # length-filtered, dedup'd, chat-template-validated The pre-normalization pipeline is unchanged. The normalizer is opt-in via the new convenience function `replay_and_normalize_trace(...)` which wraps the existing `replay_trace` + `extract_dpo_pairs` and pipes their output through a `data-juicer` op-graph. Adopting `data-juicer` (Alibaba, Apache-2.0) was the verdict from the 2026-05-26 reconnaissance — see docs/research/REPLAYSIM_NORMALIZATION_RECONNAISSANCE.md. It's the only mature library with NATIVE multi-turn `messages` + DPO preference-pair ops that runs CPU-only on the ops we need. Optional dependency: `pip install -e .[replaysim]` pulls `data-juicer`. Without it, the normalizer raises `ImportError` at use time but the package still imports cleanly. This module re-exports the existing `teacher_replay` API for convenience so users can `from composer_replication.replaysim import replay_trace`. """ from __future__ import annotations from composer_replication.replaysim.normalize import ( DJNormalizer, NormalizedDPOPair, replay_and_normalize_trace, ) # Re-exports from the pre-existing teacher_replay module (unchanged): from composer_replication.teacher_replay import ( DPOPair, TeacherCallResult, extract_dpo_pairs, replay_trace, ) __all__ = [ "DJNormalizer", "DPOPair", "NormalizedDPOPair", "TeacherCallResult", "extract_dpo_pairs", "replay_and_normalize_trace", "replay_trace", ]