	"""composer_replication.replaysim — N-teacher trace replay + dataset normalization.

	Per ADR-004, this package consolidates the framework's
	"replay an LLM trace through N teachers, get a DPO/preference dataset" flow:

	    raw trace
	        ↓ (existing teacher_replay.replay_trace)
	    list[TeacherCallResult]
	        ↓ (existing teacher_replay.extract_dpo_pairs)
	    list[DPOPair]
	        ↓ (NEW: composer_replication.replaysim.normalize.DJNormalizer)
	    list[NormalizedDPOPair]  # length-filtered, deduplicated, chat-template-validated

	The pre-normalization pipeline is unchanged. The normalizer is opt-in through
	the new convenience function `replay_and_normalize_trace(...)`, which wraps
	the existing `replay_trace` + `extract_dpo_pairs` and pipes their output
	through a `data-juicer` op-graph.

	Adopting `data-juicer` (Alibaba, Apache-2.0) was the verdict of the
	2026-05-26 reconnaissance; see docs/research/REPLAYSIM_NORMALIZATION_RECONNAISSANCE.md.
	According to that reconnaissance, it is the only mature library with native
	multi-turn `messages` and DPO preference-pair ops that runs CPU-only on the
	ops we need.

	Optional dependency: `pip install -e '.[replaysim]'` pulls in `data-juicer`.
	Without it, the normalizer raises `ImportError` at use time, but the
	package still imports cleanly.

	This module re-exports the existing `teacher_replay` API for convenience
	so users can `from composer_replication.replaysim import replay_trace`.
	"""
	from __future__ import annotations

	from composer_replication.replaysim.normalize import (
	    DJNormalizer,
	    NormalizedDPOPair,
	    replay_and_normalize_trace,
	)

	# Re-exports from the pre-existing teacher_replay module (unchanged):
	from composer_replication.teacher_replay import (
	    DPOPair,
	    TeacherCallResult,
	    extract_dpo_pairs,
	    replay_trace,
	)

	__all__ = [
	    "DJNormalizer",
	    "DPOPair",
	    "NormalizedDPOPair",
	    "TeacherCallResult",
	    "extract_dpo_pairs",
	    "replay_and_normalize_trace",
	    "replay_trace",
	]