Wave 17: close all 5 audit FLAGs + SDPO context alignment + serverless re-exports
Three-track follow-up to Wave 16, with cross-model review (gemini-3.1-pro
APPROVED, grok-4.3 surfaced one BLOCKER that verified false-positive
empirically, deepseek-v4-pro burned all max_tokens on hidden reasoning
producing no visible output):
Wave 17a — serverless package re-exports (real code bug)
composer_replication/diloco/serverless/__init__.py now re-exports
ModalExecutor and HFJobsExecutor alongside LocalProcessExecutor. The
package docstring already documented `from composer_replication.diloco.serverless
import ModalExecutor` as the public API but the import would have
failed because the names weren't in __init__.py's import list or
__all__. Both modal.py and hf_jobs.py are import-safe (they don't
import the optional `modal` / `huggingface_hub` packages at module
level; those imports are deferred to class method calls), so adding
the re-exports doesn't introduce a new hard dependency.
Pinned by a regression test
(test_public_reexports_include_all_executors in
test_serverless_local.py) that asserts every executor adapter the
package docstring documents is in __all__ AND importable.
Wave 16's user-journey reviewer caught this as a BLOCKER. Wave 17a
fixes it.
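
The deferred-import property that makes the re-exports safe can be
sketched as below. This is a minimal illustration, not the shipped
ModalExecutor: the class name `LazyBackendExecutor`, its `launch`
method, and the `backend_module` argument are hypothetical, chosen
only to show that importing the class never touches the optional
package.

```python
import importlib


class LazyBackendExecutor:
    """Hypothetical executor whose optional dependency is imported lazily."""

    def __init__(self, backend_module: str = "modal") -> None:
        # Only the module *name* is stored; nothing optional is imported
        # at class-definition or construction time.
        self.backend_module = backend_module

    def launch(self) -> object:
        # The optional package is resolved only when a method needs it.
        try:
            return importlib.import_module(self.backend_module)
        except ImportError as exc:
            raise ImportError(
                f"{self.backend_module!r} is required to launch replicas; "
                "install it to use this executor."
            ) from exc
```

Under this pattern a package `__init__.py` can re-export the class
unconditionally; a missing optional package only surfaces when a
method that needs it is called.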
Wave 17b — SDPO context alignment in the gsm8k_grpo_with_sdpo example
The previous version of build_inputs() tokenized student/teacher
contexts with default right-padding/right-truncation, which dropped
the `<|im_start|>assistant\n` marker for any prompt longer than T=32
(both of ours are: student=36 tokens, teacher=46 tokens). The SDPO
mask therefore covered prompt-area positions instead of the
assistant-response area, and the channel computed JSD over
MISALIGNED tokens.
Wave 17b flips both tokenizer.padding_side and tokenizer.truncation_side
to "left" under a try/finally, so:
- inputs shorter than T get LEFT-padded (assistant marker stays at T-1)
- inputs longer than T get LEFT-truncated (drops leading system
turns, keeps the assistant marker on the right)
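
The save/flip/restore step can also be packaged as a context manager.
The sketch below is an assumption for illustration (run.py inlines the
try/finally instead); the helper name `left_sided` is hypothetical,
and a plain namespace object stands in for the tokenizer.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def left_sided(tokenizer):
    """Temporarily set padding_side and truncation_side to "left"."""
    saved = (tokenizer.padding_side, tokenizer.truncation_side)
    tokenizer.padding_side = "left"
    tokenizer.truncation_side = "left"
    try:
        yield tokenizer
    finally:
        # Restore even if tokenization raises, so later callers see the
        # tokenizer's original configuration.
        tokenizer.padding_side, tokenizer.truncation_side = saved


# Stand-in object for illustration; it only mimics the two attributes
# the helper reads and writes.
tok = SimpleNamespace(padding_side="right", truncation_side="right")
with left_sided(tok):
    inside = (tok.padding_side, tok.truncation_side)
after = (tok.padding_side, tok.truncation_side)
```
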
Verified bit-identical right-most 16 positions across student vs
teacher contexts (the assistant-generation marker + last few prompt
tokens are the common suffix; chat-template appends the same
`<|im_start|>assistant\n` regardless of how many system turns
preceded). Post-fix SDPO signal is 0.0611-0.0642 (vs 0.1358-0.1429
pre-fix); the lower number is the meaningful one because the channel
now computes JSD over actually-aligned positions instead of random
misaligned tokens. Total loss starts at 2.22 (vs 5.98 pre-fix) and
drops to 1.28 over 5 SGD steps in 21.3s on CPU.
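
Why the truncation side matters can be seen on plain token-id lists.
The sketch below uses made-up ids and a 3-token stand-in for the
assistant marker (the real example shares a 16-position suffix);
`left_fit` and `right_fit` are hypothetical helpers, not functions
from run.py.

```python
def left_fit(ids, T, pad_id=0):
    """Left-truncate or left-pad `ids` to exactly T entries."""
    if len(ids) >= T:
        return ids[-T:]  # keep the right-most T tokens
    return [pad_id] * (T - len(ids)) + ids


def right_fit(ids, T, pad_id=0):
    """Right-truncate or right-pad `ids` to exactly T entries."""
    if len(ids) >= T:
        return ids[:T]  # keep the left-most T tokens
    return ids + [pad_id] * (T - len(ids))


T = 32
MARKER = [7, 8, 9]  # stand-in ids for the assistant-generation marker
student = list(range(100, 133)) + MARKER  # 36 tokens, like the student prompt
teacher = list(range(200, 243)) + MARKER  # 46 tokens, like the teacher prompt

s_left, t_left = left_fit(student, T), left_fit(teacher, T)
s_right = right_fit(student, T)
```

With left-fitting, both sequences end in the same marker at the same
positions; with right-fitting, the marker is cut off entirely.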
This is the alignment discipline production SDPO requires:
composer_replication/trainer/composer_trainer.py:_compute_sdpo_loss
emits a shape-mismatch warning and skips the channel if student/
teacher logits don't match shape, so a misaligned batch in production
disables the SDPO column for that step rather than failing the run.
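
A rough sketch of a masked JSD with a skip-on-mismatch guard is shown
below. It is an illustration under stated assumptions, not the body of
_compute_sdpo_loss (which this commit does not show): the function
names are hypothetical, inputs are per-position probability lists
rather than logits, and the guard only compares sequence lengths.

```python
import math
import warnings


def kl(p, q):
    """KL(p || q) in nats, skipping zero-probability entries of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between two distributions."""
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def masked_mean_jsd(student, teacher, mask):
    """Mean JSD over positions where mask is 1; None if lengths differ."""
    if len(student) != len(teacher) or len(student) != len(mask):
        # Assumed behaviour: warn and skip instead of raising.
        warnings.warn("student/teacher shape mismatch; skipping channel")
        return None
    vals = [jsd(s, t) for s, t, m in zip(student, teacher, mask) if m]
    return sum(vals) / len(vals) if vals else 0.0
```

The mask decides which positions contribute, which is why a mask that
lands on the wrong tokens yields a number that looks healthy but
measures the wrong thing.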
Wave 17c — close all 5 audit FLAGs from WAVE_16_RECON_AUDIT.md
Each FLAGged proposal section in
docs/research/{DILOCO_SERVERLESS,REPLAYSIM_NORMALIZATION,TRACE_SOURCE}_RECONNAISSANCE.md
and docs/research/{RL_FRAMEWORKS,SELF_DISTILLATION}_LANDSCAPE.md
now has a "Realised in v0.1" blockquote section IMMEDIATELY ABOVE
the historical proposal sketch, documenting:
- The actual public surface (imports + class/function names)
- The actual file paths and module layout
- Constructor signatures and key kwarg names that differ from
the proposal
- Cross-references to the realised tests / verification harnesses
The HTML AUDIT comments are removed (replaced by the blockquote
sections, which serve the same purpose with more useful content).
The historical proposal sketches are preserved verbatim below — they
document the shape of pre-ADR thinking that fed each ADR, which is
valuable archival context.
WAVE_16_RECON_AUDIT.md gets a closeout banner explaining all 5 FLAGs
are resolved + how (proposal docs got Realised-in-v0.1 sections;
FLAG #1's underlying code bug fixed in Wave 17a). The original
FLAG-list table is preserved for archival continuity.
Cross-model review (route-fidelity-verified via direct urllib scatter)
- gemini-3.1-pro-preview-20260219 ($0.049, 27s): APPROVED. Walked
through 2 of the 5 doc rewrites + verified the README expected-
output numbers match what run.py produces + sanity-checked the
closeout banner placement + spot-checked 2 cross-link paths. All
checks passed cleanly.
- grok-4.3-20260430 ($0.013, 7s): REQUEST_CHANGES with 1 BLOCKER —
"eager `from .modal import ModalExecutor` at package import time
will raise if `modal` isn't installed". Verified false-positive
empirically: modal.py only imports typing + framework-internal
classes at module level (the actual `modal` package is deferred
to class-method bodies); test environment has modal NOT installed
and `from composer_replication.diloco.serverless import
ModalExecutor` works fine. Per subagent-driven-development.md
"verifying subagent claims": spot-check before applying.
- deepseek-v4-pro-20260423: failed to produce visible output — burned
all 4000 max_tokens on hidden reasoning_tokens. Methodological
note for future reviews: when using DeepSeek-V4-Pro for adversarial
review, set max_tokens >= 6000 to leave room for reasoning + output.
Test count after Wave 17: 177 passed / 2 skipped (was 176/2 in
the same scope; the +1 is the public-reexport regression test).
The example's run.py acceptance assertions pass with the new alignment
numbers.
- composer_replication/diloco/serverless/__init__.py +4 -0
- composer_replication/diloco/serverless/tests/test_serverless_local.py +42 -0
- docs/research/DILOCO_SERVERLESS_RECONNAISSANCE.md +41 -10
- docs/research/REPLAYSIM_NORMALIZATION_RECONNAISSANCE.md +40 -13
- docs/research/RL_FRAMEWORKS_LANDSCAPE.md +36 -4
- docs/research/SELF_DISTILLATION_LANDSCAPE.md +36 -7
- docs/research/TRACE_SOURCE_RECONNAISSANCE.md +29 -6
- docs/research/WAVE_16_RECON_AUDIT.md +10 -0
- examples/gsm8k_grpo_with_sdpo/README.md +15 -12
- examples/gsm8k_grpo_with_sdpo/run.py +59 -24
@@ -52,10 +52,14 @@ from composer_replication.diloco.serverless.executor import (
     ReplicaHandle,
     ServerlessExecutor,
 )
+from composer_replication.diloco.serverless.hf_jobs import HFJobsExecutor
+from composer_replication.diloco.serverless.modal import ModalExecutor
 
 __all__ = [
+    "HFJobsExecutor",
     "LocalProcessExecutor",
     "MockManager",
+    "ModalExecutor",
     "ObjectStoreAllReduce",
     "ReplicaHandle",
     "ServerlessExecutor",
@@ -248,3 +248,45 @@ def test_mock_manager_shape_compat():
     assert hasattr(work, "wait") and callable(work.wait)
     assert work.wait() is True
     torch.testing.assert_close(buf, t, atol=1e-6, rtol=1e-6)
+
+
+# ---------------------------------------------------------------------
+# Public re-export surface (Wave 17a)
+# ---------------------------------------------------------------------
+
+
+def test_public_reexports_include_all_executors():
+    """`from composer_replication.diloco.serverless import …` must
+    surface every executor adapter the module's docstring claims, not
+    just the LocalProcessExecutor.
+
+    Wave 16's user-journey reviewer caught that ModalExecutor /
+    HFJobsExecutor were defined in `modal.py` / `hf_jobs.py` but not
+    re-exported from the package's `__init__.py`. Users who copied the
+    docstring's `from composer_replication.diloco.serverless import
+    ModalExecutor` line got an ImportError. Wave 17a added the missing
+    re-exports; this test pins them.
+    """
+    import composer_replication.diloco.serverless as ss
+
+    expected = {
+        "LocalProcessExecutor",
+        "ModalExecutor",
+        "HFJobsExecutor",
+        "MockManager",
+        "ObjectStoreAllReduce",
+        "ReplicaHandle",
+        "ServerlessExecutor",
+    }
+    actual = set(ss.__all__)
+    assert expected.issubset(actual), (
+        f"Missing re-exports: {expected - actual}. "
+        f"__all__ should include every executor adapter the package "
+        f"docstring documents."
+    )
+
+    # Also verify each name is actually importable, not just listed.
+    for name in expected:
+        assert hasattr(ss, name), (
+            f"{name} listed in __all__ but not present on package."
+        )
@@ -621,16 +621,47 @@ if __name__ == "__main__":
 
 ### 3.4 Package layout
 
-
-
-
-
-
-
-
-
-
-
+> **Realised in v0.1 (Wave 17 update):** ADR-005 shipped a flatter
+> layout than the proposal below. The actual `composer_replication/diloco/serverless/`
+> tree:
+>
+> ```
+> composer_replication/
+> └── diloco/
+>     ├── __init__.py                # existing: make_diloco_outer_loop, torchft import
+>     └── serverless/
+>         ├── __init__.py            # re-exports all public classes (Wave 17a)
+>         ├── executor.py            # ServerlessExecutor Protocol + ReplicaHandle
+>         │                          # + LocalProcessExecutor concrete adapter
+>         ├── allreduce.py           # ObjectStoreAllReduce + MockManager
+>         ├── modal.py               # ModalExecutor (skeleton — see __init__ docstring)
+>         ├── hf_jobs.py             # HFJobsExecutor (skeleton — uses huggingface_hub.run_job)
+>         ├── replica_entrypoint.py  # script each replica runs
+>         └── tests/                 # multi-process file:// rendezvous tests
+> ```
+>
+> No leading underscores, no `_protocol`/`_base`/`_rendezvous` split,
+> and Modal/HFJobs are flat modules rather than subpackages. The full
+> public re-export surface (verified by
+> `tests/test_serverless_local.py::test_public_reexports_include_all_executors`):
+>
+> ```python
+> from composer_replication.diloco.serverless import (
+>     ServerlessExecutor,        # Protocol — implement to add your own backend
+>     LocalProcessExecutor,      # multi-process local replicas (CPU/GPU)
+>     ModalExecutor,             # Modal cloud — skeleton in modal.py
+>     HFJobsExecutor,            # HuggingFace Jobs — skeleton in hf_jobs.py
+>     ObjectStoreAllReduce,      # fsspec-backed allreduce (s3://, gs://, file://, hf://)
+>     MockManager,               # torchft.Manager-shaped duck-type
+>     ReplicaHandle,             # opaque handle returned by launch_replicas
+> )
+> ```
+>
+> Wave 16's user-journey reviewer caught that earlier versions of this
+> `__init__.py` defined `ModalExecutor` and `HFJobsExecutor` in their
+> respective modules but failed to re-export them from the package
+> namespace. Wave 17a fixed the re-exports and added a regression
+> test. The proposal below predates that fix.
 
 ```
 composer_replication/
@@ -312,19 +312,46 @@ write_jsonl(out_path, pairs)
 
 ### 4.3 Adapter shape (`replaysim/normalize.py`)
 
-
-
-
-
-
-
-
-
-
-
-
-
-
+> **Realised in v0.1 (Wave 17 update):** ADR-004 shipped with a different
+> public surface than the sketch below. The actual API:
+>
+> ```python
+> from composer_replication.replaysim import (
+>     replay_and_normalize_trace,        # convenience wrapper
+>     DJNormalizer,                      # the normalizer class
+>     DPOPair,                           # input TypedDict (from teacher_replay)
+>     NormalizedDPOPair,                 # output TypedDict
+>     replay_trace, extract_dpo_pairs,   # re-exports of upstream stages
+> )
+> ```
+>
+> Key shape differences from the sketch:
+>
+> 1. **`DPOPair` is a TypedDict, not a dataclass.** Its actual fields
+>    are `{state_id: str, state_messages: list[dict], chosen: str,
+>    rejected: str, n_teachers_agreeing: int}` (defined in
+>    `composer_replication/teacher_replay.py:99`) — **not**
+>    `{prompt, chosen, rejected, state, meta}`. The `_to_dj`/`_from_dj`
+>    sketch round-trip below would not type-check against the realised
+>    TypedDict.
+> 2. **Recipe path is `composer_replication/recipes/replaysim/default.yaml`**,
+>    not `composer_replication/replaysim/recipes/dpo_normalize.yaml`.
+>    There is no `replaysim/recipes/` subpackage; recipes live under
+>    the top-level `recipes/` tree.
+> 3. **No `composer_replication/replaysim/ops/` subpackage exists.**
+>    The custom op file `preference_validator.py` was not created;
+>    data-juicer's stock ops + the framework's own validation in
+>    `DJNormalizer` covered the requirement.
+> 4. **The integration hook is `replay_and_normalize_trace(...)`** in
+>    `composer_replication/replaysim/__init__.py` (re-exported from
+>    `normalize.py`). It wraps the existing `replay_trace` +
+>    `extract_dpo_pairs` flow without modifying `teacher_replay.py`.
+>    There is no separate `composer_replication/replaysim/teacher_replay.py`
+>    — `teacher_replay` lives at top-level `composer_replication/teacher_replay.py`.
+>
+> The pre-spike sketch below is preserved as historical proposal context.
+> It documents the shape of thinking that fed ADR-004; the realised code
+> is the source of truth for the adapter contract.
 
 ```python
 # composer_replication/replaysim/normalize.py
@@ -313,10 +313,42 @@ group_size = 16
 
 [trainer]
 algorithm = "grpo"
-
-
-
-
+
+> **Realised in v0.1 (Wave 17 update):** Wave 14b shipped the PRIME-RL
+> recipe at `composer_replication/recipes/prime_rl/prime_rl_config.yaml`
+> as **YAML** with a different kwarg surface than the TOML sketch below.
+> The actual recipe shape:
+>
+> ```yaml
+> # composer_replication/recipes/prime_rl/prime_rl_config.yaml
+> model:
+>   base: "Qwen/Qwen2.5-0.5B"
+>   attn_implementation: "flash_attention_2"
+>   dtype: "bfloat16"
+> env:
+>   protocol: "verifiers"
+>   config: { name: "math/gsm8k", split: "train" }
+> loss:
+>   custom:
+>     import_path: "composer_replication.recipes.prime_rl.composer_loss:loss_fn"
+>     kwargs:
+>       alpha_sdpo: 0.0          # channel 2 deferred in v0
+>       beta_dpo: 0.0            # channel 3 out-of-scope for PRIME-RL v0
+>       dppo_mask_high: 0.2      # PRIME-RL DPPO convention (NOT textbook PPO)
+>       dppo_mask_low: 0.2       # both must be >= 0 per Field(..., ge=0)
+>       adv_tau: 1.0             # advantage normalization
+>       kl_tau: 0.04             # KL coefficient
+> ```
+>
+> The realised `loss_fn(inputs, **kwargs)` matches PRIME-RL's
+> `LossInputs`/`LossOutputs` interface (read upstream `prime_rl/loss.py`
+> for parity verification — Wave 14b's shadow-parity test independently
+> restates the formula in
+> `composer_replication/recipes/prime_rl/tests/test_composer_loss.py`).
+>
+> The pre-Wave-14b TOML/`hint_weight`/`replay_weight` sketch below is
+> preserved as historical proposal context.
+
 [trainer.loss]
 type = "custom"
 import_path = "composer_replication.recipes.prime_rl.composer_loss:loss_fn"
@@ -352,13 +352,42 @@ license + reproducible scale) to recommend adding right now.
 For ADR-007 the proposed addition is a `composer_replication.distillation`
 sub-package with three pluggable hooks:
 
-
-
-
-
-
-
-
+> **Realised in v0.1 (Wave 17 update):** ADR-007 shipped a flatter
+> layout than the proposal below. Actual exports:
+>
+> ```
+> composer_replication/
+>   distillation/
+>     __init__.py
+>     simpo.py                 # simpo_loss(chosen_avg_logprobs, rejected_avg_logprobs, *, beta, gamma)
+>                              # avg_sequence_logprob(logprobs, mask) -- helper
+>     taid.py                  # taid_loss(student_logits, teacher_logits, t, *, ...)
+>                              # TAIDScheduler -- adaptive momentum schedule per the paper
+>     entropy_aware_opd.py     # entropy_aware_opd_loss(student_logits, teacher_logits, *, h_max, ...)
+> ```
+>
+> No `targets.py`/`losses.py` split, no top-level `preference/` package,
+> and SimPO lives under `distillation/` rather than `preference/` because
+> the three losses share a common dispatch surface (`compose_loss`'s
+> `dpo_variant` and `sdpo_wrapper` switches).
+>
+> The composition rule realised in `compose_loss` is per-loss flag-driven,
+> not a single composed-function call:
+>
+> ```python
+> compose_loss(model, inputs,
+>              dpo_variant="simpo",                # OR "dpo" (default)
+>              sdpo_wrapper="taid",                # OR "entropy_opd" OR "none" (default)
+>              taid_t=0.5,                         # required when sdpo_wrapper="taid"
+>              simpo_beta=2.0, simpo_gamma=1.0,    # used only when dpo_variant="simpo"
+>              entropy_opd_h_max=...,              # used only when sdpo_wrapper="entropy_opd"
+> )
+> ```
+>
+> The pre-ADR proposal sketch below is preserved as historical context.
+> The shipped function names are `simpo_loss`, `taid_loss` +
+> `TAIDScheduler`, and `entropy_aware_opd_loss` (not `taid_target` /
+> `entropy_aware_kl_loss`).
 
 ```
 composer_replication/
@@ -244,12 +244,35 @@ For users on other machines: `find ~/.claude/projects -name '*.jsonl' -size +50k
 
 ## 6. TraceIngester sketch
 
-
-
-
-
-
-
+> **Realised in v0.1 (Wave 17 update):** The realised ingester ships at
+> `composer_replication/ingestion/claude_code.py` exporting
+> `ClaudeCodeIngester`, with the spike at
+> `spikes/007-real-trace-ingestion/claude_code_ingester.py`. The
+> public production surface is:
+>
+> ```python
+> from pathlib import Path
+> from composer_replication.ingestion.claude_code import ClaudeCodeIngester
+>
+> ingester = ClaudeCodeIngester(skip_sidechain=True, strip_thinking=True)
+> for trace_state in ingester.ingest(Path("~/.claude/projects/.../session.jsonl").expanduser()):
+>     # trace_state matches the TraceState TypedDict from §1
+>     ...
+> stats = ingester.last_stats  # IngestionStats — turn counts, skip reasons
+> ```
+>
+> The shipped `ClaudeCodeIngester` differs from the pre-spike sketch
+> below in:
+> - Class name: `ClaudeCodeIngester` (not `TraceIngester`)
+> - Module path: `composer_replication.ingestion.claude_code` (not
+>   `spikes/007-trace-ingester/trace_ingester.py`)
+> - The constructor takes config kwargs (`system_prompt`,
+>   `skip_sidechain`, `strip_thinking`, `max_history_tokens`); paths
+>   are passed to `.ingest(Path)` per call instead of being held by the
+>   ingester
+> - The yielded type is `TraceState` (matches §1)
+>
+> The pre-spike sketch below is preserved as historical proposal context.
 
 Drop-in adapter for spike-005's `replay_trace()`. Targets `TraceState` (the actual existing TypedDict; see §1).
 
@@ -125,6 +125,16 @@ spike-shape names that do not match what shipped.
 
 ## Open items for Wave 17+
 
+> **Wave 17 closeout (2026-05-26):** All 5 FLAGs below were resolved
+> by adding "Realised in v0.1" companion sections to the affected
+> proposal docs that document the shipped surface inline above the
+> historical sketch. Wave 17 also fixed the underlying code bug from
+> FLAG #1 — `serverless/__init__.py` now re-exports `ModalExecutor`
+> and `HFJobsExecutor` (verified by
+> `composer_replication/diloco/serverless/tests/test_serverless_local.py::test_public_reexports_include_all_executors`).
+> The original FLAG-list below is preserved as historical record of
+> what the audit found.
+
 These are the FLAGged ambiguous claims that need orchestrator decision before
 a confident rewrite:
 
@@ -23,23 +23,26 @@ The script will print 5 SGD steps' worth of channel-decomposed losses
 and end with three ✓ assertions:
 
 ```
-step 1/5: total=
-step 2/5: total=
+step 1/5: total=2.2215 lm_ce=2.1898 sdpo_jsd=0.0634 trace_replay_dpo=0.0000 |grad|=1.38e+06
+step 2/5: total=1.7695 lm_ce=1.7374 sdpo_jsd=0.0642 trace_replay_dpo=0.0000 |grad|=1.12e+06
 ...
-step 5/5: total=
+step 5/5: total=1.2781 lm_ce=1.2465 sdpo_jsd=0.0631 trace_replay_dpo=0.0000 |grad|=8.24e+05
 [4/4] Verifying SDPO column wiring ...
-✓ sdpo_jsd > 0 at every step (min=0.
-✓ total != lm_ce at every step (min |diff|=0.
-✓ |grad| > 0 and finite at every step (min=
+✓ sdpo_jsd > 0 at every step (min=0.0611, max=0.0642)
+✓ total != lm_ce at every step (min |diff|=0.0306, max=0.0321)
+✓ |grad| > 0 and finite at every step (min=8.24e+05, max=1.38e+06)
 ✅ SDPO column wiring verified end-to-end.
 ```
 
-Wall-clock on the reference run: **
-1.7s model-load phase (no model download — already cached). The
-
-
-
-
+Wall-clock on the reference run: **21.3s** for 5 SGD steps after a
+1.7s model-load phase (no model download — already cached). The script
+left-pads and left-truncates the chat-template'd input so student and
+teacher contexts are bit-identical on the right-most 16 positions —
+the same alignment discipline production SDPO requires (see `build_inputs`
+docstring for the alignment rationale and the link to
+`composer_replication/trainer/data_collator.py`). Without left-truncation
+the assistant marker gets dropped and the SDPO mask covers prompt-area
+tokens instead, inflating the channel signal on misaligned positions.
 
 If `sdpo_jsd` ever shows up as `0.0000`, the SDPO column is silent —
 that means either (a) `alpha_sdpo=0`, (b) `ctx_teacher_input_ids` is
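@@ -106,15 +106,36 @@ def build_inputs(tokenizer) -> dict[str, torch.Tensor]:
     """Tokenize PROBLEMS into a compose_loss-shaped batch.
 
     Returns a dict with:
-    - input_ids: (B, T) student rollouts (no hint)
-    - response_mask: (B, T)
-    - ctx_teacher_input_ids: (B, T) hint-conditioned context
-    - sdpo_loss_mask: (B, T) 1 at
+    - input_ids: (B, T) student rollouts (no hint), left-padded
+    - response_mask: (B, T) 1 on the assistant-response area
+    - ctx_teacher_input_ids: (B, T) hint-conditioned context, left-padded
+    - sdpo_loss_mask: (B, T) 1 at the aligned post-prompt area
+
+    SDPO requires student and teacher logits to align position-by-position
+    over the loss mask. The student and teacher prompts have different
+    prefix lengths (teacher is longer because of the inserted hint
+    system turn), so we LEFT-pad both to T tokens — the right edge (the
+    assistant generation marker) lines up across the batch and across
+    student vs teacher. The SDPO mask covers the right-most ALIGN_LEN
+    positions, all of which correspond to identical "post-prompt /
+    assistant-response area" tokens in both contexts.
+
+    This matches the alignment discipline the production
+    `ComposerDataCollator` (composer_replication/trainer/data_collator.py)
+    must enforce: the post-hint section must have identical token
+    positions in student vs teacher, or `_compute_sdpo_loss` will
+    detect a shape mismatch and skip the channel for that step.
     """
+    # ALIGN_LEN: how many right-most positions to use for the SDPO loss.
+    # These positions correspond to the assistant-generation area, which
+    # is identical (token-for-token) across student and teacher because
+    # apply_chat_template appends the same `<|im_start|>assistant\n`
+    # marker regardless of how many system turns came before.
+    ALIGN_LEN = T // 2  # 16 of 32; same as response_mask back-half
+
     student_msg_lists = [_build_chat_messages(p["question"], with_hint=False) for p in PROBLEMS[:B]]
     teacher_msg_lists = [_build_chat_messages(p["question"], with_hint=True) for p in PROBLEMS[:B]]
 
-    # Render via Qwen's chat template — produces real <|im_start|>/<|im_end|> tokens.
     student_strs = [
         tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
         for m in student_msg_lists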
@@ -124,26 +145,40 @@ def build_inputs(tokenizer) -> dict[str, torch.Tensor]:
         for m in teacher_msg_lists
     ]
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+    # LEFT-pad AND LEFT-truncate to T: temporarily flip both the
+    # tokenizer's padding side and truncation side. This ensures the
+    # right edge (the assistant generation marker) is preserved at
+    # position T-1 regardless of whether the input is shorter than T
+    # (gets left-padded) or longer than T (gets left-truncated, dropping
+    # the leading system turns first). Without this, the default
+    # right-truncation discards the assistant marker — which means the
+    # SDPO mask covers tokens from the system prompt instead of the
+    # assistant response area, and the channel computes JSD over
+    # nonsense alignment.
+    original_pad = tokenizer.padding_side
+    original_trunc = tokenizer.truncation_side
+    tokenizer.padding_side = "left"
+    tokenizer.truncation_side = "left"
+    try:
+        s_tok = tokenizer(
+            student_strs, max_length=T, truncation=True,
+            padding="max_length", return_tensors="pt",
+        )
+        t_tok = tokenizer(
+            teacher_strs, max_length=T, truncation=True,
+            padding="max_length", return_tensors="pt",
+        )
+    finally:
+        tokenizer.padding_side = original_pad
+        tokenizer.truncation_side = original_trunc
+
+    # response_mask: 1 on the right-most ALIGN_LEN tokens, 0 elsewhere
+    # (left padding + prompt area). For both student and teacher these
+    # positions cover the assistant-generation marker + any padding
+    # that happens to fall there. Same indices apply to both because
+    # of left-padding alignment.
     response_mask = torch.zeros(B, T, dtype=torch.long)
-    response_mask[:,
+    response_mask[:, -ALIGN_LEN:] = 1
     sdpo_loss_mask = response_mask.clone()
 
     return {