Codeseys committed
Commit a84c060 · 1 Parent(s): c0a5ab7

Wave 17: close all 5 audit FLAGs + SDPO context alignment + serverless re-exports


Three-track follow-up to Wave 16, with cross-model review (gemini-3.1-pro
APPROVED; grok-4.3 raised one BLOCKER that was empirically verified to be
a false positive; deepseek-v4-pro spent its entire max_tokens budget on
hidden reasoning and produced no visible output):

Wave 17a — serverless package re-exports (real code bug)
composer_replication/diloco/serverless/__init__.py now re-exports
ModalExecutor and HFJobsExecutor alongside LocalProcessExecutor. The
package docstring already documented `from composer_replication.diloco.serverless
import ModalExecutor` as the public API, but that import failed
because the names were in neither __init__.py's import list nor
__all__. Both modal.py and hf_jobs.py are safe to import: they do not
import the optional `modal` / `huggingface_hub` packages at module
level (those imports are deferred to class-method bodies), so adding
the re-exports doesn't introduce a new dependency.
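
The deferred-import pattern described above can be sketched roughly as follows. This is an illustration only, not the code in modal.py or hf_jobs.py: the class name `LazyBackendExecutor`, the `backend_module` attribute, and the `launch_replicas` signature are all hypothetical.

```python
import importlib


class LazyBackendExecutor:
    """Hypothetical executor adapter whose optional dependency is imported lazily."""

    # Hypothetical optional package name, chosen so that it is not installed.
    backend_module = "some_optional_backend_sdk"

    def launch_replicas(self, n_replicas: int) -> list[str]:
        # The optional import happens at call time, not at module import time,
        # so importing (or re-exporting) this class never needs the package.
        try:
            backend = importlib.import_module(self.backend_module)
        except ImportError as exc:
            raise ImportError(
                f"{self.backend_module!r} is required to launch replicas; "
                "install it to use this executor."
            ) from exc
        return [f"{backend.__name__}-replica-{i}" for i in range(n_replicas)]


# Defining and instantiating the class works without the optional package.
executor = LazyBackendExecutor()
```

Only a call to `launch_replicas` touches the optional package, which is why a package-level re-export of such a class should not add an install-time dependency.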

Pinned by a regression test
(test_public_reexports_include_all_executors in
test_serverless_local.py) that asserts every executor adapter the
package docstring documents is in __all__ AND importable.

Wave 16's user-journey reviewer caught this as a BLOCKER. Wave 17a
fixes it.

Wave 17b — SDPO context alignment in the gsm8k_grpo_with_sdpo example
The previous version of build_inputs() tokenized student/teacher
contexts with the default right-padding/right-truncation, which dropped
the `<|im_start|>assistant\n` marker for any prompt longer than T=32
tokens (both of ours are: student=36, teacher=46). As a result, the SDPO
mask covered prompt-area positions instead of the assistant-response
area, and the channel computed JSD over misaligned tokens.

Wave 17b flips both tokenizer.padding_side and tokenizer.truncation_side
to "left" under a try/finally, so:
- inputs shorter than T get LEFT-padded (assistant marker stays at T-1)
- inputs longer than T get LEFT-truncated (drops leading system
turns, keeps the assistant marker on the right)
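
A tokenizer-free sketch of the two cases above, using plain integer lists in place of token ids. The helper `left_fit`, the pad id, and the stand-in token values are assumptions for illustration; run.py itself relies on the tokenizer's own padding and truncation.

```python
def left_fit(token_ids: list[int], max_len: int, pad_id: int) -> list[int]:
    """Left-truncate or left-pad `token_ids` to exactly `max_len` entries."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(token_ids) >= max_len:
        # Left-truncation: drop leading tokens, keep the right-most max_len.
        return token_ids[-max_len:]
    # Left-padding: pad on the left so the last real token sits at max_len - 1.
    return [pad_id] * (max_len - len(token_ids)) + token_ids


T = 32
PAD = 0  # hypothetical pad id
shared_suffix = list(range(1000, 1016))          # stand-in for 16 shared right-most tokens
student = list(range(1, 21)) + shared_suffix     # 36 tokens, like the student prompt
teacher = list(range(101, 131)) + shared_suffix  # 46 tokens, like the teacher prompt

student_fit = left_fit(student, T, PAD)
teacher_fit = left_fit(teacher, T, PAD)
```

With a shared 16-token suffix, `student_fit[-16:]` and `teacher_fit[-16:]` are equal even though the two inputs have different lengths and different prefixes.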

Verified bit-identical right-most 16 positions across student vs
teacher contexts (the assistant-generation marker + last few prompt
tokens are the common suffix; chat-template appends the same
`<|im_start|>assistant\n` regardless of how many system turns
preceded). Post-fix SDPO signal is 0.0611-0.0642 (vs 0.1358-0.1429
pre-fix); the lower number is the meaningful one because the channel
now computes JSD over actually-aligned positions instead of random
misaligned tokens. Total loss starts at 2.22 (vs 5.98 pre-fix) and
drops to 1.28 over 5 SGD steps in 21.3s on CPU.
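
For intuition on why the signal depends on which positions the mask covers, here is a minimal pure-Python sketch of a masked Jensen-Shannon divergence. It assumes a plain mean over masked positions and natural logarithms; the commit does not state the exact reduction the SDPO channel uses, and the function names are illustrative.

```python
import math


def jsd(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def masked_mean_jsd(student_probs, teacher_probs, mask) -> float:
    """Average per-position JSD over the positions where mask == 1."""
    values = [
        jsd(s, t)
        for s, t, keep in zip(student_probs, teacher_probs, mask)
        if keep
    ]
    return sum(values) / len(values) if values else 0.0


# Position 0: identical distributions. Position 1: fully disjoint distributions.
student_probs = [[0.5, 0.5], [1.0, 0.0]]
teacher_probs = [[0.5, 0.5], [0.0, 1.0]]
aligned_only = masked_mean_jsd(student_probs, teacher_probs, [1, 0])
misaligned_only = masked_mean_jsd(student_probs, teacher_probs, [0, 1])
```

Moving the mask from the matching position to the mismatched one changes the result from zero to the maximum value of ln 2, which mirrors how a misplaced mask can inflate the reported channel signal.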

This is the alignment discipline production SDPO requires:
composer_replication/trainer/composer_trainer.py:_compute_sdpo_loss
emits a shape-mismatch warning and skips the channel if student/teacher
logits don't match shape, so misalignment in production disables the
column without failing the run.
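
The warn-and-skip behaviour described above could look roughly like the guard below. This is a hedged sketch, not the body of `_compute_sdpo_loss`: the function name `sdpo_or_skip`, its arguments, and the use of `warnings.warn` are assumptions.

```python
import warnings
from typing import Callable, Optional


def sdpo_or_skip(
    student_shape: tuple[int, ...],
    teacher_shape: tuple[int, ...],
    compute_loss: Callable[[], float],
) -> Optional[float]:
    """Return the SDPO loss, or None (with a warning) when the shapes differ."""
    if student_shape != teacher_shape:
        # Skip the channel for this step instead of raising, so training continues.
        warnings.warn(
            f"SDPO skipped: student logits {student_shape} != "
            f"teacher logits {teacher_shape}",
            RuntimeWarning,
            stacklevel=2,
        )
        return None
    return compute_loss()
```

Because the run keeps going after the warning, a persistent mismatch is easy to miss unless the warning or a zero-valued channel metric is monitored.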

Wave 17c — close all 5 audit FLAGs from WAVE_16_RECON_AUDIT.md
Each FLAGged proposal section in
docs/research/{DILOCO_SERVERLESS,REPLAYSIM_NORMALIZATION,
RL_FRAMEWORKS,SELF_DISTILLATION,TRACE_SOURCE}_RECONNAISSANCE.md
now has a "Realised in v0.1" blockquote section IMMEDIATELY ABOVE
the historical proposal sketch, documenting:
- The actual public surface (imports + class/function names)
- The actual file paths and module layout
- Constructor signatures and key kwarg names that differ from
the proposal
- Cross-references to the realised tests / verification harnesses

The HTML AUDIT comments are removed (replaced by the blockquote
sections, which serve the same purpose with more useful content).
The historical proposal sketches are preserved verbatim below — they
document the shape of pre-ADR thinking that fed each ADR, which is
valuable archival context.

WAVE_16_RECON_AUDIT.md gets a closeout banner explaining all 5 FLAGs
are resolved + how (proposal docs got Realised-in-v0.1 sections;
FLAG #1's underlying code bug fixed in Wave 17a). The original
FLAG-list table is preserved for archival continuity.

Cross-model review (route-fidelity-verified via direct urllib scatter)
- gemini-3.1-pro-preview-20260219 ($0.049, 27s): APPROVED. Walked
through 2 of the 5 doc rewrites + verified the README expected-
output numbers match what run.py produces + sanity-checked the
closeout banner placement + spot-checked 2 cross-link paths. All
coverage passed clean.
- grok-4.3-20260430 ($0.013, 7s): REQUEST_CHANGES with 1 BLOCKER —
"eager `from .modal import ModalExecutor` at package import time
will raise if `modal` isn't installed". Verified false-positive
empirically: modal.py only imports typing + framework-internal
classes at module level (the actual `modal` package is deferred
to class-method bodies); test environment has modal NOT installed
and `from composer_replication.diloco.serverless import
ModalExecutor` works fine. Per subagent-driven-development.md
"verifying subagent claims": spot-check before applying.
- deepseek-v4-pro-20260423: failed to produce visible output — burned
all 4000 max_tokens on hidden reasoning_tokens. Methodological
note for future reviews: when using DeepSeek-V4-Pro for adversarial
review, set max_tokens >= 6000 to leave room for reasoning + output.

Test count after Wave 17: 177 passed / 2 skipped (was 176/2 in
the same scope; the +1 is the public-reexport regression test).
The example's run.py acceptance assertions pass with the new alignment
numbers.

composer_replication/diloco/serverless/__init__.py CHANGED
@@ -52,10 +52,14 @@ from composer_replication.diloco.serverless.executor import (
     ReplicaHandle,
     ServerlessExecutor,
 )
+from composer_replication.diloco.serverless.hf_jobs import HFJobsExecutor
+from composer_replication.diloco.serverless.modal import ModalExecutor
 
 __all__ = [
+    "HFJobsExecutor",
     "LocalProcessExecutor",
     "MockManager",
+    "ModalExecutor",
     "ObjectStoreAllReduce",
     "ReplicaHandle",
     "ServerlessExecutor",
composer_replication/diloco/serverless/tests/test_serverless_local.py CHANGED
@@ -248,3 +248,45 @@ def test_mock_manager_shape_compat():
     assert hasattr(work, "wait") and callable(work.wait)
     assert work.wait() is True
     torch.testing.assert_close(buf, t, atol=1e-6, rtol=1e-6)
+
+
+# ---------------------------------------------------------------------
+# Public re-export surface (Wave 17a)
+# ---------------------------------------------------------------------
+
+
+def test_public_reexports_include_all_executors():
+    """`from composer_replication.diloco.serverless import …` must
+    surface every executor adapter the module's docstring claims, not
+    just the LocalProcessExecutor.
+
+    Wave 16's user-journey reviewer caught that ModalExecutor /
+    HFJobsExecutor were defined in `modal.py` / `hf_jobs.py` but not
+    re-exported from the package's `__init__.py`. Users who copied the
+    docstring's `from composer_replication.diloco.serverless import
+    ModalExecutor` line got an ImportError. Wave 17a added the missing
+    re-exports; this test pins them.
+    """
+    import composer_replication.diloco.serverless as ss
+
+    expected = {
+        "LocalProcessExecutor",
+        "ModalExecutor",
+        "HFJobsExecutor",
+        "MockManager",
+        "ObjectStoreAllReduce",
+        "ReplicaHandle",
+        "ServerlessExecutor",
+    }
+    actual = set(ss.__all__)
+    assert expected.issubset(actual), (
+        f"Missing re-exports: {expected - actual}. "
+        f"__all__ should include every executor adapter the package "
+        f"docstring documents."
+    )
+
+    # Also verify each name is actually importable, not just listed.
+    for name in expected:
+        assert hasattr(ss, name), (
+            f"{name} listed in __all__ but not present on package."
+        )
docs/research/DILOCO_SERVERLESS_RECONNAISSANCE.md CHANGED
@@ -621,16 +621,47 @@ if __name__ == "__main__":
 
 ### 3.4 Package layout
 
-<!-- AUDIT: stale_serverless_layout ADR-005 shipped a flatter layout than this
-proposal. Actual modules under composer_replication/diloco/serverless/
-are: __init__.py, executor.py (ServerlessExecutor + LocalProcessExecutor),
-allreduce.py (ObjectStoreAllReduce + MockManager), modal.py (ModalExecutor),
-hf_jobs.py (HFJobsExecutor), replica_entrypoint.py. No leading underscores,
-no _protocol/_base/_rendezvous split, and Modal/HFJobs are flat modules
-rather than subpackages. The above code-block file headers (e.g.
-`_modal_adapter.py`, `_hf_jobs_adapter.py`, `_protocol.py`, `_rendezvous.py`)
-are pre-implementation proposals; map them to the realised module names
-when reading. -->
+> **Realised in v0.1 (Wave 17 update):** ADR-005 shipped a flatter
+> layout than the proposal below. The actual `composer_replication/diloco/serverless/`
+> tree:
+>
+> ```
+> composer_replication/
+> └── diloco/
+>     ├── __init__.py              # existing: make_diloco_outer_loop, torchft import
+>     └── serverless/
+>         ├── __init__.py          # re-exports all public classes (Wave 17a)
+>         ├── executor.py          # ServerlessExecutor Protocol + ReplicaHandle
+>         │                        #   + LocalProcessExecutor concrete adapter
+>         ├── allreduce.py         # ObjectStoreAllReduce + MockManager
+>         ├── modal.py             # ModalExecutor (skeleton — see __init__ docstring)
+>         ├── hf_jobs.py           # HFJobsExecutor (skeleton — uses huggingface_hub.run_job)
+>         ├── replica_entrypoint.py  # script each replica runs
+>         └── tests/               # multi-process file:// rendezvous tests
+> ```
+>
+> No leading underscores, no `_protocol`/`_base`/`_rendezvous` split,
+> and Modal/HFJobs are flat modules rather than subpackages. The full
+> public re-export surface (verified by
+> `tests/test_serverless_local.py::test_public_reexports_include_all_executors`):
+>
+> ```python
+> from composer_replication.diloco.serverless import (
+>     ServerlessExecutor,    # Protocol — implement to add your own backend
+>     LocalProcessExecutor,  # multi-process local replicas (CPU/GPU)
+>     ModalExecutor,         # Modal cloud — skeleton in modal.py
+>     HFJobsExecutor,        # HuggingFace Jobs — skeleton in hf_jobs.py
+>     ObjectStoreAllReduce,  # fsspec-backed allreduce (s3://, gs://, file://, hf://)
+>     MockManager,           # torchft.Manager-shaped duck-type
+>     ReplicaHandle,         # opaque handle returned by launch_replicas
+> )
+> ```
+>
+> Wave 16's user-journey reviewer caught that earlier versions of this
+> `__init__.py` defined `ModalExecutor` and `HFJobsExecutor` in their
+> respective modules but failed to re-export them from the package
+> namespace. Wave 17a fixed the re-exports and added a regression
+> test. The proposal below predates that fix.
 
 ```
 composer_replication/
docs/research/REPLAYSIM_NORMALIZATION_RECONNAISSANCE.md CHANGED
@@ -312,19 +312,46 @@ write_jsonl(out_path, pairs)
 
 ### 4.3 Adapter shape (`replaysim/normalize.py`)
 
-<!-- AUDIT: stale_replaysim_paths_and_dpo_shape ADR-004 shipped at
-composer_replication/replaysim/normalize.py with a different DPOPair shape
-than this sketch. Actual DPOPair is a TypedDict with fields
-{state_id, state_messages, chosen: str, rejected: str, n_teachers_agreeing}
-NOT {prompt, chosen, rejected, state, meta} as in the proposal below. The
-YAML recipe also lives at composer_replication/recipes/replaysim/default.yaml
-(not composer_replication/replaysim/recipes/dpo_normalize.yaml). The hook
-in §4.5 is provided by `replay_and_normalize_trace` in
-composer_replication/replaysim/__init__.py rather than a drop-in edit to
-`teacher_replay.py`. The custom op file (§4.4 line 426 / §4.4 line 431)
-`composer_replication/replaysim/ops/preference_validator.py` was not
-created. Treat the sketch below as proposal, not as documentation of
-the realised code. -->
+> **Realised in v0.1 (Wave 17 update):** ADR-004 shipped with a different
+> public surface than the sketch below. The actual API:
+>
+> ```python
+> from composer_replication.replaysim import (
+>     replay_and_normalize_trace,       # convenience wrapper
+>     DJNormalizer,                     # the normalizer class
+>     DPOPair,                          # input TypedDict (from teacher_replay)
+>     NormalizedDPOPair,                # output TypedDict
+>     replay_trace, extract_dpo_pairs,  # re-exports of upstream stages
+> )
+> ```
+>
+> Key shape differences from the sketch:
+>
+> 1. **`DPOPair` is a TypedDict, not a dataclass.** Its actual fields
+>    are `{state_id: str, state_messages: list[dict], chosen: str,
+>    rejected: str, n_teachers_agreeing: int}` (defined in
+>    `composer_replication/teacher_replay.py:99`) — **not**
+>    `{prompt, chosen, rejected, state, meta}`. The `_to_dj`/`_from_dj`
+>    sketch round-trip below would not type-check against the realised
+>    TypedDict.
+> 2. **Recipe path is `composer_replication/recipes/replaysim/default.yaml`**,
+>    not `composer_replication/replaysim/recipes/dpo_normalize.yaml`.
+>    There is no `replaysim/recipes/` subpackage; recipes live under
+>    the top-level `recipes/` tree.
+> 3. **No `composer_replication/replaysim/ops/` subpackage exists.**
+>    The custom op file `preference_validator.py` was not created;
+>    data-juicer's stock ops + the framework's own validation in
+>    `DJNormalizer` covered the requirement.
+> 4. **The integration hook is `replay_and_normalize_trace(...)`** in
+>    `composer_replication/replaysim/__init__.py` (re-exported from
+>    `normalize.py`). It wraps the existing `replay_trace` +
+>    `extract_dpo_pairs` flow without modifying `teacher_replay.py`.
+>    There is no separate `composer_replication/replaysim/teacher_replay.py`
+>    — `teacher_replay` lives at top-level `composer_replication/teacher_replay.py`.
+>
+> The pre-spike sketch below is preserved as historical proposal context.
+> It documents the shape of thinking that fed ADR-004; the realised code
+> is the source of truth for the adapter contract.
 
 ```python
 # composer_replication/replaysim/normalize.py
docs/research/RL_FRAMEWORKS_LANDSCAPE.md CHANGED
@@ -313,10 +313,42 @@ group_size = 16
 
 [trainer]
 algorithm = "grpo"
-<!-- AUDIT: stale_recipe_format — Wave 14b shipped this as YAML at
-composer_replication/recipes/prime_rl/prime_rl_config.yaml with a different
-kwarg surface (alpha_sdpo, beta_dpo, dppo_mask_high, dppo_mask_low, adv_tau,
-kl_tau). The TOML/hint_weight/replay_weight sketch below predates that. -->
+
+> **Realised in v0.1 (Wave 17 update):** Wave 14b shipped the PRIME-RL
+> recipe at `composer_replication/recipes/prime_rl/prime_rl_config.yaml`
+> as **YAML** with a different kwarg surface than the TOML sketch below.
+> The actual recipe shape:
+>
+> ```yaml
+> # composer_replication/recipes/prime_rl/prime_rl_config.yaml
+> model:
+>   base: "Qwen/Qwen2.5-0.5B"
+>   attn_implementation: "flash_attention_2"
+>   dtype: "bfloat16"
+> env:
+>   protocol: "verifiers"
+>   config: { name: "math/gsm8k", split: "train" }
+> loss:
+>   custom:
+>     import_path: "composer_replication.recipes.prime_rl.composer_loss:loss_fn"
+>     kwargs:
+>       alpha_sdpo: 0.0      # channel 2 deferred in v0
+>       beta_dpo: 0.0        # channel 3 out-of-scope for PRIME-RL v0
+>       dppo_mask_high: 0.2  # PRIME-RL DPPO convention (NOT textbook PPO)
+>       dppo_mask_low: 0.2   # both must be >= 0 per Field(..., ge=0)
+>       adv_tau: 1.0         # advantage normalization
+>       kl_tau: 0.04         # KL coefficient
+> ```
+>
+> The realised `loss_fn(inputs, **kwargs)` matches PRIME-RL's
+> `LossInputs`/`LossOutputs` interface (read upstream `prime_rl/loss.py`
+> for parity verification — Wave 14b's shadow-parity test independently
+> restates the formula in
+> `composer_replication/recipes/prime_rl/tests/test_composer_loss.py`).
+>
+> The pre-Wave-14b TOML/`hint_weight`/`replay_weight` sketch below is
+> preserved as historical proposal context.
+
 [trainer.loss]
 type = "custom"
 import_path = "composer_replication.recipes.prime_rl.composer_loss:loss_fn"
docs/research/SELF_DISTILLATION_LANDSCAPE.md CHANGED
@@ -352,13 +352,42 @@ license + reproducible scale) to recommend adding right now.
 For ADR-007 the proposed addition is a `composer_replication.distillation`
 sub-package with three pluggable hooks:
 
-<!-- AUDIT: stale_distillation_layout ADR-007 shipped a flatter layout than
-this proposal. Actual modules: composer_replication/distillation/{simpo.py,
-taid.py, entropy_aware_opd.py}. There is no targets.py/losses.py split,
-no top-level preference/ subpackage, and SimPO lives under distillation/
-rather than preference/. The function names also differ: actual exports
-are `simpo_loss`, `taid_loss` + `TAIDScheduler`, and `entropy_aware_opd_loss`
-(not `taid_target` / `entropy_aware_kl_loss`). -->
+> **Realised in v0.1 (Wave 17 update):** ADR-007 shipped a flatter
+> layout than the proposal below. Actual exports:
+>
+> ```
+> composer_replication/
+>   distillation/
+>     __init__.py
+>     simpo.py              # simpo_loss(chosen_avg_logprobs, rejected_avg_logprobs, *, beta, gamma)
+>                           # avg_sequence_logprob(logprobs, mask) -- helper
+>     taid.py               # taid_loss(student_logits, teacher_logits, t, *, ...)
+>                           # TAIDScheduler -- adaptive momentum schedule per the paper
+>     entropy_aware_opd.py  # entropy_aware_opd_loss(student_logits, teacher_logits, *, h_max, ...)
+> ```
+>
+> No `targets.py`/`losses.py` split, no top-level `preference/` package,
+> and SimPO lives under `distillation/` rather than `preference/` because
+> the three losses share a common dispatch surface (`compose_loss`'s
+> `dpo_variant` and `sdpo_wrapper` switches).
+>
+> The composition rule realised in `compose_loss` is per-loss flag-driven,
+> not a single composed-function call:
+>
+> ```python
+> compose_loss(model, inputs,
+>     dpo_variant="simpo",              # OR "dpo" (default)
+>     sdpo_wrapper="taid",              # OR "entropy_opd" OR "none" (default)
+>     taid_t=0.5,                       # required when sdpo_wrapper="taid"
+>     simpo_beta=2.0, simpo_gamma=1.0,  # used only when dpo_variant="simpo"
+>     entropy_opd_h_max=...,            # used only when sdpo_wrapper="entropy_opd"
+> )
+> ```
+>
+> The pre-ADR proposal sketch below is preserved as historical context.
+> The shipped function names are `simpo_loss`, `taid_loss` +
+> `TAIDScheduler`, and `entropy_aware_opd_loss` (not `taid_target` /
+> `entropy_aware_kl_loss`).
 
 ```
 composer_replication/
docs/research/TRACE_SOURCE_RECONNAISSANCE.md CHANGED
@@ -244,12 +244,35 @@ For users on other machines: `find ~/.claude/projects -name '*.jsonl' -size +50k
 
 ## 6. TraceIngester sketch
 
-<!-- AUDIT: stale_ingester_paths_and_naming Spike 007 shipped at
-spikes/007-real-trace-ingestion/claude_code_ingester.py (NOT
-spikes/007-trace-ingester/trace_ingester.py) and the production-side
-module is composer_replication/ingestion/claude_code.py exporting
-`ClaudeCodeIngester` (NOT `TraceIngester`). The sketch below is the
-pre-spike proposal; the realised API surface is named differently. -->
+> **Realised in v0.1 (Wave 17 update):** The realised ingester ships at
+> `composer_replication/ingestion/claude_code.py` exporting
+> `ClaudeCodeIngester`, with the spike at
+> `spikes/007-real-trace-ingestion/claude_code_ingester.py`. The
+> public production surface is:
+>
+> ```python
+> from pathlib import Path
+> from composer_replication.ingestion.claude_code import ClaudeCodeIngester
+>
+> ingester = ClaudeCodeIngester(skip_sidechain=True, strip_thinking=True)
+> for trace_state in ingester.ingest(Path("~/.claude/projects/.../session.jsonl").expanduser()):
+>     # trace_state matches the TraceState TypedDict from §1
+>     ...
+> stats = ingester.last_stats  # IngestionStats — turn counts, skip reasons
+> ```
+>
+> The shipped `ClaudeCodeIngester` differs from the pre-spike sketch
+> below in:
+> - Class name: `ClaudeCodeIngester` (not `TraceIngester`)
+> - Module path: `composer_replication.ingestion.claude_code` (not
+>   `spikes/007-trace-ingester/trace_ingester.py`)
+> - The constructor takes config kwargs (`system_prompt`,
+>   `skip_sidechain`, `strip_thinking`, `max_history_tokens`); paths
+>   are passed to `.ingest(Path)` per call instead of being held by the
+>   ingester
+> - The yielded type is `TraceState` (matches §1)
+>
+> The pre-spike sketch below is preserved as historical proposal context.
 
 Drop-in adapter for spike-005's `replay_trace()`. Targets `TraceState` (the actual existing TypedDict; see §1).
 
docs/research/WAVE_16_RECON_AUDIT.md CHANGED
@@ -125,6 +125,16 @@ spike-shape names that do not match what shipped.
 
 ## Open items for Wave 17+
 
+> **Wave 17 closeout (2026-05-26):** All 5 FLAGs below were resolved
+> by adding "Realised in v0.1" companion sections to the affected
+> proposal docs that document the shipped surface inline above the
+> historical sketch. Wave 17 also fixed the underlying code bug from
+> FLAG #1 — `serverless/__init__.py` now re-exports `ModalExecutor`
+> and `HFJobsExecutor` (verified by
+> `composer_replication/diloco/serverless/tests/test_serverless_local.py::test_public_reexports_include_all_executors`).
+> The original FLAG-list below is preserved as historical record of
+> what the audit found.
+
 These are the FLAGged ambiguous claims that need orchestrator decision before
 a confident rewrite:
 
examples/gsm8k_grpo_with_sdpo/README.md CHANGED
@@ -23,23 +23,26 @@ The script will print 5 SGD steps' worth of channel-decomposed losses
 and end with three ✓ assertions:
 
 ```
-step 1/5: total=5.9801 lm_ce=5.9087 sdpo_jsd=0.1429 trace_replay_dpo=0.0000 |grad|=6.45e+06
-step 2/5: total=4.2268 lm_ce=4.1573 sdpo_jsd=0.1390 trace_replay_dpo=0.0000 |grad|=1.20e+06
+step 1/5: total=2.2215 lm_ce=2.1898 sdpo_jsd=0.0634 trace_replay_dpo=0.0000 |grad|=1.38e+06
+step 2/5: total=1.7695 lm_ce=1.7374 sdpo_jsd=0.0642 trace_replay_dpo=0.0000 |grad|=1.12e+06
 ...
-step 5/5: total=2.4644 lm_ce=2.3962 sdpo_jsd=0.1363 trace_replay_dpo=0.0000 |grad|=1.03e+06
+step 5/5: total=1.2781 lm_ce=1.2465 sdpo_jsd=0.0631 trace_replay_dpo=0.0000 |grad|=8.24e+05
 [4/4] Verifying SDPO column wiring ...
-✓ sdpo_jsd > 0 at every step (min=0.1358, max=0.1429)
-✓ total != lm_ce at every step (min |diff|=0.0679, max=0.0714)
-✓ |grad| > 0 and finite at every step (min=1.01e+06, max=6.45e+06)
+✓ sdpo_jsd > 0 at every step (min=0.0611, max=0.0642)
+✓ total != lm_ce at every step (min |diff|=0.0306, max=0.0321)
+✓ |grad| > 0 and finite at every step (min=8.24e+05, max=1.38e+06)
 ✅ SDPO column wiring verified end-to-end.
 ```
 
-Wall-clock on the reference run: **16.5s** for 5 SGD steps after a
-1.7s model-load phase (no model download — already cached). The SDPO
-signal magnitude (~0.14) is meaningful here because the script uses
-Qwen's actual ChatML markers (`<|im_start|>` / `<|im_end|>`) via
-`tokenizer.apply_chat_template` not raw marker strings, which would
-be tokenized as 11 punctuation tokens and the model would see nonsense.
+Wall-clock on the reference run: **21.3s** for 5 SGD steps after a
+1.7s model-load phase (no model download — already cached). The script
+left-pads and left-truncates the chat-template'd input so student and
+teacher contexts are bit-identical on the right-most 16 positions —
+the same alignment discipline production SDPO requires (see `build_inputs`
+docstring for the alignment rationale and the link to
+`composer_replication/trainer/data_collator.py`). Without left-truncation
+the assistant marker gets dropped and the SDPO mask covers prompt-area
+tokens instead, inflating the channel signal on misaligned positions.
 
 If `sdpo_jsd` ever shows up as `0.0000`, the SDPO column is silent —
 that means either (a) `alpha_sdpo=0`, (b) `ctx_teacher_input_ids` is
examples/gsm8k_grpo_with_sdpo/run.py CHANGED
@@ -106,15 +106,36 @@ def build_inputs(tokenizer) -> dict[str, torch.Tensor]:
     """Tokenize PROBLEMS into a compose_loss-shaped batch.
 
     Returns a dict with:
-      - input_ids: (B, T) student rollouts (no hint)
-      - response_mask: (B, T)
-      - ctx_teacher_input_ids: (B, T) hint-conditioned context
-      - sdpo_loss_mask: (B, T) 1 at assistant-response tokens
+      - input_ids: (B, T) student rollouts (no hint), left-padded
+      - response_mask: (B, T) 1 on the assistant-response area
+      - ctx_teacher_input_ids: (B, T) hint-conditioned context, left-padded
+      - sdpo_loss_mask: (B, T) 1 at the aligned post-prompt area
+
+    SDPO requires student and teacher logits to align position-by-position
+    over the loss mask. The student and teacher prompts have different
+    prefix lengths (teacher is longer because of the inserted hint
+    system turn), so we LEFT-pad both to T tokens — the right edge (the
+    assistant generation marker) lines up across the batch and across
+    student vs teacher. The SDPO mask covers the right-most ALIGN_LEN
+    positions, all of which correspond to identical "post-prompt /
+    assistant-response area" tokens in both contexts.
+
+    This matches the alignment discipline the production
+    `ComposerDataCollator` (composer_replication/trainer/data_collator.py)
+    must enforce: the post-hint section must have identical token
+    positions in student vs teacher, or `_compute_sdpo_loss` will
+    detect a shape mismatch and skip the channel for that step.
     """
+    # ALIGN_LEN: how many right-most positions to use for the SDPO loss.
+    # These positions correspond to the assistant-generation area, which
+    # is identical (token-for-token) across student and teacher because
+    # apply_chat_template appends the same `<|im_start|>assistant\n`
+    # marker regardless of how many system turns came before.
+    ALIGN_LEN = T // 2  # 16 of 32; same as response_mask back-half
+
     student_msg_lists = [_build_chat_messages(p["question"], with_hint=False) for p in PROBLEMS[:B]]
     teacher_msg_lists = [_build_chat_messages(p["question"], with_hint=True) for p in PROBLEMS[:B]]
 
-    # Render via Qwen's chat template — produces real <|im_start|>/<|im_end|> tokens.
     student_strs = [
         tokenizer.apply_chat_template(m, tokenize=False, add_generation_prompt=True)
         for m in student_msg_lists
@@ -124,26 +145,40 @@ def build_inputs(tokenizer) -> dict[str, torch.Tensor]:
         for m in teacher_msg_lists
     ]
 
-    s_tok = tokenizer(
-        student_strs,
-        max_length=T,
-        truncation=True,
-        padding="max_length",
-        return_tensors="pt",
-    )
-    t_tok = tokenizer(
-        teacher_strs,
-        max_length=T,
-        truncation=True,
-        padding="max_length",
-        return_tensors="pt",
-    )
-
-    # Mark the second half of each sequence as the "response" — purely
-    # synthetic for this smoke; in real training the response_mask comes
-    # from the rollout pipeline.
+    # LEFT-pad AND LEFT-truncate to T: temporarily flip both the
+    # tokenizer's padding side and truncation side. This ensures the
+    # right edge (the assistant generation marker) is preserved at
+    # position T-1 regardless of whether the input is shorter than T
+    # (gets left-padded) or longer than T (gets left-truncated, dropping
+    # the leading system turns first). Without this, the default
+    # right-truncation discards the assistant marker — which means the
+    # SDPO mask covers tokens from the system prompt instead of the
+    # assistant response area, and the channel computes JSD over
+    # nonsense alignment.
+    original_pad = tokenizer.padding_side
+    original_trunc = tokenizer.truncation_side
+    tokenizer.padding_side = "left"
+    tokenizer.truncation_side = "left"
+    try:
+        s_tok = tokenizer(
+            student_strs, max_length=T, truncation=True,
+            padding="max_length", return_tensors="pt",
+        )
+        t_tok = tokenizer(
+            teacher_strs, max_length=T, truncation=True,
+            padding="max_length", return_tensors="pt",
+        )
+    finally:
+        tokenizer.padding_side = original_pad
+        tokenizer.truncation_side = original_trunc
+
+    # response_mask: 1 on the right-most ALIGN_LEN tokens, 0 elsewhere
+    # (left padding + prompt area). For both student and teacher these
+    # positions cover the assistant-generation marker + any padding
+    # that happens to fall there. Same indices apply to both because
+    # of left-padding alignment.
     response_mask = torch.zeros(B, T, dtype=torch.long)
-    response_mask[:, T // 2:] = 1
+    response_mask[:, -ALIGN_LEN:] = 1
     sdpo_loss_mask = response_mask.clone()
 
     return {