# Wave 13 Adversarial Cross-Model Review **Reviewer:** Claude Opus 4.7 (sub-agent via delegate_task) **Date:** 2026-05-26 **Scope:** Wave 13 additions only (35 new tests, 4 ADRs, 6 new modules) **Method:** Read-and-grep audit + targeted test runs (CPU) ## Top-line verdict **CONDITIONAL PASS with two BLOCKERs.** Wave 13 substantially advances the brief expansion (serverless DiLoCo abstraction, replaysim normalization, three distillation losses, PRIME-RL recipe, Monarch tie-in). The **distillation losses are the strongest deliverable** — real, well-tested, mathematically faithful to the cited papers. The serverless-DiLoCo local executor + ObjectStoreAllReduce barrier are also genuine and exercised by 3 real multi-process tests. **However, two material claims are not test-validated, and one new module silently produces a degenerate loss in its primary code path.** ADR claims that say "X is added to compose_loss" describe code that wasn't actually written. The MockManager → DiLoCo "drop-in" is unverified end-to-end. Wave 11's reviewer found 2 genuine BLOCKERs. This review finds **2 BLOCKERs + 4 SUGGESTIONs + 2 NITs**. --- ## Finding 1 — BLOCKER: PRIME-RL `composer_loss.loss_fn` SDPO term is mathematically degenerate (always 0) **Severity:** BLOCKER **Evidence:** `composer_replication/recipes/prime_rl/composer_loss.py:79-86` The PRIME-RL composer-loss adapter applies `unsqueeze(-1)` to `(B, T)` log-prob tensors before passing them to `generalized_jsd_loss`, which calls `F.log_softmax(..., dim=-1)`. Softmax of a single-element vector is exactly 1.0; its log is 0. Therefore both `student_log_probs` and `teacher_log_probs` are identically zero, the JSD between them is 0, and the SDPO contribution **is always 0 regardless of `alpha_sdpo` or the actual log-prob values.** ```python >>> import torch.nn.functional as F >>> F.log_softmax(torch.randn(2, 3, 1), dim=-1) tensor([[[0.],[0.],[0.]],[[0.],[0.],[0.]]]) ``` The docstring calls this "a deliberate approximation," but it is not an approximation — it's a mathematically degenerate operation that silently disables channel 2. **Fix direction:** - Gate the SDPO branch behind `len(trainer_lp.shape) >= 3`, raising `NotImplementedError` until PRIME-RL surfaces full logits. - Update `prime_rl_recipe.md` and ADR-006 to stop claiming PRIME-RL has working SDPO; mark it deferred. --- ## Finding 2 — BLOCKER: ADR-007 declares `compose_loss` kwargs that were never added **Severity:** BLOCKER **Evidence:** - `docs/adrs/ADR-007-self-distillation-losses.md:103-108` claims: > `composer_replication.compose_loss` gets new optional kwargs: > - `dpo_variant: Literal["dpo", "simpo"] = "dpo"` — switches channel 3 > - `sdpo_wrapper: Literal["none", "taid", "entropy_opd"] = "none"` — wraps channel 2 > - `taid_schedule_step: int | None = None` > - `taid_total_steps: int | None = None` - `composer_replication/loss.py:54-65` actual signature has **none** of these. `grep -n "dpo_variant\|sdpo_wrapper\|taid" composer_replication/loss.py` returns empty. The new losses live in `composer_replication.distillation` as standalone functions but **are not wired into the framework's actual loss composition.** A user reading ADR-007 + the README would believe `compose_loss(model, inputs, dpo_variant="simpo", sdpo_wrapper="taid", ...)` works; it would raise `TypeError`. The 17 distillation tests verify the standalone losses but never exercise integration. **Fix direction:** - Either (a) add the kwargs to `compose_loss` and write at least one integration test combining e.g. SDPO+TAID (~30 LOC change), or - (b) downgrade ADR-007 status to "Standalone losses landed; integration deferred to Wave 14." --- ## Finding 3 — SUGGESTION: `default.yaml` replaysim recipe uses string ops on list-of-dict fields **Severity:** SUGGESTION (would be BLOCKER if a test exercised the real path) **Evidence:** - `composer_replication/recipes/replaysim/default.yaml` configures `text_length_filter`, `words_num_filter`, `special_characters_filter`, `document_deduplicator` with `text_keys: ["chosen", "rejected"]`. - In the record produced by `_dpo_pair_to_dj_record`, `chosen` and `rejected` are **lists of dicts** (`[{"role": "assistant", "content": "..."}]`) — not strings. - data-juicer's `text_length_filter` expects string-typed fields; running it on a list will either crash or no-op silently. The reason no test catches this: tests only validate the real path *if data-juicer is installed*, and even then only check `__init__` succeeds. There is no test that calls `normalize()` against a real data-juicer executor with the default recipe. **Fix direction:** - Reshape `_dpo_pair_to_dj_record` to extract `content` strings alongside the messages-format list. - Add one test (skip-marked unless `data_juicer` is importable) that runs the real op-graph on 3 hand-crafted records. --- ## Finding 4 — SUGGESTION: MockManager → torchft.DiLoCo "drop-in" claim is unverified end-to-end **Severity:** SUGGESTION **Evidence:** - `composer_replication/diloco/serverless/allreduce.py:188-191` claims MockManager "drops into" `make_diloco_outer_loop`. - The only test covering MockManager (`test_mock_manager_shape_compat`) is a `hasattr` smoke that calls `.allreduce` on a `world_size=1` store (passthrough). - torchft.Manager has additional surface area (`current_step`, `is_leader`, `_pg`, `report_error`, internal step accounting) that DiLoCo's `_apply_pseudogradient` may consult depending on version. **Fix direction:** - Add a single integration test that constructs `make_diloco_outer_loop(manager=MockManager(store), ...)` against a tiny `nn.Linear` and runs one outer round — even single-process. - Audit `torchft/local_sgd.py` for the `Manager`-rooted call sites and add stubs for any methods DiLoCo actually consults beyond `allreduce`. --- ## Finding 5 — SUGGESTION: README claim "9 multi-process tests" is mildly inflated **Severity:** SUGGESTION (NIT bordering) **Evidence:** - README.md and V1_V8_COVERAGE both state: *"9 multi-process tests pinning the allreduce barrier."* - Actual breakdown: - 4 single-process unit tests + `test_mock_manager_shape_compat` (5) - 4 multi-process tests spawning subprocesses (parametrized [2,3] of `_runs_allreduce_across_replicas`, `_handles_multiple_rounds`, `_reports_failed_replicas`) - Of the 4 multi-process tests, only **3 actually exercise the allreduce barrier**; `_reports_failed_replicas` deliberately raises before any allreduce call. **Wave 13 clearly does NOT fake-pass via world_size=1** — the multi- process barrier is real. But the count is rounded up. **Fix direction:** Replace "9 multi-process tests" with "9 tests covering the serverless DiLoCo layer, of which 4 spawn real subprocesses and 3 exercise the allreduce barrier across replicas." --- ## Finding 6 — SUGGESTION: PRIME-RL channel 1 is REINFORCE not GRPO; ignores `inference_logprobs` **Severity:** SUGGESTION **Evidence:** `composer_replication/recipes/prime_rl/composer_loss.py:62-68` computes: ```python grpo_loss = -(advantages * trainer_lp * mask).sum() / mask.sum().clamp_min(epsilon) ``` This is plain REINFORCE with advantage. PRIME-RL's `LossInputs` exposes `inference_logprobs` precisely because GRPO-with-replay-buffer requires the importance-sampling ratio `exp(trainer_lp - inference_lp)` (PPO-style clipped objective). The file says "SKELETON" so this isn't a hidden bug per se, but the loss is **labeled GRPO and is not GRPO**. **Fix direction:** Either implement the ratio + clipping (~20 LOC) or rename channel-1 comment to "REINFORCE-with-advantage stub" with a TODO. --- ## Finding 7 — NIT: ModalExecutor / HFJobsExecutor are skeleton-only with `NotImplementedError` in `__init__` **Severity:** NIT (this is documented, but README phrasing is slightly soft) **Evidence:** Honestly documented as skeletons in the code, ADR-005, and README. NIT: a user trying `ModalExecutor()` gets a runtime error rather than an import-time clue. **Fix direction:** Low priority. Update README phrase to "skeleton-only — raises NotImplementedError until v0.x." Or use a `__getattr__` on the package that raises a clearer message. --- ## Finding 8 — NIT: SimPO test uses positive log-probs (impossible values) **Severity:** NIT **Evidence:** `test_distillation_losses.py:27-46` calls `simpo_loss` with `chosen=tensor([0.5, 0.4, 0.3])`. Log-probabilities are bounded above by 0; positive values aren't possible from any softmax. The tests still verify the formula correctly, but the test inputs aren't legal. **Fix direction:** Use negative values — purely cosmetic. --- ## Cross-cutting risk check 73 tests passed in 29.29s on the CPU-fast subset. Spike 008 5/5 still pass. The new `composer_replication.diloco.serverless` package is purely additive; the existing `make_diloco_outer_loop` is untouched. **No cross-wave regressions detected on CPU.** GPU tests + slow CPU e2e tests not re-run; regression risk low since Wave 13 doesn't touch their dependencies. --- ## Summary scorecard | Item | Verdict | |---|---| | Distillation module (SimPO/TAID/Entropy-Aware OPD) standalone | ✅ Real, well-tested, paper-faithful | | Distillation integrated into `compose_loss` | ❌ **Not implemented** despite ADR-007 (Finding 2) | | ObjectStoreAllReduce + LocalProcessExecutor | ✅ Real multi-process barrier validated | | MockManager → DiLoCo drop-in | 🟡 Shape-checked only; integration unverified (Finding 4) | | Modal/HFJobs adapters | 🟡 Honestly documented as skeletons (Finding 7) | | Replaysim DJNormalizer passthrough | ✅ Works | | Replaysim default.yaml against real data-juicer | ❌ **Recipe field types don't match record shape** (Finding 3) | | PRIME-RL composer_loss.loss_fn | ❌ **SDPO term silently 0** (Finding 1); channel 1 is REINFORCE not GRPO (Finding 6) | | Monarch actors | ✅ Honest skeleton; raises NotImplementedError | | Altered-minds tie-in doc | ✅ Design-only, scoped honestly | | 35 new tests | All pass; 3 of 4 multi-process tests are genuine (Finding 5) | **Recommendation:** Address Findings 1 and 2 before publishing the Wave 13 expansion as "closed." Findings 3 and 4 should be addressed before any user attempts the real data-juicer or real torchft DiLoCo path. Findings 5–8 are cleanup.