Instructions to use Codeseys/composer-replication-framework with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Codeseys/composer-replication-framework with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("Codeseys/composer-replication-framework", dtype="auto") - Notebooks
- Google Colab
- Kaggle
Wave 13 Adversarial Cross-Model Review
Reviewer: Claude Opus 4.7 (sub-agent via delegate_task) Date: 2026-05-26 Scope: Wave 13 additions only (35 new tests, 4 ADRs, 6 new modules) Method: Read-and-grep audit + targeted test runs (CPU)
Top-line verdict
CONDITIONAL PASS with two BLOCKERs. Wave 13 substantially advances the brief expansion (serverless DiLoCo abstraction, replaysim normalization, three distillation losses, PRIME-RL recipe, Monarch tie-in). The distillation losses are the strongest deliverable — real, well-tested, mathematically faithful to the cited papers. The serverless-DiLoCo local executor + ObjectStoreAllReduce barrier are also genuine and exercised by 3 real multi-process tests.
However, two material claims are not test-validated, and one new module silently produces a degenerate loss in its primary code path. ADR claims that say "X is added to compose_loss" describe code that wasn't actually written. The MockManager → DiLoCo "drop-in" is unverified end-to-end.
Wave 11's reviewer found 2 genuine BLOCKERs. This review finds 2 BLOCKERs + 4 SUGGESTIONs + 2 NITs.
Finding 1 — BLOCKER: PRIME-RL composer_loss.loss_fn SDPO term is mathematically degenerate (always 0)
Severity: BLOCKER
Evidence: composer_replication/recipes/prime_rl/composer_loss.py:79-86
The PRIME-RL composer-loss adapter applies unsqueeze(-1) to (B, T)
log-prob tensors before passing them to generalized_jsd_loss, which
calls F.log_softmax(..., dim=-1). Softmax of a single-element vector
is exactly 1.0; its log is 0. Therefore both student_log_probs and
teacher_log_probs are identically zero, the JSD between them is 0,
and the SDPO contribution is always 0 regardless of alpha_sdpo or
the actual log-prob values.
>>> import torch.nn.functional as F
>>> F.log_softmax(torch.randn(2, 3, 1), dim=-1)
tensor([[[0.],[0.],[0.]],[[0.],[0.],[0.]]])
The docstring calls this "a deliberate approximation," but it is not an approximation — it's a mathematically degenerate operation that silently disables channel 2.
Fix direction:
- Gate the SDPO branch behind
len(trainer_lp.shape) >= 3, raisingNotImplementedErroruntil PRIME-RL surfaces full logits. - Update
prime_rl_recipe.mdand ADR-006 to stop claiming PRIME-RL has working SDPO; mark it deferred.
Finding 2 — BLOCKER: ADR-007 declares compose_loss kwargs that were never added
Severity: BLOCKER Evidence:
docs/adrs/ADR-007-self-distillation-losses.md:103-108claims:composer_replication.compose_lossgets new optional kwargs:dpo_variant: Literal["dpo", "simpo"] = "dpo"— switches channel 3sdpo_wrapper: Literal["none", "taid", "entropy_opd"] = "none"— wraps channel 2taid_schedule_step: int | None = Nonetaid_total_steps: int | None = None
composer_replication/loss.py:54-65actual signature has none of these.grep -n "dpo_variant\|sdpo_wrapper\|taid" composer_replication/loss.pyreturns empty.
The new losses live in composer_replication.distillation as
standalone functions but are not wired into the framework's actual
loss composition. A user reading ADR-007 + the README would believe
compose_loss(model, inputs, dpo_variant="simpo", sdpo_wrapper="taid", ...)
works; it would raise TypeError. The 17 distillation tests verify
the standalone losses but never exercise integration.
Fix direction:
- Either (a) add the kwargs to
compose_lossand write at least one integration test combining e.g. SDPO+TAID (~30 LOC change), or - (b) downgrade ADR-007 status to "Standalone losses landed; integration deferred to Wave 14."
Finding 3 — SUGGESTION: default.yaml replaysim recipe uses string ops on list-of-dict fields
Severity: SUGGESTION (would be BLOCKER if a test exercised the real path) Evidence:
composer_replication/recipes/replaysim/default.yamlconfigurestext_length_filter,words_num_filter,special_characters_filter,document_deduplicatorwithtext_keys: ["chosen", "rejected"].- In the record produced by
_dpo_pair_to_dj_record,chosenandrejectedare lists of dicts ([{"role": "assistant", "content": "..."}]) — not strings. - data-juicer's
text_length_filterexpects string-typed fields; running it on a list will either crash or no-op silently.
The reason no test catches this: tests only validate the real path if
data-juicer is installed, and even then only check __init__ succeeds.
There is no test that calls normalize() against a real data-juicer
executor with the default recipe.
Fix direction:
- Reshape
_dpo_pair_to_dj_recordto extractcontentstrings alongside the messages-format list. - Add one test (skip-marked unless
data_juiceris importable) that runs the real op-graph on 3 hand-crafted records.
Finding 4 — SUGGESTION: MockManager → torchft.DiLoCo "drop-in" claim is unverified end-to-end
Severity: SUGGESTION Evidence:
composer_replication/diloco/serverless/allreduce.py:188-191claims MockManager "drops into"make_diloco_outer_loop.- The only test covering MockManager (
test_mock_manager_shape_compat) is ahasattrsmoke that calls.allreduceon aworld_size=1store (passthrough). - torchft.Manager has additional surface area
(
current_step,is_leader,_pg,report_error, internal step accounting) that DiLoCo's_apply_pseudogradientmay consult depending on version.
Fix direction:
- Add a single integration test that constructs
make_diloco_outer_loop(manager=MockManager(store), ...)against a tinynn.Linearand runs one outer round — even single-process. - Audit
torchft/local_sgd.pyfor theManager-rooted call sites and add stubs for any methods DiLoCo actually consults beyondallreduce.
Finding 5 — SUGGESTION: README claim "9 multi-process tests" is mildly inflated
Severity: SUGGESTION (NIT bordering) Evidence:
- README.md and V1_V8_COVERAGE both state: "9 multi-process tests pinning the allreduce barrier."
- Actual breakdown:
- 4 single-process unit tests +
test_mock_manager_shape_compat(5) - 4 multi-process tests spawning subprocesses (parametrized [2,3] of
_runs_allreduce_across_replicas,_handles_multiple_rounds,_reports_failed_replicas)
- 4 single-process unit tests +
- Of the 4 multi-process tests, only 3 actually exercise the
allreduce barrier;
_reports_failed_replicasdeliberately raises before any allreduce call.
Wave 13 clearly does NOT fake-pass via world_size=1 — the multi- process barrier is real. But the count is rounded up.
Fix direction: Replace "9 multi-process tests" with "9 tests covering the serverless DiLoCo layer, of which 4 spawn real subprocesses and 3 exercise the allreduce barrier across replicas."
Finding 6 — SUGGESTION: PRIME-RL channel 1 is REINFORCE not GRPO; ignores inference_logprobs
Severity: SUGGESTION
Evidence: composer_replication/recipes/prime_rl/composer_loss.py:62-68
computes:
grpo_loss = -(advantages * trainer_lp * mask).sum() / mask.sum().clamp_min(epsilon)
This is plain REINFORCE with advantage. PRIME-RL's LossInputs
exposes inference_logprobs precisely because GRPO-with-replay-buffer
requires the importance-sampling ratio
exp(trainer_lp - inference_lp) (PPO-style clipped objective).
The file says "SKELETON" so this isn't a hidden bug per se, but the loss is labeled GRPO and is not GRPO.
Fix direction: Either implement the ratio + clipping (~20 LOC) or rename channel-1 comment to "REINFORCE-with-advantage stub" with a TODO.
Finding 7 — NIT: ModalExecutor / HFJobsExecutor are skeleton-only with NotImplementedError in __init__
Severity: NIT (this is documented, but README phrasing is slightly soft)
Evidence: Honestly documented as skeletons in the code, ADR-005,
and README. NIT: a user trying ModalExecutor() gets a runtime error
rather than an import-time clue.
Fix direction: Low priority. Update README phrase to "skeleton-only
— raises NotImplementedError until v0.x." Or use a __getattr__ on
the package that raises a clearer message.
Finding 8 — NIT: SimPO test uses positive log-probs (impossible values)
Severity: NIT
Evidence: test_distillation_losses.py:27-46 calls simpo_loss
with chosen=tensor([0.5, 0.4, 0.3]). Log-probabilities are bounded
above by 0; positive values aren't possible from any softmax. The tests
still verify the formula correctly, but the test inputs aren't legal.
Fix direction: Use negative values — purely cosmetic.
Cross-cutting risk check
73 tests passed in 29.29s on the CPU-fast subset. Spike 008 5/5 still
pass. The new composer_replication.diloco.serverless package is
purely additive; the existing make_diloco_outer_loop is untouched.
No cross-wave regressions detected on CPU. GPU tests + slow CPU
e2e tests not re-run; regression risk low since Wave 13 doesn't touch
their dependencies.
Summary scorecard
| Item | Verdict |
|---|---|
| Distillation module (SimPO/TAID/Entropy-Aware OPD) standalone | ✅ Real, well-tested, paper-faithful |
Distillation integrated into compose_loss |
❌ Not implemented despite ADR-007 (Finding 2) |
| ObjectStoreAllReduce + LocalProcessExecutor | ✅ Real multi-process barrier validated |
| MockManager → DiLoCo drop-in | 🟡 Shape-checked only; integration unverified (Finding 4) |
| Modal/HFJobs adapters | 🟡 Honestly documented as skeletons (Finding 7) |
| Replaysim DJNormalizer passthrough | ✅ Works |
| Replaysim default.yaml against real data-juicer | ❌ Recipe field types don't match record shape (Finding 3) |
| PRIME-RL composer_loss.loss_fn | ❌ SDPO term silently 0 (Finding 1); channel 1 is REINFORCE not GRPO (Finding 6) |
| Monarch actors | ✅ Honest skeleton; raises NotImplementedError |
| Altered-minds tie-in doc | ✅ Design-only, scoped honestly |
| 35 new tests | All pass; 3 of 4 multi-process tests are genuine (Finding 5) |
Recommendation: Address Findings 1 and 2 before publishing the Wave 13 expansion as "closed." Findings 3 and 4 should be addressed before any user attempts the real data-juicer or real torchft DiLoCo path. Findings 5–8 are cleanup.