docs(adr-013): close acceptance-gate boxes 1-5; only user-gated spend remains

The LMA-side runner scaffold (the one outstanding box that was not gated on
budget) now exists in the sister repo at
llm-mental-alterations/composer_replication_runs/, with 15 green mock-driven
tests that collect and pass with no torch/trl/modal/framework installed. It is
documented in LMA ADR-0017. This commit marks boxes 1-5 done; the only open
item is the explicit user go/no-go on real LMA-checkpoint loads and Modal
budget spend.
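Tests that collect and pass with no torch/trl/modal installed usually rely on dependency injection. The runner receives its heavy collaborators as arguments, and the test passes in `unittest.mock` objects instead. The following is a minimal sketch under that assumption. `run_step` and its parameters are hypothetical and are not the contents of `tests/test_runner_wiring.py`. A parent `Mock` records calls made on its children, so one assertion checks both the arguments and the order of the altered-ckpt → collator → trainer → step chain.

```python
from unittest import mock


def run_step(load_checkpoint, collator, trainer_cls, reward_fn, config, batch):
    """Hypothetical runner entry point: altered-ckpt -> collator ->
    trainer(config, reward_fn) -> step.

    Every heavy dependency is passed in, so this module never imports
    torch/trl/modal and a test can inject mocks instead.
    """
    model = load_checkpoint()        # load the altered checkpoint
    inputs = collator(batch)         # turn raw examples into model inputs
    trainer = trainer_cls(model=model, config=config, reward_fn=reward_fn)
    return trainer.step(inputs)      # a real trainer would call reward_fn here


def test_wiring_order():
    calls = mock.Mock()              # child mocks report their calls to this parent
    calls.load.return_value = "model"
    calls.collate.return_value = "inputs"
    calls.trainer.return_value.step.return_value = {"loss": 0.0}
    config = {"rung": "A2"}          # stands in for the A2 ladder config

    out = run_step(calls.load, calls.collate, calls.trainer,
                   calls.reward, config, batch=["question"])

    assert out == {"loss": 0.0}
    # One assertion covers both call order and the arguments at each stage.
    assert calls.mock_calls == [
        mock.call.load(),
        mock.call.collate(["question"]),
        mock.call.trainer(model="model", config=config, reward_fn=calls.reward),
        mock.call.trainer().step("inputs"),
    ]
```

Nothing framework-specific is imported at module level, so pytest can collect this file in a bare environment. A concrete loader would presumably defer its own `import torch` until it is called.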

Files changed (1)

docs/adrs/ADR-013-lma-integration-channel-ladder.md +13 -7

@@ -90,21 +90,27 @@ experiment design.** Specifically:
 ## Acceptance gate
-- [ ] `MMLUFormatReward` implemented + tested: correct→+1, wrong→0,
+- [x] `MMLUFormatReward` implemented + tested: correct→+1, wrong→0,
   unparseable→−0.2, multiple-answers→−0.1, length-penalty; a "always C" /
   option-prior exploit is detectable via logged option distribution. Rationale
   style is NOT scored.
-- [ ] `dual_kl_logger` logs both KLs; unit test on a toy policy/ref pair asserts
+- [x] `dual_kl_logger` logs both KLs; unit test on a toy policy/ref pair asserts
   KL(p‖p)==0 and KL increases as the policy moves.
-- [ ] `channel_ladder_configs()` returns A0–A4 with the documented α/β/kl_beta;
+- [x] `channel_ladder_configs()` returns A0–A4 with the documented α/β/kl_beta;
   unit test asserts A1 has both channels off, A2 SDPO-only, A3 replay-only.
-- [ ] LMA runner scaffold exists with mock-driven unit tests (no real model load,
+- [x] LMA runner scaffold exists with mock-driven unit tests (no real model load,
   no Modal, no budget spend) proving the wiring: altered-ckpt → collator →
-  ComposerReplicationTrainer(A2 config) → reward_fn → step.
+  ComposerReplicationTrainer(A2 config) → reward_fn → step. **Shipped in the LMA
+  repo (NOT this one), 2026-05-29:** `llm-mental-alterations/composer_replication_runs/`
+  (`moral_scenarios_replay.py`, `train_grpo.py`, `eval_post_rl.py`,
+  `tests/test_runner_wiring.py` — 15 mock tests green, collect+pass with no
+  torch/trl/modal/framework installed). Documented in LMA ADR-0017.
-- [ ] `docs/ALTERED_MINDS_TIE_IN.md` updated: Phase-3 hyperparameters replaced by
+- [x] `docs/ALTERED_MINDS_TIE_IN.md` updated: Phase-3 hyperparameters replaced by
   a pointer to this ADR's ladder; the amplification-risk finding documented.
 - [ ] **Out of scope (user-gated):** any real LMA checkpoint load or Modal/budget
-  spend. Documented as the explicit go-decision.
+  spend. Documented as the explicit go-decision. ← **the single remaining gate;
+  this is the go/no-go the user holds.** All capability is built + tested up to
+  this line.
 ## More Information
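The first gate item fixes a reward schedule. The following is a minimal sketch of a scorer that follows that schedule. It rests on two assumptions. First, it assumes a hypothetical output convention in which the completion states its choice as `Answer: <letter>`, and the last such line wins. Second, it assumes a hypothetical linear length penalty (`max_chars`, `per_char`). The ADR names a length penalty but does not give its form. `MMLUFormatRewardSketch` is illustrative and is not the project's `MMLUFormatReward`. Counting the parsed letters is one way to make an "always C" exploit visible in logs.

```python
import re
from collections import Counter

# Hypothetical convention: the completion states "Answer: <letter>".
# Also captures lists such as "B, C" or "B and C" so they can be penalised.
ANSWER_RE = re.compile(r"Answer:\s*([A-D](?:\s*(?:,|/|&|and)\s*[A-D])*)\b")


class MMLUFormatRewardSketch:
    """Illustrative scorer; NOT the project's MMLUFormatReward."""

    def __init__(self, max_chars=2000, per_char=1e-4):
        self.max_chars = max_chars        # hypothetical length budget
        self.per_char = per_char          # hypothetical penalty per extra char
        self.option_counts = Counter()    # parsed letters, for exploit checks

    def __call__(self, completion: str, gold: str) -> float:
        matches = ANSWER_RE.findall(completion)
        letters = set(re.findall(r"[A-D]", matches[-1])) if matches else set()
        if not letters:
            reward = -0.2                 # unparseable
        elif len(letters) > 1:
            reward = -0.1                 # multiple answers
        else:
            (choice,) = letters
            self.option_counts[choice] += 1
            reward = 1.0 if choice == gold else 0.0
        # The rationale's content is never inspected; only its length matters.
        overflow = max(0, len(completion) - self.max_chars)
        return reward - self.per_char * overflow

    def option_distribution(self) -> dict:
        """Fraction of single parsed answers that picked each option."""
        total = sum(self.option_counts.values()) or 1
        return {k: self.option_counts[k] / total for k in "ABCD"}
```

A policy that collapses onto one option shows up as a near-one-hot `option_distribution()`. Mean reward alone can hide this when the gold labels happen to be skewed toward that option.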
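The second item's unit test can be sketched on plain categorical distributions. The excerpt does not say which two KLs `dual_kl_logger` records. This sketch assumes they are the forward and reverse KL between policy and reference. If they are instead measured against two different references, the same `kl` helper would be called once per reference. `dual_kl` and `interpolate` are hypothetical names. If the policy moves along a straight line away from the reference, both divergences grow. This follows because KL is convex and is zero only when the two distributions match.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for categorical distributions (equal-length lists)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dual_kl(policy, ref):
    """Both KL directions for one step, as a dict a logger could record.

    Assumption: "both KLs" = forward and reverse KL between policy and ref.
    """
    return {"kl_policy_ref": kl(policy, ref), "kl_ref_policy": kl(ref, policy)}


def interpolate(p, q, t):
    """Policy moved a fraction t of the way from p toward q."""
    return [(1 - t) * a + t * b for a, b in zip(p, q)]


def test_toy_policy_ref_pair():
    ref = [0.25, 0.25, 0.25, 0.25]
    target = [0.7, 0.1, 0.1, 0.1]
    # KL(p||p) == 0 in both directions.
    assert dual_kl(ref, ref) == {"kl_policy_ref": 0.0, "kl_ref_policy": 0.0}
    steps = [interpolate(ref, target, t) for t in (0.0, 0.25, 0.5, 1.0)]
    logs = [dual_kl(policy, ref) for policy in steps]
    for key in ("kl_policy_ref", "kl_ref_policy"):
        values = [entry[key] for entry in logs]
        # Both KLs start at zero and grow strictly as the policy moves away.
        assert values[0] == 0.0
        assert all(a < b for a, b in zip(values, values[1:]))
```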
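The third item's assertion can be written as a small table check. The excerpt gives neither the α/β/kl_beta values nor the channel settings of A0 and A4. This sketch therefore models only the two channel switches and only the three rungs that the test pins down. `RungSketch`, `EXPECTED_CHANNELS` and `check_channel_ladder` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RungSketch:
    """Hypothetical stand-in for one ladder rung; alpha/beta/kl_beta omitted."""
    sdpo: bool    # SDPO channel enabled?
    replay: bool  # replay channel enabled?


# Channel settings the unit test pins down: (sdpo, replay) per rung.
EXPECTED_CHANNELS = {"A1": (False, False), "A2": (True, False), "A3": (False, True)}


def check_channel_ladder(configs):
    """Assert A1 has both channels off, A2 is SDPO-only, A3 is replay-only."""
    for name, (sdpo, replay) in EXPECTED_CHANNELS.items():
        rung = configs[name]
        assert (rung.sdpo, rung.replay) == (sdpo, replay), name
```

The expected settings live in one mapping, so if a rung is swapped, the check fails and names that rung in the assertion message.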