feat(wave-b): ADR-013 LMA integration + B4 end-to-end SDPO-fires proof + doc refresh

ADR-013: composer_replication/integrations/altered_minds/ (generic, framework-side)
- MMLUFormatReward: structured-answer reward that scores only the final letter and
format (never rationale style), with penalties for unparseable and multiple-distinct
answers, option-order randomization, and an "always C" exploit made detectable via
the logged option distribution.
- dual_kl_logger: KL(policy||altered-init) AND KL(policy||unaltered-base) as the
washout-vs-amplification instrument (neither optimized).
- channel_ladder_configs: A0-A4 isolated-channel ladder (alpha=0.02/beta=0.05),
replacing the uninterpretable combined alpha=0.2/beta=0.4 recipe per the
cross-family research critique (SDPO can AMPLIFY an altered model).
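The dual_kl_logger entry above names a washout-vs-amplification reading, and the kl_logging.py docstring spells out the rule. A minimal sketch of that rule follows. `classify_alteration_trend` is a hypothetical helper (not part of this commit), and comparing only the first and last logged values is a simplifying assumption, not something ADR-013 prescribes.

```python
from typing import Sequence


def classify_alteration_trend(
    kl_to_altered_init: Sequence[float],
    kl_to_base: Sequence[float],
) -> str:
    """Hypothetical helper: read two logged KL series (first vs last step)."""
    if len(kl_to_altered_init) < 2 or len(kl_to_altered_init) != len(kl_to_base):
        raise ValueError("need two equal-length KL series with at least 2 steps")
    d_init = kl_to_altered_init[-1] - kl_to_altered_init[0]
    d_base = kl_to_base[-1] - kl_to_base[0]
    # Moving away from the altered init while drifting back toward base.
    if d_init > 0 and d_base < 0:
        return "washout"
    # Moving away from base faster than away from the altered init.
    if d_base > 0 and d_base > d_init:
        return "amplification"
    return "inconclusive"


print(classify_alteration_trend([0.0, 0.1, 0.3], [0.5, 0.4, 0.2]))  # washout
print(classify_alteration_trend([0.0, 0.05, 0.1], [0.5, 0.7, 0.9]))  # amplification
```

A real analysis would likely fit a trend over all steps and compare arms (A1 vs A2 vs A3) rather than read a single run.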

B4: examples/altered_minds_channel_ladder/run.py PROVES the SDPO channel FIRES with a
nonzero loss (JSD=0.0565, gradients flow) through the REAL alignment indices emitted by
the shipped collator. The old smoke test could not show this; it only proved init.
Honest caveat: the proof uses a stub model with perturbed student tokens; the
real-model path is gated behind ALTERED_MINDS_REAL_MODEL=1.
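Why the stub needs differing tokens can be seen from the loss itself. Assuming the SDPO channel reports a Jensen-Shannon divergence between student and teacher next-token distributions at the collator's aligned positions (the JSD figure above suggests this; the symmetric 50/50 mixture used here is an assumption), identical distributions give exactly zero. This pure-Python sketch uses `student_idx`/`teacher_idx` as illustrative stand-ins for the collator's student/teacher_response_idx, not the shipped tensors.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q); terms with p_i == 0 contribute nothing.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def aligned_mean_jsd(student_logits, teacher_logits, student_idx, teacher_idx) -> float:
    """Mean JSD over aligned (student position, teacher position) pairs."""
    vals = [
        jsd(softmax(student_logits[s]), softmax(teacher_logits[t]))
        for s, t in zip(student_idx, teacher_idx)
    ]
    return sum(vals) / len(vals)


student = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
# The teacher sequence carries extra context, so its response starts one slot later.
teacher_same = [[9.0, 9.0, 9.0], [1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
teacher_diff = [[9.0, 9.0, 9.0], [0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]
print(aligned_mean_jsd(student, teacher_same, [0, 1], [1, 2]))  # 0.0
print(aligned_mean_jsd(student, teacher_diff, [0, 1], [1, 2]) > 0)  # True
```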

docs/ALTERED_MINDS_TIE_IN.md: Phase-3 hyperparams marked superseded by the ADR-013 ladder.
BACKLOG.md + VISION_VALIDATION.md refreshed to actual shipped state.

227 passed / 16 skipped (was 210 passed). No real LMA checkpoint, Modal run, or budget
spend (all user-gated). Workers: 2x Opus-4.8 + Gemini-3.1-pro (docs).

Files changed (12)

BACKLOG.md +26 -19
composer_replication/integrations/__init__.py +0 -0
composer_replication/integrations/altered_minds/__init__.py +48 -0
composer_replication/integrations/altered_minds/kl_logging.py +110 -0
composer_replication/integrations/altered_minds/ladder.py +97 -0
composer_replication/integrations/altered_minds/reward.py +231 -0
composer_replication/integrations/altered_minds/tests/__init__.py +0 -0
composer_replication/integrations/altered_minds/tests/test_channel_ladder.py +362 -0
docs/ALTERED_MINDS_TIE_IN.md +37 -10
docs/VISION_VALIDATION.md +7 -0
examples/altered_minds_channel_ladder/README.md +52 -0
examples/altered_minds_channel_ladder/run.py +229 -0

BACKLOG.md CHANGED

@@ -1,8 +1,23 @@
 # Backlog — Composer 2.5 Replication Framework
-Imported from `docs/VISION_VALIDATION.md` § 6 (gaps) + § 9 (gap-closers) at 2026-05-26.
-## Active items (CPU-only, no GPU budget)
 ### Spike 006 — Real HF model smoke (Wave 7)
@@ -60,23 +75,15 @@ Imported from `docs/VISION_VALIDATION.md` § 6 (gaps) + § 9 (gap-closers) at 20
 4. README quickstart updated to `pip install -e .` + `python examples/qwen3_05b_quickstart/run.py`.
 5. `pip install -e .` succeeds and quickstart runs end-to-end on CPU.
-**Estimate**: half a day, CPU.
-## Modal-gated (if budget allows after gap-closers)
-### Spike 002a-mini — Real GPU smoke (Phase 10)
-**Closes**: the "did we ever run gradients on GPU" ambiguity — currently everything is CPU-only.
-**Goal**: dispatch a 30-min A10G smoke on Modal that runs Spike 006 unchanged on GPU, verifies bf16 numerics, captures memory + step-time.
-**Acceptance**:
-1. ADR-001 says Modal is the right choice for this workload + estimate is < $5.
-2. Modal app builds, runs `composer_total_loss` for 50 steps on Qwen2.5-0.5B-Instruct.
-3. Loss curve + memory profile saved to `spikes/002a-mini/` and pulled to local.
-4. No new shape / dtype bug surfaced vs CPU run.
-**Estimate**: $1–3, 30 min wall-clock.
 ## Deferred (post-loop, GPU-gated)

 # Backlog — Composer 2.5 Replication Framework
+Updated 2026-05-29 to reflect shipped waves (ingestion, diloco, packaging, datagen+RL, ADR-011/012/013, cross-family review).
+## Active items / Honest Gaps
+### Framework/Docker substrate E2E (Hardware-blocked)
+- We lack a local multi-node GPU environment to run the full 8-node DiLoCo + Docker/TorchForge orchestrator E2E tests. Coverage is currently limited to unit-level and single-node pseudo-gradient checks.
+### Real 8B LMA run (User-budget-gated)
+- The framework is proven on Qwen-0.5B and 1.5B (GSM8K/SDPO math traces).
+- The ultimate goal (Llama-3-8B full LMA run with α/β ablation over 10k SWE-bench traces) requires a multi-GPU Modal drop + significant compute budget.
+## Modal-gated (if budget allows after gap-closers)
+### Spike 002a-mini — Real GPU smoke (Phase 10)
+**Closes**: the "did we ever run gradients on GPU" ambiguity — currently everything is CPU-only.
+- Goal: dispatch a 30-min A10G smoke on Modal that runs Qwen2.5-0.5B-Instruct natively on GPU.
+## Shipped (Past-Skeleton)
 ### Spike 006 — Real HF model smoke (Wave 7)
 4. README quickstart updated to `pip install -e .` + `python examples/qwen3_05b_quickstart/run.py`.
 5. `pip install -e .` succeeds and quickstart runs end-to-end on CPU.
+### Post-Skeleton Waves (Datagen, Alignment, Quality)
+- **Trace Ingestion**: Shipped (`composer_replication/ingestion/`).
+- **DiLoCo**: Shipped (`composer_replication/diloco/` outer-loop pseudo optimizer).
+- **Packaging**: Shipped (`pip install -e .` succeeds; quickstart runs end-to-end on CPU).
+- **ADR-008/009/010 (Datagen, Layered Hints, Dr.GRPO+SDPO)**: Shipped, examples documented.
+- **Cross-Family Architectural Review**: Shipped (`docs/reviews/cross-family-adr-008-009-010-2026-05-29/`).
+- **Alignment / V&V Closure**: ADR-011 (SDPO alignment indices), ADR-012 (close review findings), ADR-013 (LMA integration channel-ladder) shipped.
+- **Test Suites**: 210 passed / 16 skipped.
+- **Real Examples**: `examples/gsm8k_grpo/`, `examples/sdpo_with_real_traces_production/`.
 ## Deferred (post-loop, GPU-gated)

composer_replication/integrations/__init__.py ADDED

File without changes

composer_replication/integrations/altered_minds/__init__.py ADDED

	@@ -0,0 +1,48 @@

+"""altered_minds — framework-side, generic LMA integration glue (ADR-013).
+This package is the *model-agnostic* scaffold that lets the Composer Replication
+Framework drive the sister project llm-mental-alterations (LMA): take a
+personality-altered SFT checkpoint and apply the framework's 3-channel RL to ask
+whether task-driven RL washes out, preserves, or AMPLIFIES the alteration's
+cognitive-distortion signature.
+Nothing here loads an LMA checkpoint, calls Modal, or spends budget — that is
+explicitly user-gated (ADR-013 "out of scope"). This package provides:
+  - ``MMLUFormatReward``     : structured-answer reward (final letter + format
+                               only; never rationale style). Plus
+                               ``randomize_options`` and a logged option
+                               distribution so an "always C" exploit is
+                               detectable.
+  - ``dual_kl_logger``       : logs KL(policy||altered_init) AND KL(policy||base)
+                               each step — the washout/amplification instrument.
+  - ``channel_ladder_configs``: the A0-A4 isolated-channel ladder that REPLACES
+                               the old combined alpha=0.2/beta=0.4 recipe.
+See docs/adrs/ADR-013-lma-integration-channel-ladder.md.
+"""
+from __future__ import annotations
+from composer_replication.integrations.altered_minds.kl_logging import (
+    dual_kl_logger,
+    token_mean_kl,
+)
+from composer_replication.integrations.altered_minds.ladder import (
+    LADDER_KL_BETA,
+    channel_ladder_configs,
+)
+from composer_replication.integrations.altered_minds.reward import (
+    MMLUFormatReward,
+    parse_final_answer,
+    randomize_options,
+)
+__all__ = [
+    "MMLUFormatReward",
+    "parse_final_answer",
+    "randomize_options",
+    "dual_kl_logger",
+    "token_mean_kl",
+    "channel_ladder_configs",
+    "LADDER_KL_BETA",
+]
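The package docstring above lists `randomize_options` as the data-level fix for an option-prior exploit. A standalone mirror of the permutation convention that reward.py uses (`perm[new_pos] = old_pos`, then a map from each original letter to its new letter) is sketched below. `shuffle_with_remap` is a hypothetical name; this illustrates the convention and is not the shipped function.

```python
import random
from typing import Sequence


def shuffle_with_remap(
    options: Sequence[str], gold: str, seed: int
) -> tuple[list[str], str, dict[str, str]]:
    """Hypothetical mirror of randomize_options' remap convention."""
    n = len(options)
    letters = [chr(ord("A") + i) for i in range(n)]
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # perm[new_pos] = old_pos
    shuffled = [options[old] for old in perm]
    # The option that was at old_pos now sits at new_pos.
    remap = {letters[old]: letters[new] for new, old in enumerate(perm)}
    return shuffled, remap[gold.strip().upper()], remap


opts = ["w", "x", "y", "z"]  # gold text "w" is originally option A
shuffled, new_gold, remap = shuffle_with_remap(opts, "A", seed=7)
print(shuffled, new_gold, remap)
```

Because the gold letter is looked up through `remap`, the gold text always sits at the new gold index, while its letter varies across seeds.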

composer_replication/integrations/altered_minds/kl_logging.py ADDED

	@@ -0,0 +1,110 @@

+"""kl_logging.py — dual_kl_logger (ADR-013, framework-side, generic).
+The washout/amplification instrument. Given per-token logprobs from three
+forward passes on the SAME answer+reasoning tokens:
+  - policy:         the model currently being RL-trained
+  - altered_init:   the altered SFT checkpoint the run STARTED from (the locus
+                    of the cognitive-distortion signature)
+  - unaltered_base: the original base model BEFORE personality SFT
+returns ``{'kl_to_altered_init': float, 'kl_to_base': float}``.
+NEITHER KL is optimized by default — both are diagnostics:
+  - ``kl_to_altered_init`` rising means the policy is moving AWAY from the
+    altered checkpoint (task-RL is *changing* the alteration).
+  - ``kl_to_base`` measures distance to the unaltered base. If
+    ``kl_to_base`` SHRINKS while ``kl_to_altered_init`` grows, the alteration
+    is WASHING OUT (the policy drifts back toward base). If ``kl_to_base``
+    GROWS faster than ``kl_to_altered_init``, the alteration is being AMPLIFIED
+    (the policy moves further from base than the altered init already was) —
+    the ADR-013 amplification hypothesis, most likely on the SDPO channel.
+Token-mean KL is used (mean over the masked answer+reasoning tokens), the
+standard diagnostic convention. The math is the discrete KL between the two
+softmax distributions implied by the logprob tensors:
+    KL(p || q) = sum_v p_v (log p_v - log q_v)
+where ``p`` is the policy's per-token distribution. This is unit-testable on
+toy tensors: KL(p || p) == 0, and KL grows monotonically as the policy moves.
+"""
+from __future__ import annotations
+from typing import Any
+import torch
+__all__ = ["dual_kl_logger", "token_mean_kl"]
+def _as_log_probs(logprobs: torch.Tensor) -> torch.Tensor:
+    """Normalize an input that may be raw logits OR already-log-probs to valid
+    log-probabilities along the last (vocab) dim.
+    We re-apply ``log_softmax`` defensively: it is idempotent on a genuine
+    log-prob tensor up to floating point (log_softmax of log-probs == log-probs
+    since they already sum-exp to 1), and converts raw logits correctly. This
+    makes the logger robust to either calling convention.
+    """
+    return torch.log_softmax(logprobs.to(torch.float64), dim=-1)
+def token_mean_kl(
+    policy_logprobs: torch.Tensor,
+    ref_logprobs: torch.Tensor,
+    mask: torch.Tensor | None = None,
+) -> float:
+    """Token-mean KL(policy || ref) over distributions on the last dim.
+    Args:
+        policy_logprobs: (..., V) logits or log-probs for the policy.
+        ref_logprobs:    (..., V) logits or log-probs for the reference.
+        mask: optional (...,) mask of tokens to include (1/True = include). If
+            None, all tokens count.
+    Returns:
+        scalar token-mean KL as a python float (>= 0 up to float error).
+    """
+    log_p = _as_log_probs(policy_logprobs)
+    log_q = _as_log_probs(ref_logprobs)
+    p = log_p.exp()
+    # per-token KL: sum over vocab of p * (log p - log q)
+    per_token = (p * (log_p - log_q)).sum(dim=-1)  # (...,)
+    if mask is not None:
+        m = mask.to(per_token.dtype)
+        denom = m.sum()
+        if float(denom) == 0.0:
+            return 0.0
+        return float((per_token * m).sum() / denom)
+    return float(per_token.mean())
+def dual_kl_logger(
+    policy_logprobs: torch.Tensor,
+    altered_init_logprobs: torch.Tensor,
+    unaltered_base_logprobs: torch.Tensor,
+    mask: torch.Tensor | None = None,
+    **_: Any,
+) -> dict[str, float]:
+    """Compute the two diagnostic KLs for a step.
+    Args:
+        policy_logprobs:        (..., V) policy logits/log-probs on the
+            answer+reasoning tokens.
+        altered_init_logprobs:  (..., V) for the altered SFT init.
+        unaltered_base_logprobs:(..., V) for the unaltered base.
+        mask: optional (...,) token mask (answer+reasoning tokens to score).
+    Returns:
+        ``{'kl_to_altered_init': float, 'kl_to_base': float}``.
+    """
+    return {
+        "kl_to_altered_init": token_mean_kl(
+            policy_logprobs, altered_init_logprobs, mask
+        ),
+        "kl_to_base": token_mean_kl(
+            policy_logprobs, unaltered_base_logprobs, mask
+        ),
+    }
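The masking and `log_softmax` conventions in `token_mean_kl` can be checked without torch. Below is a pure-Python sketch of the same computation on nested lists. `token_mean_kl_sketch` is a hypothetical name; it mirrors the logic above for illustration and is not a drop-in replacement. It shows that `log_softmax` is idempotent on log-probs, that KL(p || p) is exactly zero, and that an all-zero mask returns 0.0.

```python
import math
from typing import Optional, Sequence


def log_softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def token_mean_kl_sketch(
    policy: Sequence[Sequence[float]],
    ref: Sequence[Sequence[float]],
    mask: Optional[Sequence[float]] = None,
) -> float:
    """Token-mean KL(policy || ref) over rows of logits or log-probs."""
    per_token = []
    for p_row, q_row in zip(policy, ref):
        log_p, log_q = log_softmax(p_row), log_softmax(q_row)
        per_token.append(sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q)))
    if mask is None:
        return sum(per_token) / len(per_token)
    denom = sum(mask)
    if denom == 0:
        return 0.0  # same guard as the shipped function
    return sum(k * w for k, w in zip(per_token, mask)) / denom


ref = [[0.0, 0.0], [0.0, 0.0]]
policy = [[0.0, 0.0], [3.0, 0.0]]  # only token 1 differs from ref
print(token_mean_kl_sketch(policy, ref, mask=[1, 0]))  # 0.0: token 1 is masked out
print(token_mean_kl_sketch(policy, ref) > 0)  # True
```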

composer_replication/integrations/altered_minds/ladder.py ADDED

	@@ -0,0 +1,97 @@

+"""ladder.py — channel_ladder_configs (ADR-013, the experiment design).
+The isolated-channel ladder REPLACES the old combined alpha=0.2/beta=0.4 recipe
+(superseded; see docs/ALTERED_MINDS_TIE_IN.md Phase 3). Per ADR-013, a combined
+run confounds four effects (task RL, self-distillation of altered reasoning,
+frontier-teacher imitation, KL anchoring) and is scientifically uninterpretable.
+Worse, SDPO against the altered model's OWN hint-conditioned forward pass can
+AMPLIFY the distortion, so it is an experimental intervention, not a stabilizer.
+The ladder isolates channels so each effect is attributable:
+  | Arm | alpha_sdpo | beta_replay | Purpose                          |
+  |-----|------------|-------------|----------------------------------|
+  | A0  | —          | —           | altered SFT, no RL (control)     |
+  | A1  | 0.0        | 0.0         | GRPO-only baseline               |
+  | A2  | 0.02       | 0.0         | +SDPO small (amplification probe)|
+  | A3  | 0.0        | 0.05        | +replay-DPO small (washout probe)|
+  | A4  | 0.02       | 0.05        | combined — ONLY after A1-A3      |
+``kl_beta`` (KL-to-altered-init coef) = 0.02 for all RL arms. A0 is a sentinel
+(no RL) so its alpha/beta/kl_beta are None.
+"""
+from __future__ import annotations
+from typing import Any
+__all__ = ["channel_ladder_configs", "LADDER_KL_BETA"]
+#: KL-to-altered-init coefficient applied to every RL arm (A1-A4).
+LADDER_KL_BETA = 0.02
+def channel_ladder_configs() -> list[dict[str, Any]]:
+    """Return the ordered A0-A4 arm configs.
+    Each arm is a dict with keys: ``arm``, ``alpha_sdpo``, ``beta_replay``,
+    ``kl_beta``, ``note``. A0 is the no-RL sentinel (alpha/beta/kl_beta = None).
+    A runner sweeps these with IDENTICAL seeds/prompts so any observed change in
+    the alteration signature is attributable to the single channel that arm
+    turns on relative to A1.
+    """
+    return [
+        {
+            "arm": "A0",
+            "alpha_sdpo": None,
+            "beta_replay": None,
+            "kl_beta": None,
+            "note": (
+                "Control: altered SFT checkpoint, NO RL. Sentinel arm used to "
+                "anchor the pre-RL alteration signature."
+            ),
+        },
+        {
+            "arm": "A1",
+            "alpha_sdpo": 0.0,
+            "beta_replay": 0.0,
+            "kl_beta": LADDER_KL_BETA,
+            "note": (
+                "GRPO-only baseline (both extra channels OFF). Isolates the "
+                "effect of task-driven RL alone on the alteration."
+            ),
+        },
+        {
+            "arm": "A2",
+            "alpha_sdpo": 0.02,
+            "beta_replay": 0.0,
+            "kl_beta": LADDER_KL_BETA,
+            "note": (
+                "+SDPO small (amplification probe). SDPO ONLY vs A1: tests "
+                "whether self-distillation against the altered model's own "
+                "hint-conditioned forward pass AMPLIFIES the distortion."
+            ),
+        },
+        {
+            "arm": "A3",
+            "alpha_sdpo": 0.0,
+            "beta_replay": 0.05,
+            "kl_beta": LADDER_KL_BETA,
+            "note": (
+                "+replay-DPO small (washout probe). Trace-replay-DPO ONLY vs "
+                "A1: tests whether frontier-teacher disagreement WASHES OUT the "
+                "alteration toward base."
+            ),
+        },
+        {
+            "arm": "A4",
+            "alpha_sdpo": 0.02,
+            "beta_replay": 0.05,
+            "kl_beta": LADDER_KL_BETA,
+            "note": (
+                "Combined — run ONLY after A1-A3 are interpretable. Confounds "
+                "channels by design; meaningful only as a capstone once the "
+                "isolated arms are understood."
+            ),
+        },
+    ]
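The docstring's attribution claim (each RL arm turns on a known set of channels relative to A1, with one shared `kl_beta`) is easy to check mechanically before a sweep. A minimal sketch, assuming a runner receives arm dicts shaped like the `channel_ladder_configs()` output. The values are copied from the table above, and `channels_vs_baseline` is a hypothetical helper.

```python
from typing import Any

LADDER_KL_BETA = 0.02
CHANNELS = ("alpha_sdpo", "beta_replay")

# Values copied from the A0-A4 table in ladder.py's docstring.
ARMS: list[dict[str, Any]] = [
    {"arm": "A0", "alpha_sdpo": None, "beta_replay": None, "kl_beta": None},
    {"arm": "A1", "alpha_sdpo": 0.0, "beta_replay": 0.0, "kl_beta": LADDER_KL_BETA},
    {"arm": "A2", "alpha_sdpo": 0.02, "beta_replay": 0.0, "kl_beta": LADDER_KL_BETA},
    {"arm": "A3", "alpha_sdpo": 0.0, "beta_replay": 0.05, "kl_beta": LADDER_KL_BETA},
    {"arm": "A4", "alpha_sdpo": 0.02, "beta_replay": 0.05, "kl_beta": LADDER_KL_BETA},
]


def channels_vs_baseline(
    arms: list[dict[str, Any]], baseline: str = "A1"
) -> dict[str, list[str]]:
    """Hypothetical helper: which extra channels each RL arm turns on vs A1."""
    base = next(a for a in arms if a["arm"] == baseline)
    rl_arms = [a for a in arms if a["kl_beta"] is not None]  # skip the A0 sentinel
    # A shared KL coefficient keeps KL anchoring from confounding the comparison.
    if len({a["kl_beta"] for a in rl_arms}) != 1:
        raise ValueError("RL arms must share one kl_beta")
    return {a["arm"]: [ch for ch in CHANNELS if a[ch] != base[ch]] for a in rl_arms}


print(channels_vs_baseline(ARMS))
# {'A1': [], 'A2': ['alpha_sdpo'], 'A3': ['beta_replay'], 'A4': ['alpha_sdpo', 'beta_replay']}
```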

composer_replication/integrations/altered_minds/reward.py ADDED

	@@ -0,0 +1,231 @@

+"""reward.py — MMLUFormatReward (ADR-013, framework-side, generic).
+A structured-answer reward for RL on MMLU-style multiple-choice tasks. It scores
+ONLY the final answer letter + format validity — never the rationale's style or
+content. This is deliberate: the north-star use case (ADR-013) drives RL on a
+*personality-altered* model, and rewarding "persuasive" rationale would reward
+the very cognitive-distortion signature we are trying to measure, biasing the
+reward toward that signature instead of measuring it.
+Scoring per completion:
+  +1.0   final answer parses and equals the gold letter
+   0.0   final answer parses but is wrong
+  -0.2   no parseable final-answer marker (unparseable)
+  -0.1   multiple DISTINCT final-answer markers present (format hacking)
+  -len_penalty   small penalty past a rationale character cap
+Parsing accepts (case-insensitive, last match wins for the canonical letter):
+  - ``Answer: X``           (X in A-D)
+  - JSON ``{"answer": "X"}``
+Exploit detection: ``MMLUFormatReward`` keeps a running count of chosen letters
+(``option_distribution``) so an "always C" / option-prior exploit is detectable
+by inspecting that distribution after a run. A companion ``randomize_options``
+helper shuffles option order with an original->shuffled label remap so the
+training data itself can be de-biased.
+"""
+from __future__ import annotations
+import json
+import re
+from collections import Counter
+from dataclasses import dataclass, field
+from typing import Any
+__all__ = ["MMLUFormatReward", "randomize_options", "parse_final_answer"]
+_VALID_LETTERS = ("A", "B", "C", "D")
+# ``Answer: X`` — tolerant of whitespace, optional markdown bold/asterisks.
+_ANSWER_RE = re.compile(r"answer\s*[:\-]\s*\*{0,2}([A-D])\b", re.IGNORECASE)
+# JSON ``{"answer": "X"}`` — extract the value of an "answer" key.
+_JSON_ANSWER_RE = re.compile(
+    r'["\']answer["\']\s*:\s*["\']([A-D])["\']', re.IGNORECASE
+)
+def _find_markers(text: str) -> list[str]:
+    """Return ALL final-answer letters found (uppercased), in order of appearance.
+    Used both to pick the canonical answer (last match wins) and to detect the
+    multiple-distinct-markers format-hacking case.
+    """
+    markers: list[tuple[int, str]] = []
+    for m in _ANSWER_RE.finditer(text or ""):
+        markers.append((m.start(), m.group(1).upper()))
+    for m in _JSON_ANSWER_RE.finditer(text or ""):
+        markers.append((m.start(), m.group(1).upper()))
+    markers.sort(key=lambda p: p[0])
+    return [letter for _, letter in markers]
+def parse_final_answer(completion: str) -> tuple[str | None, int]:
+    """Parse the final answer letter from a completion.
+    Returns ``(letter_or_None, n_distinct_markers)``. ``letter`` is the LAST
+    marker found (last match wins). ``n_distinct_markers`` counts DISTINCT
+    letters across all markers (so two ``Answer: C`` are not penalized, but
+    ``Answer: A ... Answer: B`` is).
+    """
+    markers = _find_markers(completion)
+    if not markers:
+        return None, 0
+    distinct = len(set(markers))
+    return markers[-1], distinct
+@dataclass
+class MMLUFormatReward:
+    """Callable reward_fn(prompts, completions, *, answers, **kwargs) -> list[float].
+    Args:
+        rationale_char_cap: completions longer than this incur a small length
+            penalty (``length_penalty_per_char`` per char past the cap). Caps
+            verbosity without scoring rationale content.
+        length_penalty_per_char: per-character penalty past the cap.
+        correct_reward / wrong_reward / unparseable_reward /
+        multiple_answers_reward: the scalar rewards for each outcome.
+    Side effect: ``option_distribution`` (a Counter over chosen letters) and
+    ``n_scored`` accumulate across calls so an "always C" exploit is detectable
+    via ``exploit_report()``.
+    """
+    rationale_char_cap: int = 512
+    length_penalty_per_char: float = 0.001
+    correct_reward: float = 1.0
+    wrong_reward: float = 0.0
+    unparseable_reward: float = -0.2
+    multiple_answers_reward: float = -0.1
+    option_distribution: Counter = field(default_factory=Counter)
+    n_scored: int = 0
+    def __call__(
+        self,
+        prompts: Any = None,
+        completions: list[str] | None = None,
+        *,
+        answers: list[str] | None = None,
+        **kwargs: Any,
+    ) -> list[float]:
+        """Score a batch of completions against gold ``answers`` (letters A-D).
+        ``prompts`` is accepted for the TRL reward-fn signature but unused
+        (we score the completion text only). ``answers`` is required.
+        """
+        if completions is None:
+            completions = []
+        if answers is None:
+            raise ValueError(
+                "MMLUFormatReward requires `answers` (the gold letters, one per "
+                "completion). Pass via reward_fn(..., answers=[...])."
+            )
+        if len(answers) != len(completions):
+            raise ValueError(
+                f"answers/completions length mismatch: {len(answers)} vs "
+                f"{len(completions)}."
+            )
+        rewards: list[float] = []
+        for completion, gold in zip(completions, answers):
+            rewards.append(self._score_one(completion, gold))
+        return rewards
+    def _score_one(self, completion: str, gold: str) -> float:
+        letter, n_distinct = parse_final_answer(completion)
+        self.n_scored += 1
+        if letter is None:
+            # Unparseable: no usable final-answer marker. Length penalty does
+            # not apply (we never even parsed a letter to reward/penalize).
+            return self.unparseable_reward
+        # Log the chosen letter for exploit detection (always-C etc.).
+        self.option_distribution[letter] += 1
+        if n_distinct > 1:
+            # Multiple DISTINCT markers — format hacking. Penalize regardless
+            # of correctness (the model is hedging / gaming the parser).
+            base = self.multiple_answers_reward
+        elif gold is not None and letter == str(gold).strip().upper():
+            base = self.correct_reward
+        else:
+            base = self.wrong_reward
+        return base - self._length_penalty(completion)
+    def _length_penalty(self, completion: str) -> float:
+        over = max(0, len(completion or "") - self.rationale_char_cap)
+        return self.length_penalty_per_char * over
+    # ------------------------------------------------------------------
+    # Exploit detection
+    # ------------------------------------------------------------------
+    def exploit_report(self) -> dict[str, Any]:
+        """Summarize the chosen-letter distribution so an option-prior exploit
+        (e.g. "always C") is detectable.
+        Returns a dict with the raw counts, the most common letter, and its
+        fraction of all parsed answers. A healthy run is ~uniform over A-D; a
+        fraction near 1.0 for a single letter is the exploit signature.
+        """
+        total = sum(self.option_distribution.values())
+        if total == 0:
+            return {
+                "counts": {},
+                "total_parsed": 0,
+                "most_common": None,
+                "max_fraction": 0.0,
+            }
+        letter, count = self.option_distribution.most_common(1)[0]
+        return {
+            "counts": dict(self.option_distribution),
+            "total_parsed": total,
+            "most_common": letter,
+            "max_fraction": count / total,
+        }
+def randomize_options(
+    item: dict[str, Any], seed: int
+) -> tuple[dict[str, Any], dict[str, str]]:
+    """Shuffle the multiple-choice option order, tracking original->shuffled letters.
+    Args:
+        item: a dict with ``options`` (list[str], A-first ordering) and
+            ``answer`` (the gold letter, A-D). Other keys are passed through.
+        seed: deterministic RNG seed for the shuffle.
+    Returns:
+        ``(shuffled_item, label_remap)`` where ``shuffled_item`` has the options
+        reordered and its ``answer`` updated to the gold option's NEW letter, and
+        ``label_remap`` maps each ORIGINAL letter -> its NEW (shuffled) letter.
+    This de-biases an option-prior exploit at the data level: if the gold answer
+    is no longer correlated with a fixed position, "always C" stops working.
+    """
+    import random
+    options = list(item.get("options", []))
+    n = len(options)
+    if n == 0:
+        return dict(item), {}
+    orig_letters = [chr(ord("A") + i) for i in range(n)]
+    rng = random.Random(seed)
+    perm = list(range(n))
+    rng.shuffle(perm)
+    # perm[new_pos] = old_pos  =>  option at new_pos is the old option perm[new_pos]
+    shuffled_options = [options[perm[new]] for new in range(n)]
+    # original letter -> new letter: old index `perm[new]` moved to position `new`.
+    label_remap: dict[str, str] = {}
+    for new_pos, old_pos in enumerate(perm):
+        label_remap[orig_letters[old_pos]] = orig_letters[new_pos]
+    shuffled_item = dict(item)
+    shuffled_item["options"] = shuffled_options
+    gold = str(item.get("answer", "")).strip().upper()
+    if gold in label_remap:
+        shuffled_item["answer"] = label_remap[gold]
+    return shuffled_item, label_remap
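The exploit-detection rationale in reward.py can be made concrete with a little arithmetic. Under the `MMLUFormatReward` defaults (correct 1.0, wrong 0.0) and a short `Answer: C` completion that stays under the 512-character cap, an "always C" policy earns exactly the fraction of items whose gold is C. Its reward therefore only looks good when gold labels are position-biased, while the logged letter distribution flags it either way. The sketch below is a standalone illustration (`always_c_summary` is hypothetical), not the shipped `exploit_report`.

```python
from collections import Counter
from typing import Any, Sequence

CORRECT_REWARD, WRONG_REWARD = 1.0, 0.0  # MMLUFormatReward defaults


def always_c_summary(golds: Sequence[str]) -> dict[str, Any]:
    """Hypothetical: mean reward and letter distribution for an 'always C' policy."""
    chosen = ["C"] * len(golds)  # every completion parses to C
    rewards = [CORRECT_REWARD if c == g else WRONG_REWARD for c, g in zip(chosen, golds)]
    counts = Counter(chosen)
    letter, n = counts.most_common(1)[0]
    return {
        "mean_reward": sum(rewards) / len(rewards),
        "most_common": letter,
        "max_fraction": n / sum(counts.values()),
    }


print(always_c_summary(["C", "C", "C", "A"]))  # position-biased golds
# {'mean_reward': 0.75, 'most_common': 'C', 'max_fraction': 1.0}
print(always_c_summary(["A", "B", "C", "D"]))  # balanced golds, the goal of randomize_options
# {'mean_reward': 0.25, 'most_common': 'C', 'max_fraction': 1.0}
```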

composer_replication/integrations/altered_minds/tests/__init__.py ADDED

File without changes

composer_replication/integrations/altered_minds/tests/test_channel_ladder.py ADDED

	@@ -0,0 +1,362 @@

+"""Tests for the ADR-013 altered_minds integration glue + the B4 SDPO-fires proof.
+Covers the ADR-013 acceptance gate:
+  - MMLUFormatReward: correct→+1, wrong→0, unparseable→−0.2, multiple→−0.1,
+    length-penalty, and an "always C" option-prior exploit is DETECTABLE via the
+    logged option distribution. Rationale style is NOT scored.
+  - dual_kl_logger: KL(p‖p)==0 and KL grows as the policy moves.
+  - channel_ladder_configs: A1 both off, A2 SDPO-only, A3 replay-only.
+  - B4: the SDPO channel actually FIRES (NONZERO loss) with REAL collator-built
+    alignment indices. See the HONEST NOTE comment block above the test_b4_*
+    tests for the stub-vs-real caveat.
+All CPU-only and fast (stub tokenizer + tiny model — no model download).
+"""
+from __future__ import annotations
+import pytest
+import torch
+from composer_replication.integrations.altered_minds import (
+    MMLUFormatReward,
+    channel_ladder_configs,
+    dual_kl_logger,
+    randomize_options,
+)
+# ===========================================================================
+# MMLUFormatReward
+# ===========================================================================
+def test_reward_correct_wrong_unparseable_multiple():
+    r = MMLUFormatReward()
+    completions = [
+        "Reasoning blah. Answer: B",                 # correct
+        "I think it's Answer: A",                    # wrong (gold C)
+        "no marker here at all",                     # unparseable
+        "Answer: A then actually Answer: D",         # multiple distinct
+        '{"answer": "C"}',                           # JSON correct
+    ]
+    answers = ["B", "C", "B", "A", "C"]
+    out = r(prompts=None, completions=completions, answers=answers)
+    assert out[0] == pytest.approx(1.0)    # correct
+    assert out[1] == pytest.approx(0.0)    # wrong
+    assert out[2] == pytest.approx(-0.2)   # unparseable
+    assert out[3] == pytest.approx(-0.1)   # multiple distinct markers
+    assert out[4] == pytest.approx(1.0)    # JSON correct
+def test_reward_last_match_wins_same_letter_not_penalized():
+    """Two markers of the SAME letter is not 'multiple distinct' — last wins."""
+    r = MMLUFormatReward()
+    out = r(completions=["Answer: C ... so my final Answer: C"], answers=["C"])
+    assert out[0] == pytest.approx(1.0)
+def test_reward_case_insensitive_and_json_variants():
+    r = MMLUFormatReward()
+    out = r(
+        completions=["answer: d", '{"answer":"a"}'],
+        answers=["D", "A"],
+    )
+    assert out[0] == pytest.approx(1.0)
+    assert out[1] == pytest.approx(1.0)
+def test_reward_length_penalty_only_past_cap():
+    """A correct-but-long completion is penalized by ~0.001/char past the cap;
+    a short one is not. Rationale CONTENT is never scored — only length."""
+    r = MMLUFormatReward(rationale_char_cap=20, length_penalty_per_char=0.001)
+    short = "Answer: B"                       # under cap
+    long = "x" * 120 + " Answer: B"           # ~130 chars, 110 over cap
+    out = r(completions=[short, long], answers=["B", "B"])
+    assert out[0] == pytest.approx(1.0)
+    # 130 - 20 = 110 over => penalty 0.110; reward 1.0 - 0.110
+    assert out[1] < 1.0
+    assert out[1] == pytest.approx(1.0 - 0.001 * (len(long) - 20))
+def test_reward_always_C_exploit_is_detectable():
+    """An 'always C' policy that happens to be right when gold==C scores well on
+    those items, but the logged option distribution reveals the exploit."""
+    r = MMLUFormatReward()
+    completions = ["Answer: C"] * 10
+    golds = ["C", "A", "B", "C", "D", "A", "C", "B", "C", "D"]
+    r(completions=completions, answers=golds)
+    report = r.exploit_report()
+    assert report["most_common"] == "C"
+    # Every parsed answer was C => fraction 1.0 — the exploit signature.
+    assert report["max_fraction"] == pytest.approx(1.0)
+    assert report["counts"] == {"C": 10}
+def test_reward_requires_answers():
+    r = MMLUFormatReward()
+    with pytest.raises(ValueError, match="requires `answers`"):
+        r(completions=["Answer: A"])
+def test_randomize_options_tracks_label_remap_and_updates_gold():
+    item = {"question": "q", "options": ["w", "x", "y", "z"], "answer": "A"}
+    shuffled, remap = randomize_options(item, seed=7)
+    # All four letters map to four distinct new letters (a permutation).
+    assert sorted(remap.keys()) == ["A", "B", "C", "D"]
+    assert sorted(remap.values()) == ["A", "B", "C", "D"]
+    # The gold option's text ("w", originally A) now lives at its remapped letter.
+    new_gold_letter = shuffled["answer"]
+    new_gold_idx = ord(new_gold_letter) - ord("A")
+    assert shuffled["options"][new_gold_idx] == "w"
+    assert remap["A"] == new_gold_letter
+    # Determinism.
+    shuffled2, remap2 = randomize_options(item, seed=7)
+    assert remap == remap2 and shuffled["options"] == shuffled2["options"]
+# ===========================================================================
+# dual_kl_logger
+# ===========================================================================
+def test_dual_kl_self_is_zero():
+    """KL(p‖p) == 0 for both diagnostics."""
+    logits = torch.randn(2, 5, 16)
+    out = dual_kl_logger(logits, logits, logits)
+    assert out["kl_to_altered_init"] == pytest.approx(0.0, abs=1e-6)
+    assert out["kl_to_base"] == pytest.approx(0.0, abs=1e-6)
+def test_dual_kl_grows_as_policy_moves():
+    """As the policy distribution moves further from a fixed reference, the KL
+    grows monotonically. Both diagnostics are non-negative."""
+    torch.manual_seed(0)
+    ref = torch.randn(1, 4, 16)
+    base = torch.randn(1, 4, 16)
+    near = ref + 0.1 * torch.randn_like(ref)
+    far = ref + 2.0 * torch.randn_like(ref)
+    kl_near = dual_kl_logger(near, ref, base)["kl_to_altered_init"]
+    kl_far = dual_kl_logger(far, ref, base)["kl_to_altered_init"]
+    assert kl_near >= -1e-9
+    assert kl_far > kl_near, f"KL should grow as policy moves: {kl_near} -> {kl_far}"
+def test_dual_kl_mask_restricts_tokens():
+    """A token mask restricts the mean to the masked answer+reasoning tokens."""
+    torch.manual_seed(1)
+    policy = torch.randn(1, 4, 8)
+    ref = torch.randn(1, 4, 8)
+    base = torch.randn(1, 4, 8)
+    mask = torch.tensor([[1, 1, 0, 0]])
+    out = dual_kl_logger(policy, ref, base, mask=mask)
+    # Masked-all-zero => 0.0 (guarded), nonzero mask => finite non-negative.
+    assert out["kl_to_altered_init"] >= -1e-9
+    zero = dual_kl_logger(policy, ref, base, mask=torch.zeros(1, 4))
+    assert zero["kl_to_altered_init"] == 0.0
+    assert zero["kl_to_base"] == 0.0
+# ===========================================================================
+# channel_ladder_configs
+# ===========================================================================
+def test_ladder_arms_and_order():
+    arms = channel_ladder_configs()
+    assert [a["arm"] for a in arms] == ["A0", "A1", "A2", "A3", "A4"]
+def test_ladder_a0_is_no_rl_sentinel():
+    a0 = channel_ladder_configs()[0]
+    assert a0["arm"] == "A0"
+    assert a0["alpha_sdpo"] is None
+    assert a0["beta_replay"] is None
+    assert a0["kl_beta"] is None
+def test_ladder_a1_both_off():
+    a1 = channel_ladder_configs()[1]
+    assert a1["alpha_sdpo"] == 0.0
+    assert a1["beta_replay"] == 0.0
+    assert a1["kl_beta"] == 0.02
+def test_ladder_a2_sdpo_only():
+    a2 = channel_ladder_configs()[2]
+    assert a2["alpha_sdpo"] == 0.02
+    assert a2["beta_replay"] == 0.0
+    assert a2["kl_beta"] == 0.02
+def test_ladder_a3_replay_only():
+    a3 = channel_ladder_configs()[3]
+    assert a3["alpha_sdpo"] == 0.0
+    assert a3["beta_replay"] == 0.05
+    assert a3["kl_beta"] == 0.02
+def test_ladder_a4_combined():
+    a4 = channel_ladder_configs()[4]
+    assert a4["alpha_sdpo"] == 0.02
+    assert a4["beta_replay"] == 0.05
+    assert a4["kl_beta"] == 0.02
+# ===========================================================================
+# B4 — the SDPO channel actually FIRES (NONZERO) with REAL collator indices
+# ===========================================================================
+#
+# HONEST NOTE ON STUB-VS-REAL (ADR-013 B4 acceptance):
+#
+# This proof uses the same TinyLM stub pattern as
+# trainer/tests/test_sdpo_alignment_indices.py, NOT a real Qwen checkpoint
+# (kept offline/CPU and deterministic). The alignment indices are REAL: they are
+# built by the production ComposerDataCollator from a trace that HAS an error
+# turn (so ctx_teacher_input_ids + student/teacher_response_idx are genuinely
+# emitted by the shipped collator, exactly as in a real run).
+#
+# Why we must perturb the student tokens to get a NONZERO loss: the collator's
+# placeholder-alignment trick makes student and teacher carry the SAME token ids
+# at the SAME absolute positions at valid aligned indices, so a deterministic
+# stub yields JSD≈0 there (the CORRECT answer for a perfectly-aligned identical
+# model — see that test's gate-3 note). To prove the channel genuinely GATHERS
+# the aligned positions and computes nonzero divergence, we make the student's
+# input_ids DIFFER from the teacher's at exactly the aligned response positions
+# — this mimics the hint actually changing the recovery tokens (the real-world
+# case where SDPO has a signal to distill). With a position-dependent stub,
+# different aligned token ids => different logits => provably NONZERO JSD on a
+# grad path, through the real collator-built indices.
+from composer_replication.trainer.data_collator import (  # noqa: E402
+    CollatorConfig,
+    ComposerDataCollator,
+)
+class _StubTok:
+    """Word-level deterministic tokenizer; apply_chat_template space-joins."""
+    pad_token_id = 0
+    def __init__(self) -> None:
+        self._v: dict[str, int] = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
+    def _id(self, w: str) -> int:
+        if w not in self._v:
+            self._v[w] = len(self._v)
+        return self._v[w]
+    def __call__(self, text, **_k):
+        return {"input_ids": [self._id(w) for w in text.split()] if text else []}
+    def apply_chat_template(self, messages, tokenize=True, **_k):  # noqa: ARG002
+        return [self._id(w) for w in " ".join(m.get("content", "") for m in messages).split()]
+class _TinyLM(torch.nn.Module):
+    """Position-dependent minimal model: model(input_ids=...).logits."""
+    def __init__(self, vocab: int = 64, hidden: int = 8, max_pos: int = 512):
+        super().__init__()
+        torch.manual_seed(0)
+        self.embed = torch.nn.Embedding(vocab, hidden)
+        self.pos = torch.nn.Embedding(max_pos, hidden)
+        self.head = torch.nn.Linear(hidden, vocab)
+    def forward(self, input_ids: torch.Tensor):
+        T = input_ids.size(1)
+        positions = torch.arange(T, device=input_ids.device).unsqueeze(0)
+        h = self.embed(input_ids) + self.pos(positions)
+        class _Out:
+            pass
+        out = _Out()
+        out.logits = self.head(h)
+        return out
+def _hint_gen(_kind, _meta):
+    return "HINT search before reading"
+def _error_trace(trace_id: str, recovery: str = "let me use a real tool instead now"):
+    return {
+        "trace_id": trace_id,
+        "turns": [
+            {"role": "user", "content": "do the task now"},
+            {"role": "user", "content": "tool not found error occurred"},
+            {
+                "role": "assistant",
+                "content": recovery,
+                "tool_error": "tool_not_found",
+                "error_meta": {},
+            },
+        ],
+        "final_reward": 0.0,
+    }
+def _make_sdpo_trainer(alpha_sdpo: float):
+    from composer_replication.trainer.composer_trainer import ComposerReplicationTrainer
+    obj = ComposerReplicationTrainer.__new__(ComposerReplicationTrainer)
+    obj.alpha_sdpo = alpha_sdpo
+    obj.sdpo_jsd_beta = 0.5
+    obj.sdpo_temperature = 1.0
+    obj.sdpo_token_clip = None
+    obj.strict_sdpo_alignment = True  # production default
+    return obj
+def test_b4_sdpo_fires_nonzero_with_real_collator_indices():
+    """B4: with REAL collator-built alignment indices and the student tokens
+    differing from the teacher at the aligned response positions (hint changed
+    the recovery tokens), the SDPO channel gathers those positions and produces
+    a NONZERO JSD on a grad path — proving the channel actually FIRES."""
+    tok = _StubTok()
+    cfg = CollatorConfig(hint_generator=_hint_gen, enable_replay_dpo=False)
+    collator = ComposerDataCollator(tokenizer=tok, config=cfg)
+    batch = collator([_error_trace("b4-fires")])
+    # Sanity: the collator genuinely emitted error-site teacher context + indices.
+    assert batch["ctx_teacher_input_ids"].numel() > 0
+    s_idx = batch["student_response_idx"]
+    t_idx = batch["teacher_response_idx"]
+    s_valid = batch["student_response_valid"]
+    assert int(s_valid.sum()) > 0, "no valid aligned positions — collator emitted nothing"
+    # Perturb the STUDENT tokens at the aligned response positions so they differ
+    # from the teacher's tokens there (the hint changed the recovery tokens). We
+    # keep the REAL collator-built indices; only the student input_ids change.
+    student_ids = batch["input_ids"].clone()
+    vocab_ceiling = int(
+        max(batch["input_ids"].max(), batch["ctx_teacher_input_ids"].max())
+    ) + 8
+    for b in range(s_idx.shape[0]):
+        for k in range(s_idx.shape[1]):
+            if bool(s_valid[b, k]):
+                pos = int(s_idx[b, k])
+                # bump to a different, in-vocab token id (deterministic).
+                student_ids[b, pos] = (int(student_ids[b, pos]) + 3) % vocab_ceiling
+    batch["input_ids"] = student_ids
+    model = _TinyLM(vocab=max(vocab_ceiling, 8))
+    obj = _make_sdpo_trainer(alpha_sdpo=0.02)  # A2 config (SDPO-only small)
+    loss = obj._compute_sdpo_loss(model, batch)
+    val = float(loss.detach())
+    assert bool(torch.isfinite(loss.detach()).all()), "loss not finite"
+    assert loss.requires_grad, "SDPO loss must be on a grad path"
+    assert val > 1e-6, (
+        f"SDPO channel did not fire: JSD={val} (expected NONZERO once the "
+        "aligned student/teacher tokens differ). The channel must gather the "
+        "real collator indices and compute a positive divergence."
+    )
+    # Prove it is differentiable end-to-end: backward populates a real gradient.
+    (obj.alpha_sdpo * loss).backward()
+    grad_norm = sum(
+        float(p.grad.norm()) for p in model.parameters() if p.grad is not None
+    )
+    assert grad_norm > 0.0, "no gradient flowed from the SDPO loss into the model"
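The `dual_kl_logger` tests above pin down a small contract: KL(p‖p) is zero, KL to a fixed reference grows as the policy drifts, a token mask limits which positions are averaged, and an all-zero mask returns a guarded 0.0. A minimal pure-Python sketch of that contract follows. It is not the shipped implementation (which works on batched torch logits); `dual_kl_sketch` and its helpers are hypothetical names, and averaging KL per masked token is an assumption taken from the test docstring.

```python
import math
from typing import Optional, Sequence

Logits = Sequence[Sequence[float]]  # [T][V] logits for a single sequence


def log_softmax(row: Sequence[float]) -> list[float]:
    # Numerically stable log-softmax over one vocabulary row.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def token_kl(p_row: Sequence[float], q_row: Sequence[float]) -> float:
    # KL(p || q) in nats for a single token position.
    lp, lq = log_softmax(p_row), log_softmax(q_row)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def masked_mean_kl(policy: Logits, ref: Logits,
                   mask: Optional[Sequence[int]] = None) -> float:
    if mask is None:
        mask = [1] * len(policy)
    kept = [token_kl(p, r) for p, r, m in zip(policy, ref, mask) if m]
    # Guard: an all-zero mask returns exactly 0.0 instead of dividing by zero.
    return sum(kept) / len(kept) if kept else 0.0


def dual_kl_sketch(policy: Logits, altered_init: Logits, base: Logits,
                   mask: Optional[Sequence[int]] = None) -> dict[str, float]:
    # Both diagnostics are logged side by side; neither is optimized.
    return {
        "kl_to_altered_init": masked_mean_kl(policy, altered_init, mask),
        "kl_to_base": masked_mean_kl(policy, base, mask),
    }
```

Logging both numbers per step is what lets a run distinguish washout (KL-to-base shrinking) from amplification (KL-to-altered-init growing while KL-to-base does not shrink).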

docs/ALTERED_MINDS_TIE_IN.md CHANGED

@@ -79,16 +79,43 @@ Fits inside the user's existing $400 altered-minds budget.
 ### Phase 3 — GRPO with the framework
-Run `composer_replication.recipes.trl.ComposerReplicationTrainer` with:
-- **Channel 1 (GRPO)**: turned ON, reward = MMLU letter-correctness
-- **Channel 2 (SDPO/OPSD)**: turned ON at α=0.2, hint-conditioned
-  against the altered model's own forward pass
-- **Channel 3 (trace-replay DPO)**: turned ON at β=0.4, against the
-  Phase-2 datasets
-Train for ~500 steps on a single GPU (Qwen-0.5B feasibility-test
-already confirmed in the framework; for Llama-8B, use Modal + the
-framework's `ServerlessExecutor` per ADR-005 — local 5090 is too small).
+> **⚠️ SUPERSEDED by [ADR-013](adrs/ADR-013-lma-integration-channel-ladder.md).**
+> The original all-channels-on combined recipe (α=0.2, β=0.4) is **not used**.
+> A cross-family research critique (2026-05-29) found a combined-first run
+> **scientifically uninterpretable**: it confounds four effects (task RL,
+> self-distillation of altered reasoning, frontier-teacher imitation, KL
+> anchoring), so any observed change in the alteration signature cannot be
+> attributed to a channel. Worse, **SDPO against the altered model's own
+> hint-conditioned forward pass is the channel most likely to AMPLIFY the
+> distortion** (teacher == student-family; if hints add no independent
+> information, the optimum is to imitate the altered conditional distribution,
+> sharpening a soft bias into a hard preference). SDPO here is therefore an
+> *experimental intervention*, not a benign stabilizer.
+**Use the isolated-channel ladder (ADR-013) instead** — sweep arms A0–A4 with
+identical seeds/prompts so each channel's effect is attributable:
+| Arm | alpha_sdpo | beta_replay | Purpose |
+|---|---|---|---|
+| A0 | — | — | altered SFT, no RL (control) |
+| A1 | 0.0 | 0.0 | GRPO-only baseline |
+| A2 | **0.02** | 0.0 | +SDPO small (amplification probe) |
+| A3 | 0.0 | **0.05** | +replay-DPO small (washout probe) |
+| A4 | 0.02 | 0.05 | combined — only after A1–A3 interpretable |
+`kl_beta=0.02` (KL-to-altered-init) on every RL arm, adaptive to 0.01–0.03
+nats/token; hard-stop/LR-cut if KL > ~0.08. The framework provides the ladder
+via `composer_replication.integrations.altered_minds.channel_ladder_configs()`,
+the structured `MMLUFormatReward` (scores the final answer letter + format
+only — never rationale style, so distorted-but-persuasive reasoning is not
+rewarded), and `dual_kl_logger` (logs KL-to-altered-init **and** KL-to-base each
+step — the washout-vs-amplification instrument).
+Train for ~500 steps per arm on a single GPU (Qwen-0.5B feasibility-test
+already confirmed; for Llama-8B, use Modal + the framework's `ServerlessExecutor`
+per ADR-005 — local 5090 is too small). The real 8B/LMA-checkpoint run remains
+**user-gated** (it spends grant budget) — ADR-013 ships the capability, proven
+CPU-only on a small model (`examples/altered_minds_channel_ladder/`).
 ### Phase 4 — re-evaluate
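The ladder is attributable because A2 and A3 each switch on exactly one channel relative to the A1 baseline. A minimal sketch that encodes the ladder table from the Phase 3 hunk as data, checks that property, and applies the KL hard-stop rule. `LADDER`, `active_channels`, `added_over_baseline` and `kl_guard` are hypothetical names, not the API of `channel_ladder_configs()` (only the dict keys mirror what the tests read), and applying the ~0.08 threshold to the logged KL-to-altered-init is an assumption.

```python
# Hypothetical data mirror of the ADR-013 ladder; key names match the ones the
# tests read from channel_ladder_configs(), but this is not that function.
LADDER: list[dict] = [
    {"arm": "A0", "alpha_sdpo": None, "beta_replay": None, "kl_beta": None},  # altered SFT, no RL
    {"arm": "A1", "alpha_sdpo": 0.0, "beta_replay": 0.0, "kl_beta": 0.02},    # GRPO-only baseline
    {"arm": "A2", "alpha_sdpo": 0.02, "beta_replay": 0.0, "kl_beta": 0.02},   # +SDPO (amplification probe)
    {"arm": "A3", "alpha_sdpo": 0.0, "beta_replay": 0.05, "kl_beta": 0.02},   # +replay-DPO (washout probe)
    {"arm": "A4", "alpha_sdpo": 0.02, "beta_replay": 0.05, "kl_beta": 0.02},  # combined, run last
]

KL_HARD_STOP = 0.08  # assumed threshold on the logged KL-to-altered-init


def active_channels(arm: dict) -> set[str]:
    # A0 is the no-RL sentinel: every coefficient is None, so nothing runs.
    if arm["alpha_sdpo"] is None:
        return set()
    chans = {"grpo"}
    if arm["alpha_sdpo"] > 0:
        chans.add("sdpo")
    if arm["beta_replay"] > 0:
        chans.add("replay_dpo")
    return chans


def added_over_baseline(arm: dict, baseline: dict = LADDER[1]) -> set[str]:
    # Channels this arm switches on relative to the GRPO-only baseline (A1).
    return active_channels(arm) - active_channels(baseline)


def kl_guard(kl_to_altered_init: float, threshold: float = KL_HARD_STOP) -> str:
    # Hard-stop / LR-cut rule: act once the logged KL exceeds ~0.08.
    return "stop_or_cut_lr" if kl_to_altered_init > threshold else "continue"
```

A4 is the only arm whose delta over A1 is more than one channel, which is why it is run only after A1 to A3 are interpretable.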

docs/VISION_VALIDATION.md CHANGED

@@ -1,5 +1,12 @@
 # Vision Validation: Does the Framework Encapsulate the Original Brief?
+> **Status as of 2026-05-29:**
+> The framework is past the skeleton stage: 8 subpackages (`composer_replication/*`), 210 passing tests, and working end-to-end examples (`gsm8k_grpo`, `sdpo_with_real_traces_production`). The 3-channel loss, layered hint generation, trace ingestion, and DiLoCo have all shipped and have been cross-family reviewed.
+>
+> **Two remaining honest gaps:**
+> 1. Docker/TorchForge substrate E2E is hardware-blocked (no local multi-GPU rig is available for the orchestrator layer).
+> 2. The real full-scale LMA run (8B model, 10k SWE-bench traces) is gated on the user's budget.
 > **Status:** Self-audit, 2026-05-25 (Wave 6).
 > **Question:** Does what we've built reflect what was originally asked for, or did we drift?
 > **Method:** Recover original brief verbatim → atomic-clause decomposition → traceability matrix → adversarial self-review → user-journey simulation → concrete pass/fail scorecard with gap-closing actions.

examples/altered_minds_channel_ladder/README.md ADDED

@@ -0,0 +1,52 @@

+# altered_minds_channel_ladder — B4 SDPO-fires proof (ADR-013)
+CPU end-to-end proof that the **SDPO channel actually FIRES (nonzero)** on a
+batch built by the production `ComposerDataCollator` with **real, collator-built
+alignment indices**, in the **A2** isolated-channel-ladder config
+(`alpha_sdpo=0.02`).
+## Why this exists
+`examples/composer_grpo_sdpo_smoke` proves the SDPO channel is *wired* into a
+live TRL Dr.GRPO loop, but its toy synthetic rollouts carry no error sites, so
+`_compute_sdpo_loss` returns `0` — the channel never actually fires. This script
+closes that gap: it feeds a trace that **has an error turn**, so the collator
+emits `ctx_teacher_input_ids` + `student_response_idx`/`teacher_response_idx`,
+and the SDPO JSD is proven **nonzero** on a differentiable grad path.
+## Proof achieved: `TinyLM-stub-with-differing-tokens`
+- **Alignment indices: REAL** — emitted by the shipped `ComposerDataCollator`
+  from a genuine error-turn trace, exactly as in a real run.
+- **Model: a deterministic position-dependent `TinyLM` stub** (CPU, no
+  download), the same pattern as
+  `composer_replication/trainer/tests/test_sdpo_alignment_indices.py`.
+- **Why student tokens are perturbed:** the collator's placeholder-alignment
+  trick makes student and teacher carry identical tokens at identical positions
+  at the valid aligned indices, so a deterministic stub yields `JSD≈0` there
+  (the *correct* answer for a perfectly-aligned identical model). To prove the
+  channel genuinely **gathers** the aligned positions and computes a real
+  divergence, the student's `input_ids` are made to **differ** from the
+  teacher's at exactly those aligned positions — mimicking the hint actually
+  changing the recovery tokens (the real-world case where SDPO has signal to
+  distill). Different aligned tokens ⇒ different logits ⇒ provably **NONZERO**
+  JSD.
+This is the honest, deterministic CPU proof. Loading a real Qwen2.5-0.5B
+checkpoint is **not required** for the B4 gate and is **not** the same as loading
+an LMA checkpoint (still user-gated, ADR-013 out-of-scope).
+## Run
+```bash
+cd <repo> && .venv/bin/python examples/altered_minds_channel_ladder/run.py
+```
+Optional: `ALTERED_MINDS_REAL_MODEL=1` swaps the stub for a cached
+Qwen2.5-0.5B-Instruct (offline, much slower on CPU). The same token-perturbation
+is still required for a nonzero signal.
+Exit `0` = PASS (SDPO fired nonzero), `1` = FAIL, `2` = SKIP (deps unavailable).
+The automated assertion lives in
+`composer_replication/integrations/altered_minds/tests/test_channel_ladder.py::test_b4_sdpo_fires_nonzero_with_real_collator_indices`.
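The token-perturbation argument rests on one property of the Jensen-Shannon divergence: it is exactly zero when the student and teacher next-token distributions are identical and strictly positive when they differ. A minimal pure-Python sketch of that property follows. `stub_logits` is a hypothetical stand-in for the position-dependent TinyLM, the `+ 3` bump mirrors the script's token perturbation, and treating `sdpo_jsd_beta=0.5` as the equal-weight JSD is an assumption, not a reading of `_compute_sdpo_loss`.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: list[float], q: list[float]) -> float:
    # KL(p || q); terms with p_i == 0 contribute 0 by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p: list[float], q: list[float]) -> float:
    # Equal-weight Jensen-Shannon divergence: mean KL of p and q to their midpoint.
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


def stub_logits(token_id: int, position: int, vocab: int = 6) -> list[float]:
    # Hypothetical deterministic toy model: logits depend only on (token, position).
    return [math.sin(1.3 * token_id + 0.7 * position + v) for v in range(vocab)]


pos, teacher_tok = 5, 4
# Same token at the same aligned position: identical distributions, JSD is exactly 0.
same = jsd(softmax(stub_logits(teacher_tok, pos)), softmax(stub_logits(teacher_tok, pos)))
# Student token bumped by 3 at that position: distributions differ, JSD is positive.
bumped = jsd(softmax(stub_logits(teacher_tok + 3, pos)), softmax(stub_logits(teacher_tok, pos)))
```

The same reasoning explains why a real checkpoint still needs the perturbation: with identical inputs at the aligned positions, any deterministic model produces identical distributions there.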

examples/altered_minds_channel_ladder/run.py ADDED

@@ -0,0 +1,229 @@

+"""B4 end-to-end CPU proof: the SDPO channel actually FIRES (NONZERO) on a real
+collator-built batch with genuine alignment indices (ADR-013).
+The existing examples/composer_grpo_sdpo_smoke proves the SDPO channel is *wired*
+into a live TRL Dr.GRPO loop, but its toy synthetic rollouts carry no error
+sites, so _compute_sdpo_loss returns 0 (the channel never actually fires). This
+script closes that gap: it builds a REAL ComposerDataCollator batch from a trace
+that HAS an error turn — so ctx_teacher_input_ids + student/teacher_response_idx
+are emitted by the shipped collator — and proves the SDPO JSD is NONZERO over
+>=1 step, in the A2 ladder config (alpha_sdpo=0.02).
+PROOF ACHIEVED: stub-with-differing-tokens (NOT a real Qwen checkpoint).
+  - Alignment indices: REAL (production ComposerDataCollator, real error turn).
+  - Model: a deterministic position-dependent TinyLM stub (CPU, no download),
+    the same pattern used by trainer/tests/test_sdpo_alignment_indices.py.
+  - Why perturb student tokens: the collator's placeholder-alignment trick makes
+    student & teacher carry identical tokens at identical positions at the valid
+    aligned indices, so a deterministic stub yields JSD≈0 there (correct for a
+    perfectly-aligned identical model). To prove the channel GATHERS the aligned
+    positions and computes a real divergence, the student's input_ids are made to
+    DIFFER from the teacher's at exactly those aligned positions — mimicking the
+    hint actually changing the recovery tokens (the real-world case where SDPO
+    has signal to distill). Different aligned tokens => different logits =>
+    provably NONZERO JSD, on a differentiable grad path.
+To run the SAME assertion against a real Qwen2.5-0.5B-Instruct (if cached
+offline), set ALTERED_MINDS_REAL_MODEL=1 — note that even with a real model the
+NONZERO signal still requires the aligned student/teacher tokens to differ, so
+this script keeps the same token-perturbation; the real-model path only swaps
+the stub for the HF model and is much slower on CPU.
+Exit 0 = PASS (SDPO fired nonzero), 1 = FAIL, 2 = SKIP (deps unavailable).
+"""
+from __future__ import annotations
+import os
+import sys
+def _build_tiny_lm(vocab: int):
+    import torch
+    class _TinyLM(torch.nn.Module):
+        def __init__(self, vocab: int = 64, hidden: int = 8, max_pos: int = 512):
+            super().__init__()
+            torch.manual_seed(0)
+            self.embed = torch.nn.Embedding(vocab, hidden)
+            self.pos = torch.nn.Embedding(max_pos, hidden)
+            self.head = torch.nn.Linear(hidden, vocab)
+        def forward(self, input_ids):
+            T = input_ids.size(1)
+            positions = torch.arange(T, device=input_ids.device).unsqueeze(0)
+            h = self.embed(input_ids) + self.pos(positions)
+            class _Out:
+                pass
+            out = _Out()
+            out.logits = self.head(h)
+            return out
+    return _TinyLM(vocab=max(vocab, 8))
+class _StubTok:
+    pad_token_id = 0
+    def __init__(self) -> None:
+        self._v = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
+    def _id(self, w: str) -> int:
+        if w not in self._v:
+            self._v[w] = len(self._v)
+        return self._v[w]
+    def __call__(self, text, **_k):
+        return {"input_ids": [self._id(w) for w in text.split()] if text else []}
+    def apply_chat_template(self, messages, tokenize=True, **_k):  # noqa: ARG002
+        return [
+            self._id(w)
+            for w in " ".join(m.get("content", "") for m in messages).split()
+        ]
+def _hint_gen(_kind, _meta):
+    return "HINT search before reading"
+def _error_trace():
+    return {
+        "trace_id": "b4-channel-ladder",
+        "turns": [
+            {"role": "user", "content": "do the task now"},
+            {"role": "user", "content": "tool not found error occurred"},
+            {
+                "role": "assistant",
+                "content": "let me use a real working tool instead now",
+                "tool_error": "tool_not_found",
+                "error_meta": {},
+            },
+        ],
+        "final_reward": 0.0,
+    }
+def main() -> int:
+    os.environ.setdefault("HF_HUB_OFFLINE", "1")
+    os.environ.setdefault("TRANSFORMERS_OFFLINE", "1")
+    try:
+        import torch  # noqa: F401
+        from composer_replication.integrations.altered_minds import (
+            channel_ladder_configs,
+        )
+        from composer_replication.trainer.composer_trainer import (
+            ComposerReplicationTrainer,
+            make_dr_grpo_config,
+        )
+        from composer_replication.trainer.data_collator import (
+            CollatorConfig,
+            ComposerDataCollator,
+        )
+    except Exception as e:  # noqa: BLE001
+        print(f"SKIP: import failed: {e!r}")
+        return 2
+    # A2 arm = +SDPO small (alpha_sdpo=0.02), the amplification probe.
+    a2 = next(a for a in channel_ladder_configs() if a["arm"] == "A2")
+    print(f"[b4] ladder arm A2: alpha_sdpo={a2['alpha_sdpo']} "
+          f"beta_replay={a2['beta_replay']} kl_beta={a2['kl_beta']}")
+    # make_dr_grpo_config is exercised to prove the config wiring is intact
+    # (the TinyLM stub forward below does not need a GRPOConfig, but a real A2
+    # runner would pass this config through to ComposerReplicationTrainer).
+    try:
+        cfg = make_dr_grpo_config(output_dir="/tmp/b4_ladder_out", report_to=[])
+        print(f"[b4] Dr.GRPO config OK: loss_type={cfg.loss_type} "
+              f"scale_rewards={cfg.scale_rewards} num_iterations={cfg.num_iterations}")
+    except Exception as e:  # noqa: BLE001
+        print(f"[b4] (config build skipped: {e!r})")
+    # --- REAL collator-built batch with a genuine error turn ---
+    tok = _StubTok()
+    collator = ComposerDataCollator(
+        tokenizer=tok,
+        config=CollatorConfig(hint_generator=_hint_gen, enable_replay_dpo=False),
+    )
+    batch = collator([_error_trace()])
+    if batch.get("ctx_teacher_input_ids") is None or batch["ctx_teacher_input_ids"].numel() == 0:
+        print("FAIL: collator emitted no error-site teacher context.")
+        return 1
+    s_idx = batch["student_response_idx"]
+    s_valid = batch["student_response_valid"]
+    if int(s_valid.sum()) == 0:
+        print("FAIL: no valid aligned response positions.")
+        return 1
+    print(f"[b4] collator emitted real alignment indices: "
+          f"student_response_idx shape={tuple(s_idx.shape)}, "
+          f"valid positions={int(s_valid.sum())}")
+    # --- Make the student tokens differ from teacher at aligned positions ---
+    student_ids = batch["input_ids"].clone()
+    vocab_ceiling = int(
+        max(batch["input_ids"].max(), batch["ctx_teacher_input_ids"].max())
+    ) + 8
+    for b in range(s_idx.shape[0]):
+        for k in range(s_idx.shape[1]):
+            if bool(s_valid[b, k]):
+                pos = int(s_idx[b, k])
+                student_ids[b, pos] = (int(student_ids[b, pos]) + 3) % vocab_ceiling
+    batch["input_ids"] = student_ids
+    real_model = os.environ.get("ALTERED_MINDS_REAL_MODEL") == "1"
+    if real_model:
+        try:
+            from transformers import AutoModelForCausalLM
+            model_id = os.environ.get("SMOKE_MODEL", "Qwen/Qwen2.5-0.5B-Instruct")
+            print(f"[b4] loading real model {model_id} (CPU, slow) ...")
+            model = AutoModelForCausalLM.from_pretrained(model_id)
+            print("[b4] real model loaded; proof path = REAL-MODEL")
+        except Exception as e:  # noqa: BLE001
+            print(f"[b4] real model unavailable ({e!r}); falling back to TinyLM stub")
+            model = _build_tiny_lm(vocab_ceiling)
+            real_model = False
+    else:
+        model = _build_tiny_lm(vocab_ceiling)
+    # --- A2 config: SDPO-only small (alpha_sdpo=0.02), strict alignment ---
+    obj = ComposerReplicationTrainer.__new__(ComposerReplicationTrainer)
+    obj.alpha_sdpo = float(a2["alpha_sdpo"])
+    obj.sdpo_jsd_beta = 0.5
+    obj.sdpo_temperature = 1.0
+    obj.sdpo_token_clip = None
+    obj.strict_sdpo_alignment = True
+    loss = obj._compute_sdpo_loss(model, batch)
+    val = float(loss.detach())
+    print("=" * 64)
+    print(f"  proof path:            {'REAL-MODEL' if real_model else 'TinyLM-stub-with-differing-tokens'}")
+    print(f"  SDPO JSD (sdpo_kl):    {val:.6f}")
+    print(f"  requires_grad:         {loss.requires_grad}")
+    if not (val == val) or val in (float("inf"), float("-inf")):
+        print("  RESULT: FAIL ❌ (loss not finite)")
+        return 1
+    if val <= 1e-6:
+        print("  RESULT: FAIL ❌ (SDPO channel did not fire — JSD ~0)")
+        return 1
+    (obj.alpha_sdpo * loss).backward()
+    grad_norm = sum(
+        float(p.grad.norm()) for p in model.parameters() if p.grad is not None
+    )
+    print(f"  grad norm into model:  {grad_norm:.6f}")
+    if not grad_norm > 0.0:  # also treats a NaN grad norm as a failure
+        print("  RESULT: FAIL ❌ (no gradient flowed from SDPO loss)")
+        return 1
+    print("  RESULT: PASS ✅ (SDPO channel FIRED nonzero via real collator indices)")
+    return 0
+if __name__ == "__main__":
+    sys.exit(main())
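The PASS/FAIL gate at the end of `main()` can be sketched as small pure helpers, which makes each branch testable without building a model. `loss_failure`, `grad_failure` and `exit_code` are hypothetical names. `math.isfinite` is a standard-library replacement for the `val == val` plus infinity-membership check, and the post-backward check uses `not grad_norm > 0.0` because a bare `grad_norm <= 0.0` comparison is False for NaN.

```python
import math
from typing import Optional

PASS, FAIL = 0, 1  # run.py exit codes; 2 (SKIP) is decided earlier, at import time


def loss_failure(jsd: float, eps: float = 1e-6) -> Optional[str]:
    # Pre-backward checks, in the same order as main(): finiteness, then "fired".
    if not math.isfinite(jsd):  # one call covers NaN, +inf and -inf
        return "loss not finite"
    if jsd <= eps:
        return "SDPO channel did not fire (JSD ~0)"
    return None


def grad_failure(grad_norm: float) -> Optional[str]:
    # Post-backward check; `not (x > 0)` is True for NaN, unlike `x <= 0`.
    if not grad_norm > 0.0:
        return "no gradient flowed from SDPO loss"
    return None


def exit_code(reason: Optional[str]) -> int:
    return PASS if reason is None else FAIL
```

A caller would run `loss_failure` before `backward()` and `grad_failure` after it, so a non-finite or zero loss is reported without ever attempting the backward pass.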