Codeseys

Wave 15: 4-angle multi-model self-critique caught 2 math BLOCKERs in primary loss kernels; fixed against upstream byte-for-byte + GSM8K example + ergonomics

e5add15 12 days ago

	# V1–V8 Coverage Matrix — Composer 2.5 Replication Framework

	This document maps each of the 8 clauses of the original brief to **the
	runnable artifact** (or honest gap) in this repo as of HEAD.

	The brief, decomposed:

	> [V1] dive into Composer 2.5 and understand what makes it so much better
	> [V2] take that and combine it with diloco (decoupled, open, any variant of diloco)
	> [V3] and monarch/torchforge/openenv/VeRL/TRL
	> [V4] and make a framework that we can use to further RL training of models to take them to the next level
	> [V5] One of the ideas that I had that might be a parallel to this is to use traces from an llm-application usage then replay the traces with different models to see at each llm-step what the llm would do
	> [V6] by doing this we get distillation data from any number of models that could be used to train the target model further
	> [V7] can we research all of this and see how we could try to set this up as a framework
	> [V8] to take any model from huggingface and be able to further RL train it to get results to Composer 2.5 which is post-trained kimi-k2.5

	## Coverage at-a-glance

	| Clause | Status | Headline artifact | Notes |
	|---|---|---|---|
	| V1 | ✅ Closed | `research/01-composer-2.5.md` + `docs/COMPOSER_RECIPE_MAPPING.md` + Spike 005 trainer skeleton | Identified SDPO/OPSD as Composer's secret sauce; traced to arXiv:2601.20802 (ICLR 2026); audited `siyan-zhao/OPSD` (MIT) for the loss kernel; lifted `generalized_jsd_loss` into our framework as `composer_replication.opsd.generalized_jsd_loss`. |
	| V2 | ⚠️ Partial | `composer_replication.diloco.make_diloco_outer_loop` wraps `torchft.local_sgd.DiLoCo` (BSD-3) | Spike 008 verifies the outer-loop machinery + sign-convention on 1 replica. Cross-replica convergence is GPU-multi-process and not yet attempted. ADR-003 documents the choice. Wrapper is not yet integrated with `ComposerReplicationTrainer`; it is an independent context manager. |
	| V3 | ✅ Closed (research + recipes) | See § "V3 substrate coverage" below | Each substrate has a research deep-dive + an integration recipe. TRL has working code; VeRL has a config + adv-estimator skeleton; Monarch/TorchForge/OpenEnv are documented as reference patterns per the brief's "research" framing. |
	| V4 | ✅ Closed (installable) | `pip install -e .` ships `composer_replication` package | `pyproject.toml` at repo root; `examples/qwen_05b_quickstart/` runs end-to-end. The package re-exports the verified APIs from spike directories (loss, batch, opsd, teacher_replay, ingestion, trainer, diloco). |
	| V5 | ✅ Closed | `composer_replication.ingestion.ClaudeCodeIngester` + Spike 007 e2e test | Real Claude Code session JSONL → `TraceState` → `compose_loss` end-to-end smoke. ADR-002 documents the source choice + Claude Code circularity risk. 18 tests passing (15 unit + 3 e2e-with-loss). |
	| V6 | ✅ Closed | `composer_replication.teacher_replay.replay_trace` + Spike 001 verdict | Multi-teacher OpenRouter replay measured at $0.98/50-step trace, p95 latency 20.5 s, 0 errors over 150 calls. Distillation data shape is `DPOPair(state_id, state_messages, chosen, rejected, n_teachers_agreeing)`. |
	| V7 | ✅ Closed | 5 research deep-dives + ADRs + integration architecture + working framework | The "research and see how" question is empirically answered: framework built, primary-source-validated, four production extension paths documented. Process is auditable. |
	| V8 | ⚠️ Partial | Spike 006 (CPU smoke) + Spike 002a-mini (GPU smoke) | Real `Qwen2.5-0.5B-Instruct` loads via `AutoModelForCausalLM`, runs through the 3-channel loss on both CPU (Spike 006) and GPU (Spike 002a-mini, RTX 5090, bf16, 5.3 GB peak VRAM, 480 ms/step). The "Composer 2.5-quality results" half of V8 is GPU-budget-gated post-replication work (Spikes 002b/003/004). |

	Tally: 6/8 closed, 2/8 partial. Both partials are gated on GPU work (multi-process DiLoCo for V2, GPU-budget-gated quality-of-results runs for V8) that is out of scope for the CPU-budget deep-work-loop phase.
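
The V1 row names `generalized_jsd_loss` as the lifted loss kernel. As a rough illustration of what a generalized Jensen-Shannon divergence computes, here is a minimal pure-Python sketch for a single token position. It uses the textbook beta-weighted form; the function name `generalized_jsd`, the `beta` default, and the mixture convention are assumptions made for illustration and may differ from `composer_replication.opsd.generalized_jsd_loss` and the upstream `siyan-zhao/OPSD` kernel. Temperature, masking, and batch reduction are omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def generalized_jsd(student_logits, teacher_logits, beta=0.5):
    """Beta-weighted JSD between teacher and student for one token position.

    Assumed convention (not confirmed against the repo):
        M   = beta * teacher + (1 - beta) * student
        JSD = beta * KL(teacher || M) + (1 - beta) * KL(student || M)
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    s = softmax(student_logits)
    t = softmax(teacher_logits)
    mix = [beta * ti + (1.0 - beta) * si for si, ti in zip(s, t)]
    return beta * kl(t, mix) + (1.0 - beta) * kl(s, mix)


if __name__ == "__main__":
    print(generalized_jsd([2.0, 0.0, -1.0], [0.0, 1.0, 0.5]))
```

At `beta=0.5` this reduces to the ordinary symmetric JSD, which is zero for identical distributions and bounded above by `log(2)`.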

	---

	## V3 substrate coverage (detailed)

	V3 names five substrates (Monarch, TorchForge, OpenEnv, VeRL, TRL); with DiLoCo from V2 that makes six. Each has a deep-dive research doc, and all but the paused TorchForge have an integration recipe. The "framework" target lives at the intersection of all of them.

	| Substrate | Research deep-dive | Integration recipe | Working code | Notes |
	|---|---|---|---|---|
	| TRL (huggingface/trl) | `research/04-verl-trl.md` § 3 | `docs/INTEGRATION_ARCHITECTURE.md` Recipe A | ✅ `composer_replication.trainer.ComposerReplicationTrainer` subclasses `GRPOTrainer`. `_compute_loss` override composes 3 channels. | Production target for v0.1. DeepWiki-audited extension point: `GRPOTrainer._compute_loss(model, inputs)`. |
	| VeRL (volcengine/verl) | `research/04-verl-trl.md` § 4 | `docs/INTEGRATION_ARCHITECTURE.md` Recipe B | 🟡 `spikes/005/verl_path/composer_adv.py` (110 LOC) + `composer_config.yaml` (89 LOC). Skeleton, not yet runnable. | Production target for v0.2 scale (multi-node). Extension point: `@register_adv_est(name)` decorator + `DataProto.batch`/`non_tensor_batch` for extra fields. |
	| DiLoCo (meta-pytorch/torchft) | `research/02-diloco-family.md` (full DiLoCo / OpenDiLoCo / Streaming DiLoCo / PRIME-RL / INTELLECT-1+2 audit) | `docs/adrs/ADR-003-diloco-impl.md` | 🟡 `composer_replication.diloco.make_diloco_outer_loop` wraps `torchft.local_sgd.DiLoCo` (BSD-3). Spike 008 has 5 single-process tests including sign-convention pin. | Multi-replica convergence not yet tested: single-process post-hook sequencing prevents this in CPU-only smoke. Real `torch.distributed` test deferred to GPU phase. |
	| OpenEnv | `research/03-monarch-torchforge-openenv.md` § OpenEnv | `docs/INTEGRATION_ARCHITECTURE.md` Recipe D | 📋 Reference pattern, no code | Per the integration doc: "OpenEnv is a substrate, not a choice — it specifies how environments expose themselves to trainers." TRL accepts `environment_factory=` kwarg; VeRL has equivalent. Not a code dependency for v0.1; the framework's data path is OpenEnv-compatible by virtue of using TRL's API. |
	| Monarch (Meta) | `research/03-monarch-torchforge-openenv.md` § Monarch | `docs/INTEGRATION_ARCHITECTURE.md` Recipe C | 📋 Reference pattern | Monarch is Meta's actor mesh: a coordination layer for distributed workers, not an algorithm. Per the research doc: "Monarch is alive, TorchForge is paused" (as of 2026-Q2). The framework's outer-loop sync via DiLoCo is an alternative coordination model that doesn't need Monarch. |
	| TorchForge (Meta, paused) | `research/03-monarch-torchforge-openenv.md` § TorchForge | n/a (paused upstream) | 📋 Reference only | TorchForge as a project was paused by Meta. Research doc captures the design lessons; no code dependency. |

	Honest read: TRL + VeRL + DiLoCo are the three substrates the framework actually integrates with. Monarch/TorchForge/OpenEnv are documented as informed-design context, which is what the brief asked for ("can we research all of this and see how we could try to set this up").
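
The DiLoCo row mentions a "sign-convention pin". To make concrete what such a pin guards against, here is a minimal sketch of a DiLoCo-style outer step on plain Python lists. It assumes the convention from the DiLoCo paper (pseudo-gradient = global parameters minus averaged replica parameters) and swaps in plain SGD for the outer optimizer instead of the Nesterov momentum the paper uses. The function name `outer_step` is hypothetical, and whether Spike 008 or `torchft.local_sgd.DiLoCo` pins exactly this convention is an assumption.

```python
def outer_step(global_params, replica_params, outer_lr=1.0):
    """One DiLoCo-style outer update using plain SGD (no momentum).

    global_params:  list of floats, parameters at the start of the inner phase.
    replica_params: list of per-replica parameter lists after the inner phase.
    """
    n = len(replica_params)
    updated = []
    for i, g in enumerate(global_params):
        avg_local = sum(replica[i] for replica in replica_params) / n
        # Assumed sign convention: the pseudo-gradient points from the
        # replicas back toward the old global parameters.
        pseudo_grad = g - avg_local
        # A gradient-descent step on the pseudo-gradient therefore moves the
        # global parameters toward the replica average.
        updated.append(g - outer_lr * pseudo_grad)
    return updated


if __name__ == "__main__":
    start = [1.0, 2.0]
    replicas = [[0.0, 2.0], [1.0, 4.0]]
    print(outer_step(start, replicas, outer_lr=1.0))
    print(outer_step(start, replicas, outer_lr=0.5))
```

With `outer_lr=1.0` the update collapses to a plain average of the replicas, which gives a cheap sanity check: if the sign were flipped, the global parameters would move away from the replica average instead.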

	---

	## Status definitions

	- ✅ Closed: a runnable artifact exists, has tests, and is documented.
	- ⚠️ Partial: closed in the literal sense but with documented spirit-gaps; a concrete next step is identified.
	- ❌ Open: documented but no runnable artifact.
	- 📋 Reference: research-only by design (e.g. paused upstream projects, or substrates that the brief asked for as research, not code).
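
The status vocabulary above can be encoded as a small enum so the at-a-glance tally is computed, not hand-counted. This is only a sketch: the `Status` enum, the `CLAUSE_STATUS` mapping, and the `tally` helper are hypothetical names, not part of the `composer_replication` package; the per-clause statuses are copied from the at-a-glance table.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    CLOSED = "closed"
    PARTIAL = "partial"
    OPEN = "open"
    REFERENCE = "reference"


# Statuses as listed in the at-a-glance table.
CLAUSE_STATUS = {
    "V1": Status.CLOSED,
    "V2": Status.PARTIAL,
    "V3": Status.CLOSED,
    "V4": Status.CLOSED,
    "V5": Status.CLOSED,
    "V6": Status.CLOSED,
    "V7": Status.CLOSED,
    "V8": Status.PARTIAL,
}


def tally(statuses):
    """Summarize statuses as 'n/total label', skipping labels with no entries."""
    counts = Counter(statuses.values())
    total = len(statuses)
    return ", ".join(
        f"{counts[s]}/{total} {s.value}" for s in Status if counts[s]
    )


if __name__ == "__main__":
    print(tally(CLAUSE_STATUS))
```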

	---

	## What "Composer 2.5 quality" specifically requires (V8 honest)

	To close V8 in spirit, not just letter, the framework needs:

	1. ✅ The architecture — done. Three-channel loss with TRL/VeRL recipes; SDPO via OPSD; trace-replay via OpenRouter.
	2. ✅ Real model + real GPU — done. Spike 002a-mini on 5090 sm_120, bf16, 50 steps.
	3. ❌ Real teacher rollouts at scale — Spike 002b: collect ~1000 traces × 3 teachers = ~$1000 OpenRouter spend. GPU-budget gated.
	4. ❌ A/B against plain GRPO on SWE-bench-lite — Spike 004. ~$100-200 GPU + judge calls.
	5. ❌ Decisive empirical result — only achievable after (3) and (4).

	This is the post-replication phase. The CPU-only deep-work-loop phase (Waves 7-12) closes the architecture + installability + verification legs. The empirical leg requires money + time + a 7B+ model and is intentionally out of scope for the methodology phase.
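
The ~$1000 figure in item 3 can be sanity-checked against the Spike 001 measurement quoted in the at-a-glance table. The sketch below assumes that the 150 measured calls correspond to one 50-step trace replayed against three teachers, so that $0.98 already covers all teachers, and that Spike 002b traces are about the same length. Both are assumptions, and the helper names are hypothetical.

```python
# Figures quoted from the Spike 001 verdict (V6 row).
COST_PER_TRACE_USD = 0.98  # measured cost of replaying one 50-step trace
STEPS_PER_TRACE = 50
CALLS_MEASURED = 150


def implied_teachers(calls=CALLS_MEASURED, steps=STEPS_PER_TRACE):
    """Teachers per step implied by the measurement (assumes one trace)."""
    return calls // steps


def estimate_replay_spend(n_traces, cost_per_trace_usd=COST_PER_TRACE_USD):
    """Linear extrapolation of replay spend in USD; ignores price changes."""
    return round(n_traces * cost_per_trace_usd, 2)


if __name__ == "__main__":
    print("teachers per step:", implied_teachers())
    print("estimated spend for 1000 traces (USD):", estimate_replay_spend(1000))
```

Under these assumptions, 1000 traces come to roughly $980, in line with the ~$1000 estimate for Spike 002b.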

	---

	## How to verify each ✅ yourself

	| Clause | Verification command |
	|---|---|
	| V1 | `cat research/01-composer-2.5.md docs/COMPOSER_RECIPE_MAPPING.md` |
	| V2 | `cd spikes/008-streaming-diloco && python -m pytest tests/ -q` (5/5 pass) |
	| V3 | `cat docs/INTEGRATION_ARCHITECTURE.md docs/V3_SUBSTRATE_COVERAGE.md` |
	| V4 | `pip install -e . && python examples/qwen_05b_quickstart/run.py` |
	| V5 | `cd spikes/007-real-trace-ingestion && python -m pytest tests/ -q` |
	| V6 | `cat spikes/001-teacher-replay-cost/verdict.md` |
	| V7 | `ls research/ docs/adrs/ docs/research/ docs/INTEGRATION_ARCHITECTURE.md` |
	| V8 | `cd spikes/002a-mini-gpu-smoke && python run_gpu_smoke.py` (requires GPU) |

	---

	## References

	- `docs/VISION_VALIDATION.md` — original 10-point scorecard + post-Wave-11 honest re-scoring
	- `docs/research/WAVE_7_10_FINAL_REVIEW.md` — cross-model adversarial review of Wave 7-10 (10 priority items, 2 BLOCKERs both addressed)
	- `docs/adrs/ADR-001..007` — seven architectural decisions (GPU venue, trace source, DiLoCo impl, replaysim normalization, serverless DiLoCo, RL frameworks, distillation losses)
	- `BACKLOG.md` — pre-execution acceptance criteria for Spikes 006/007/008 + Wave 10

	---

	## Wave 13 expansion (2026-05-26)

	The user expanded the brief mid-loop:

	> "keep going. make sure that we do the paths of the Composer 2.5 methods, the n-teachers replaysim, and Decoupled DiLoCo (so that we can leverage modal or huggingface-jobs or other serverless training systems). … For V5 see if we can leverage [a normalization library] to normalize the data while also making the replaysim dataset generation. … if we can properly document and research the self-distillation papers like SDPO OPDS and/or others. … see if there are other frameworks that are more popular that we could try to use. meta's pytorch agentic stack components are something that I'd like to explore."

	| Wave 13 ask | Artifact | Status |
	|---|---|---|
	| Decoupled DiLoCo over serverless | ADR-005 + `composer_replication.diloco.serverless` (Protocol + ObjectStoreAllReduce + LocalProcessExecutor + Modal/HFJobs skeletons) + 9 multi-process tests | ✅ Closed (local) / 🟡 Skeleton (cloud) |
	| Replaysim normalization | ADR-004 + `composer_replication.replaysim` package + `data-juicer` adapter + default YAML recipe + 9 unit tests | ✅ Closed (passthrough) / 🟡 Pending data-juicer install for full path |
	| Other RL frameworks (V3 expansion) | ADR-006 + `composer_replication.recipes.prime_rl` (recipe + composer_loss adapter + config.yaml) | ✅ Closed (recipe) / 🟡 Skeleton (runtime) |
	| Meta's PyTorch agentic stack | ADR-006 + `composer_replication.recipes.monarch` (actor layout doc + skeleton actors) | ✅ Closed (design) / 🟡 Skeleton (impl) |
	| Deeper self-distillation research | ADR-007 + `docs/research/SELF_DISTILLATION_LANDSCAPE.md` + `composer_replication.distillation` module (SimPO + TAID-rewritten + Entropy-Aware OPD) + tests | ✅ Closed end-to-end: `compose_loss` kwargs wired in Wave 14; TAID rewritten in Wave 15 to match SakanaAI/TAID upstream (logit-space mix, current-student-detached anchor, forward-KL criterion, optional `TAIDScheduler`); OPSD parity test added against `siyan-zhao/OPSD` upstream. |
	| altered-minds tie-in | `docs/ALTERED_MINDS_TIE_IN.md` (5-phase plan, $300 estimate, open questions) | ✅ Closed (design) |

	Wave 13 test addition: 35 new tests passing (17 distillation + 9 serverless multi-process + 9 replaysim).
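
The TAID entry in the table describes three ingredients: a logit-space mix, a detached current-student anchor, and a forward-KL criterion. The sketch below shows how those pieces could fit together for one token position, in pure Python. It is written from that one-line description only, not from the SakanaAI/TAID code or the repo's `composer_replication.distillation` module; the name `taid_loss` is hypothetical and the `TAIDScheduler` that would set `t` over training is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def taid_loss(student_logits, teacher_logits, t):
    """Forward KL from an interpolated target to the student.

    t in [0, 1] moves the target from the student (t=0) to the teacher (t=1).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    # Logit-space mix. In an autograd framework the student logits used here
    # would be detached so gradients flow only through `student` below.
    mixed = [(1.0 - t) * s + t * te
             for s, te in zip(student_logits, teacher_logits)]
    target = softmax(mixed)
    student = softmax(student_logits)
    # Forward KL: KL(target || student).
    return sum(p * math.log(p / q) for p, q in zip(target, student))


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, taid_loss([0.0, 0.0], [2.0, 0.0], t))
```

At `t=0` the target equals the student and the loss vanishes; at `t=1` it reduces to ordinary forward-KL distillation against the teacher.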

	The framework now covers the full expanded brief. **Total tests passing
	post-Wave-15: 115 + 1 skip-marked.** Wave-by-wave evolution: 72 (W12) → 93 (W13) → 124 (W14) → 130 (W14b) → 115 (W15: TAID rewrite consolidated 16 schedule-tests into 7 t-parameterized tests; OPSD upstream-parity test added skip-marked).

	This is the canonical running test count; other docs reference V1_V8_COVERAGE rather than restating.
