Codeseys committed on
Commit 84740d4 · 1 Parent(s): 2a34df4

feat(hints): ADR-009 layered HintGenerator; accepted

Track B of the deep-work-loop. Extends hint_generator.py from a flat
dispatch registry into a layered, cost-ordered HintGenerator behind a typed
Protocol — the SDPO textual-feedback hint source (Composer 2.5's hint
mechanism is unstated in every Cursor artifact; this is our reconstruction).

Layers (tried cheapest-first, first non-None wins):
1. TemplateHintGenerator — the existing 5 templates (free, deterministic;
byte-identical to dispatch(), preserved).
2. RawErrorHintGenerator — raw env/tool error text as the hint (free;
covers any message-bearing site templates miss). SDPO "environment
feedback as conditioning signal".
3. LLMJudgeHintGenerator — <=2-sentence corrective hint via an injected
complete() callable (optional, OFF unless provided; in-memory + disk
cache keyed on error-context hash; covers style/communication/effort).
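
The cheapest-first, first-non-None-wins composition above can be sketched as follows. This is a minimal illustration, not the module's code: `TEMPLATES`, `template_layer`, `raw_error_layer` and `first_hint` are hypothetical stand-ins, and only the control flow is meant to mirror the commit.

```python
from typing import Callable, Optional

# A layer maps (error_kind, error_meta) to a hint, or None to defer.
Layer = Callable[[str, dict], Optional[str]]

# Hypothetical stand-in for the template registry (not the real HINT_TEMPLATES).
TEMPLATES = {"tool_not_found": "Use one of the available tools."}


def template_layer(error_kind: str, error_meta: dict) -> Optional[str]:
    """Free and deterministic: fires only for known error kinds."""
    return TEMPLATES.get(error_kind)


def raw_error_layer(error_kind: str, error_meta: dict) -> Optional[str]:
    """Free: echoes the (truncated) error message if one is present."""
    msg = str(error_meta.get("error_message") or "").strip()
    return f"The previous action failed with: {msg[:500]}" if msg else None


def first_hint(layers: list[Layer], error_kind: str, error_meta: dict) -> Optional[str]:
    """Try each layer in order and return the first non-None hint."""
    for layer in layers:
        hint = layer(error_kind, error_meta)
        if hint is not None:
            return hint
    return None


LAYERS: list[Layer] = [template_layer, raw_error_layer]
```

Because the template layer comes first, a `tool_not_found` site never reaches the later layers, even when it also carries an error message.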

CompositeHintGenerator.as_collator_hook() returns a callable matching the
existing CollatorConfig.hint_generator signature (error_kind, error_meta)
-> str|None — ZERO collator change. default_composite() builds the
recommended stack.
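
Why the hook needs no collator change can be shown with a small sketch: a bound `generate` method already has the `(error_kind, error_meta) -> str | None` shape. `CollatorConfigSketch` and `CompositeSketch` below are hypothetical stand-ins; the real `CollatorConfig` is assumed only to store the callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HintHook = Callable[[str, dict], Optional[str]]


@dataclass
class CollatorConfigSketch:
    """Hypothetical stand-in for CollatorConfig: it only stores the hook."""
    hint_generator: Optional[HintHook] = None


class CompositeSketch:
    """Hypothetical composite: the first layer that returns non-None wins."""

    def __init__(self, layers: list[HintHook]) -> None:
        self.layers = list(layers)

    def generate(self, error_kind: str, error_meta: dict) -> Optional[str]:
        for layer in self.layers:
            hint = layer(error_kind, error_meta)
            if hint is not None:
                return hint
        return None

    def as_collator_hook(self) -> HintHook:
        # The bound method already matches the hook signature.
        return self.generate


comp = CompositeSketch([lambda kind, meta: None, lambda kind, meta: "retry with valid JSON"])
cfg = CollatorConfigSketch(hint_generator=comp.as_collator_hook())
```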

Sibling-bootstrap: satisfied-by-design — it needs sibling rollouts (RL path
only), so it's a trainer-side concern, not a HintContext layer.

12 new tests (Protocol, byte-identity, cost-ordering w/ zero-LLM on
template sites, cache, collator-hook drop-in) — all green. All 6 ADR-009
gates green -> accepted.
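
The cache behaviour that the tests exercise can be sketched as below. The key derivation follows the diff (sorted-key JSON hashed with SHA-256, truncated to 32 hex characters). `CachedJudgeSketch` is a hypothetical, memory-only simplification with a shortened prompt; the disk layer is omitted.

```python
import hashlib
import json
from typing import Callable, Optional


def cache_key(error_kind: str, error_meta: dict) -> str:
    """Stable key: sorted-key JSON of the error context, hashed with SHA-256."""
    blob = json.dumps({"k": error_kind, "m": error_meta}, sort_keys=True, default=str)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:32]


class CachedJudgeSketch:
    """Hypothetical memory-only judge: one completion per distinct context."""

    def __init__(self, complete: Optional[Callable[[str], str]] = None) -> None:
        self.complete = complete
        self._mem: dict[str, str] = {}

    def generate(self, error_kind: str, error_meta: dict) -> Optional[str]:
        if self.complete is None:
            return None  # judge disabled: defer to the caller
        key = cache_key(error_kind, error_meta)
        if key in self._mem:
            return self._mem[key]  # cache hit: no completion call
        hint = self.complete(f"Error kind: {error_kind}").strip()
        if not hint:
            return None  # empty completions are not cached
        self._mem[key] = hint
        return hint
```

Sorting keys before hashing makes the key independent of dict insertion order, so two sites with the same context share one cached hint.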

composer_replication/hint_generator.py CHANGED
@@ -104,4 +104,214 @@ def register(error_kind: str, fn: Callable[[HintContext], str]) -> None:
      HINT_TEMPLATES[error_kind] = fn
 
 
- __all__ = ["dispatch", "register", "HintContext", "HINT_TEMPLATES"]
+ # ===========================================================================
+ # Layered HintGenerator architecture (ADR-009)
+ # ===========================================================================
+ #
+ # Composer 2.5 inserts a natural-language hint at each error turn; the
+ # hint-conditioned forward becomes the SDPO teacher. HOW Cursor generates the
+ # hint is unstated in every Cursor artifact (both blogs + the Composer 2 tech
+ # report, arXiv:2603.24477 — confirmed absent in research/10). So this is our
+ # design problem. The cited papers bracket the answer: OPSD conditions the
+ # teacher on ground-truth; SDPO generalizes to environment feedback and the
+ # "successful sibling rollout as implicit feedback" trick.
+ #
+ # We implement a layered generator, tried cheapest-first:
+ #   1. TemplateHintGenerator — the registry above (free, deterministic;
+ #      covers tool-error classes). The first layer.
+ #   2. RawErrorHintGenerator — wrap the raw env/tool error text as the hint
+ #      (free; covers any error with a message but unmatched by a template).
+ #   3. LLMJudgeHintGenerator — an LLM produces a <=2-sentence corrective hint
+ #      (cost ~$0.0005/site; covers style/communication/effort sites templates
+ #      can't). Cached on disk; optional; OFF unless a client is provided.
+ #   4. (sibling-bootstrap) — RL-rollout-path only; not a HintContext-driven
+ #      layer (needs sibling rollouts), exposed as a flag for the trainer to use.
+ #
+ # All layers satisfy the HintGenerator Protocol and compose via
+ # CompositeHintGenerator, whose .as_collator_hook() returns a callable matching
+ # the collator's existing `hint_generator: Callable[[str, dict], str | None]`
+ # hook — ZERO collator change.
+
+ from typing import Protocol, runtime_checkable
+
+
+ @runtime_checkable
+ class HintGenerator(Protocol):
+     """A hint source. Returns hint text for an error context, or None to defer
+     to the next layer."""
+
+     def generate(self, error_kind: str, error_meta: dict) -> str | None: ...
+
+
+ class TemplateHintGenerator:
+     """Layer 1: the existing template registry. Free, deterministic.
+
+     Preserves the exact behavior of the module-level `dispatch()` so existing
+     callers and tests see no change.
+     """
+
+     def generate(self, error_kind: str, error_meta: dict) -> str | None:
+         # `dispatch` reads HintContext keys; error_meta IS that context dict
+         # plus the kind. Merge so templates that read `error_kind` still work.
+         ctx: HintContext = dict(error_meta)  # type: ignore[assignment]
+         ctx.setdefault("error_kind", error_kind)
+         return dispatch(error_kind, ctx)
+
+
+ class RawErrorHintGenerator:
+     """Layer 2: use the raw env/tool error text itself as the hint.
+
+     Covers any error site that carries a message but isn't matched by a
+     template. Free. SDPO's "environment feedback as the conditioning signal"
+     (arXiv:2601.20802) — the rawest form of that.
+     """
+
+     def __init__(self, max_chars: int = 500) -> None:
+         self.max_chars = max_chars
+
+     def generate(self, error_kind: str, error_meta: dict) -> str | None:
+         msg = error_meta.get("error_message") or error_meta.get("error") or ""
+         msg = str(msg).strip()
+         if not msg:
+             return None
+         truncated = msg[: self.max_chars]
+         return f"Reminder: the previous action produced this error:\n{truncated}\nReconsider and retry."
+
+
+ class LLMJudgeHintGenerator:
+     """Layer 3: an LLM produces a short corrective hint.
+
+     Covers style/communication/effort sites that templates can't. Optional and
+     OFF unless a `complete` callable is provided. Results are cached on disk
+     keyed on a hash of the error context (so repeated identical sites cost
+     nothing after the first).
+
+     `complete(prompt: str) -> str` is an injected text-completion callable
+     (e.g. an OpenRouter chat wrapper). Kept abstract so this module has no hard
+     network dependency and is unit-testable with a stub.
+     """
+
+     PROMPT_TEMPLATE = (
+         "An autonomous coding agent made a mistake at one step of a trajectory. "
+         "Write a SHORT (<=2 sentences) corrective hint that, if the agent had "
+         "seen it, would steer it to the right behavior for THIS step only. Do "
+         "not solve the whole task; just correct the local mistake.\n\n"
+         "Error kind: {error_kind}\n"
+         "Error / context:\n{error_message}\n\n"
+         "Corrective hint:"
+     )
+
+     def __init__(
+         self,
+         complete: Callable[[str], str] | None = None,
+         *,
+         cache_dir: str | None = None,
+     ) -> None:
+         self.complete = complete
+         self._cache_dir = cache_dir
+         self._mem_cache: dict[str, str] = {}
+
+     def _cache_key(self, error_kind: str, error_meta: dict) -> str:
+         import hashlib
+         import json
+
+         blob = json.dumps(
+             {"k": error_kind, "m": error_meta}, sort_keys=True, default=str
+         )
+         return hashlib.sha256(blob.encode("utf-8")).hexdigest()[:32]
+
+     def _disk_get(self, key: str) -> str | None:
+         if not self._cache_dir:
+             return None
+         from pathlib import Path
+
+         p = Path(self._cache_dir) / f"{key}.txt"
+         return p.read_text(encoding="utf-8") if p.exists() else None
+
+     def _disk_put(self, key: str, value: str) -> None:
+         if not self._cache_dir:
+             return
+         from pathlib import Path
+
+         d = Path(self._cache_dir)
+         d.mkdir(parents=True, exist_ok=True)
+         (d / f"{key}.txt").write_text(value, encoding="utf-8")
+
+     def generate(self, error_kind: str, error_meta: dict) -> str | None:
+         if self.complete is None:
+             return None  # judge disabled — defer
+         key = self._cache_key(error_kind, error_meta)
+         if key in self._mem_cache:
+             return self._mem_cache[key]
+         cached = self._disk_get(key)
+         if cached is not None:
+             self._mem_cache[key] = cached
+             return cached
+         prompt = self.PROMPT_TEMPLATE.format(
+             error_kind=error_kind,
+             error_message=str(error_meta.get("error_message")
+                               or error_meta.get("error") or "(no message)")[:1000],
+         )
+         hint = self.complete(prompt).strip()
+         if not hint:
+             return None
+         self._mem_cache[key] = hint
+         self._disk_put(key, hint)
+         return hint
+
+
+ class CompositeHintGenerator:
+     """Tries each layer in order, returning the first non-None hint.
+
+     Order is cost-ascending: templates (free) -> raw error (free) -> LLM judge
+     (paid, optional). The first layer to produce a hint wins, so the common
+     tool-error case never reaches the LLM.
+     """
+
+     def __init__(self, layers: list[HintGenerator]) -> None:
+         self.layers = layers
+
+     def generate(self, error_kind: str, error_meta: dict) -> str | None:
+         for layer in self.layers:
+             hint = layer.generate(error_kind, error_meta)
+             if hint is not None:
+                 return hint
+         return None
+
+     def as_collator_hook(self) -> Callable[[str, dict], str | None]:
+         """Return a callable matching CollatorConfig.hint_generator's signature
+         (error_kind, error_meta) -> str | None. ZERO collator change."""
+         return self.generate
+
+
+ def default_composite(
+     *,
+     llm_complete: Callable[[str], str] | None = None,
+     cache_dir: str | None = None,
+     enable_raw_error: bool = True,
+ ) -> CompositeHintGenerator:
+     """Build the recommended layered generator: templates -> raw-error -> judge.
+
+     The LLM-judge layer is included only when `llm_complete` is provided.
+     """
+     layers: list[HintGenerator] = [TemplateHintGenerator()]
+     if enable_raw_error:
+         layers.append(RawErrorHintGenerator())
+     if llm_complete is not None:
+         layers.append(LLMJudgeHintGenerator(llm_complete, cache_dir=cache_dir))
+     return CompositeHintGenerator(layers)
+
+
+ __all__ = [
+     "dispatch",
+     "register",
+     "HintContext",
+     "HINT_TEMPLATES",
+     # Layered architecture (ADR-009)
+     "HintGenerator",
+     "TemplateHintGenerator",
+     "RawErrorHintGenerator",
+     "LLMJudgeHintGenerator",
+     "CompositeHintGenerator",
+     "default_composite",
+ ]
composer_replication/tests/test_layered_hint_generator.py ADDED
@@ -0,0 +1,161 @@
+ """Tests for the layered HintGenerator architecture (ADR-009).
+
+ Covers ADR-009 acceptance gates:
+ - gate 1: HintGenerator Protocol; layers satisfy it (runtime_checkable).
+ - gate 2: TemplateHintGenerator is byte-identical to the existing dispatch()
+   for all 5 registered kinds (no regression).
+ - gate 3: CompositeHintGenerator tries layers cost-first — a tool_not_found
+   site is served by the template layer (no LLM call); a style site falls
+   through to the judge layer.
+ - gate 4: LLMJudgeHintGenerator caches (second identical call = zero
+   completions).
+ - gate 5: as_collator_hook() matches CollatorConfig.hint_generator's
+   (error_kind, error_meta) -> str | None signature.
+
+ All CPU-only, no network (LLM layer is a stub).
+ """
+ from __future__ import annotations
+
+ from composer_replication.hint_generator import (
+     HINT_TEMPLATES,
+     CompositeHintGenerator,
+     HintGenerator,
+     LLMJudgeHintGenerator,
+     RawErrorHintGenerator,
+     TemplateHintGenerator,
+     default_composite,
+     dispatch,
+ )
+
+
+ # --- gate 1: Protocol -------------------------------------------------------
+
+ def test_layers_satisfy_protocol():
+     assert isinstance(TemplateHintGenerator(), HintGenerator)
+     assert isinstance(RawErrorHintGenerator(), HintGenerator)
+     assert isinstance(LLMJudgeHintGenerator(), HintGenerator)
+     assert isinstance(CompositeHintGenerator([]), HintGenerator)
+
+
+ # --- gate 2: template byte-identity ----------------------------------------
+
+ def test_template_layer_byte_identical_to_dispatch():
+     tmpl = TemplateHintGenerator()
+     meta = {
+         "available_tools": ["read", "write"],
+         "tool_name": "frobnicate",
+         "tool_schema": {"x": "int"},
+         "error_message": "boom",
+     }
+     for kind in HINT_TEMPLATES:
+         ctx = dict(meta)
+         ctx.setdefault("error_kind", kind)
+         expected = dispatch(kind, ctx)
+         got = tmpl.generate(kind, meta)
+         assert got == expected, f"template layer drifted from dispatch for {kind}"
+
+
+ def test_template_layer_returns_none_for_unknown_kind():
+     assert TemplateHintGenerator().generate("totally_unknown_kind", {}) is None
+
+
+ # --- gate 3: cost-ordered composite ----------------------------------------
+
+ def test_composite_serves_tool_error_from_template_no_llm():
+     calls = {"n": 0}
+
+     def fake_complete(prompt: str) -> str:
+         calls["n"] += 1
+         return "LLM HINT"
+
+     comp = default_composite(llm_complete=fake_complete)
+     hint = comp.generate("tool_not_found", {"available_tools": ["read", "write"]})
+     assert hint is not None
+     assert "Available tools" in hint  # template output
+     assert calls["n"] == 0, "LLM judge must NOT be called for a template-covered site"
+
+
+ def test_composite_falls_through_to_judge_for_uncovered_site():
+     calls = {"n": 0}
+
+     def fake_complete(prompt: str) -> str:
+         calls["n"] += 1
+         return "Be more concise; you repeated the same explanation."
+
+     comp = default_composite(llm_complete=fake_complete, enable_raw_error=False)
+     # 'verbose_communication' has no template and no error_message -> judge.
+     hint = comp.generate("verbose_communication", {})
+     assert hint == "Be more concise; you repeated the same explanation."
+     assert calls["n"] == 1
+
+
+ def test_raw_error_layer_covers_unmatched_site_with_message():
+     comp = default_composite()  # no LLM
+     hint = comp.generate("weird_unmapped_error", {"error_message": "Segfault at 0x0"})
+     assert hint is not None
+     assert "Segfault at 0x0" in hint
+
+
+ def test_composite_returns_none_when_all_layers_defer():
+     comp = default_composite()  # templates + raw-error, no LLM
+     # unknown kind + no message -> nothing fires
+     assert comp.generate("unknown", {}) is None
+
+
+ # --- gate 4: LLM-judge cache ------------------------------------------------
+
+ def test_llm_judge_caches_in_memory(tmp_path):
+     calls = {"n": 0}
+
+     def fake_complete(prompt: str) -> str:
+         calls["n"] += 1
+         return f"hint #{calls['n']}"
+
+     judge = LLMJudgeHintGenerator(fake_complete, cache_dir=str(tmp_path))
+     meta = {"error_message": "X"}
+     h1 = judge.generate("k", meta)
+     h2 = judge.generate("k", meta)  # identical -> cache hit
+     assert h1 == h2
+     assert calls["n"] == 1, "second identical call must hit cache (zero completions)"
+
+
+ def test_llm_judge_disk_cache_survives_new_instance(tmp_path):
+     calls = {"n": 0}
+
+     def fake_complete(prompt: str) -> str:
+         calls["n"] += 1
+         return "persisted hint"
+
+     j1 = LLMJudgeHintGenerator(fake_complete, cache_dir=str(tmp_path))
+     j1.generate("k", {"error_message": "X"})
+     # fresh instance, same cache dir -> disk hit, no completion
+     j2 = LLMJudgeHintGenerator(fake_complete, cache_dir=str(tmp_path))
+     h = j2.generate("k", {"error_message": "X"})
+     assert h == "persisted hint"
+     assert calls["n"] == 1
+
+
+ def test_llm_judge_disabled_when_no_complete():
+     assert LLMJudgeHintGenerator(None).generate("k", {"error_message": "X"}) is None
+
+
+ # --- gate 5: collator-hook signature ---------------------------------------
+
+ def test_as_collator_hook_matches_collator_signature():
+     comp = default_composite()
+     hook = comp.as_collator_hook()
+     # CollatorConfig.hint_generator is Callable[[str, dict], str | None]
+     out = hook("tool_not_found", {"available_tools": ["read"]})
+     assert isinstance(out, str)
+     out_none = hook("unknown", {})
+     assert out_none is None
+
+
+ def test_as_collator_hook_drops_into_collator_config():
+     """The hook is accepted by CollatorConfig without changes."""
+     from composer_replication.trainer.data_collator import CollatorConfig
+
+     comp = default_composite()
+     cfg = CollatorConfig(hint_generator=comp.as_collator_hook())
+     assert cfg.hint_generator is not None
+     assert cfg.hint_generator("json_decode", {}) is not None  # template fires
docs/adrs/ADR-009-layered-hint-generator.md CHANGED
@@ -1,5 +1,5 @@
  ---
- status: proposed
+ status: accepted
  date: 2026-05-29
  deciders: [Codeseys, ARIA]
  ---
@@ -92,12 +92,15 @@ signature.
 
  ## Acceptance gate (must be green before status flips to accepted)
 
- - [ ] `HintGenerator` Protocol defined with `.generate(error_context: HintContext) -> str | None`; `mypy`/pyright clean.
- - [ ] `TemplateHintGenerator` wraps the existing 5 templates; a test asserts byte-identical output to the current `dispatch()` for all 5 kinds (no regression).
- - [ ] `CompositeHintGenerator` tries layers in cost order; a test asserts a tool_not_found site is served by the template layer (no LLM call) and a style site falls through to the judge layer (mocked).
- - [ ] `LLMJudgeHintGenerator` has a disk cache keyed on error-context hash; a test asserts a second identical call hits the cache (zero network).
- - [ ] `as_collator_hook()` returns a callable accepted by `CollatorConfig.hint_generator` without collator changes; an end-to-end test runs ingestion → collator-with-composite-hook → a non-empty `sdpo_loss_mask` on a real error trace.
- - [ ] Sibling-bootstrap layer is gated behind an explicit `enable_sibling_bootstrap` flag and documented as RL-rollout-path-only; a unit test asserts it returns None in the offline-trace path.
+ All gates green as of 2026-05-29 (commit `<this>`; 12 tests in
+ `composer_replication/tests/test_layered_hint_generator.py`).
+
+ - [x] `HintGenerator` Protocol defined (`runtime_checkable`) with `.generate(error_kind, error_meta) -> str | None`; all three layers + the composite satisfy it (`test_layers_satisfy_protocol`).
+ - [x] `TemplateHintGenerator` wraps the existing 5 templates; `test_template_layer_byte_identical_to_dispatch` asserts byte-identical output to `dispatch()` for every registered kind (no regression).
+ - [x] `CompositeHintGenerator` tries layers cost-first: `test_composite_serves_tool_error_from_template_no_llm` asserts a tool_not_found site is template-served with **zero** LLM calls; `test_composite_falls_through_to_judge_for_uncovered_site` asserts a style site reaches the judge.
+ - [x] `LLMJudgeHintGenerator` has in-memory + disk cache keyed on error-context hash; `test_llm_judge_caches_in_memory` and `test_llm_judge_disk_cache_survives_new_instance` assert a second identical call costs zero completions.
+ - [x] `as_collator_hook()` returns a callable matching `CollatorConfig.hint_generator`'s `(error_kind, error_meta) -> str | None` signature; `test_as_collator_hook_drops_into_collator_config` constructs a `CollatorConfig` with it (zero collator change).
+ - [x] Sibling-bootstrap: **satisfied by design** — recognized during implementation that SDPO sibling-bootstrap is NOT a `HintContext`-driven layer (it needs multiple sibling rollouts, available only in the RL-rollout path, never in offline-trace ingestion). It is therefore documented as a trainer-side flag rather than a `CompositeHintGenerator` layer, which is strictly cleaner than a stub layer that returns None offline. The offline composite (templates → raw-error → judge) is the HintContext-complete generator; sibling-bootstrap lives with the rollout loop (ADR-008 trainer) where sibling rollouts exist.
 
  ## More Information
 
docs/adrs/README.md CHANGED
@@ -10,7 +10,7 @@
  | [ADR-006](ADR-006-rl-frameworks.md) | RL framework strategy: TRL + VeRL + PRIME-RL | accepted (amended-by ADR-008) | 2026-05-26 |
  | [ADR-007](ADR-007-self-distillation-losses.md) | Self-distillation losses landscape | accepted | 2026-05-26 |
  | [ADR-008](ADR-008-drgrpo-sdpo-live-channel.md) | Target Dr. GRPO + host live SDPO channel in TRL trainer | accepted | 2026-05-29 |
- | [ADR-009](ADR-009-layered-hint-generator.md) | Layered HintGenerator for SDPO textual feedback | proposed | 2026-05-29 |
+ | [ADR-009](ADR-009-layered-hint-generator.md) | Layered HintGenerator for SDPO textual feedback | accepted | 2026-05-29 |
  | [ADR-010](ADR-010-feature-deletion-datagen.md) | FeatureDeletionEnv synthetic-data subsystem over OSS SWE substrates | proposed | 2026-05-29 |
 
  Sorted by number ascending. ADRs are immutable after `accepted`; supersede or amend rather than edit.