Wave 11: cross-model adversarial review + honest down-revision

f16fa23 13 days ago

7.53 kB

	# ADR-002 — Trace source for Spike 007 (real LLM-application traces)

	Status: Accepted
	Date: 2026-05-26
	Wave: Phase 4 (deep work loop)

	## Context

	Spike 007 closes V5 of the vision validation: "real LLM-application traces."
	Spike 001 used 50 hand-crafted synthetic states for the cost-floor measurement.
	The framework's brief explicitly said real traces, so we owe Spike 007 a
	primary-sourced ingestion path that converts a real, public, multi-turn agent
	trace format into our existing `TraceState` TypedDict.

	Existing schema (verified from `spikes/005-integrated-trainer-skeleton/teacher_replay.py`):

	```python
	class TraceState(TypedDict):
	state_id: str # unique within the trace
	messages: list[dict] # OpenAI-style conversation up to + incl this step
	student_action: str # what the student did at this step
	```

	(Earlier deep-work-loop notes called this `TraceExample` — that was a brain
	glitch; the actual type is `TraceState` and there is no `TraceExample`.)

	## Options considered

	\| Option \| Schema \| Acquisition \| Signal density \| License \|
	\|---\|---\|---\|---\|---\|
	\| (a) Claude Code session JSONL \| Documented + 4 reverse-engineered schemas \| 1,015 local sessions zero-cost \| per-step `tool_use` blocks = ideal teacher-correction sites \| User-owned local files; framework MIT \|
	\| (b) Cline VS Code extension \| No stable export schema \| Would need custom extraction \| Unknown until extracted \| Apache 2.0 (extension), trace data user-owned \|
	\| (c) OpenHands trajectories \| Documented (v0/v1 in flux) \| Need to run OpenHands or download leaderboard submissions \| Strong \| MIT \|
	\| (d) Aider chat history \| Markdown chat (lossy for tool calls) \| Local only if user runs Aider \| Weak — collapses tool structure \| Apache 2.0 \|
	\| (e) SWE-bench leaderboard trajs \| Heterogeneous, free-format \| Public download \| Strong but uneven \| Per-submission (mostly permissive) \|
	\| (f) SWE-smith-trajectories (HF) \| Messages-only, structure collapsed \| HF dataset download \| Strong but lossy \| MIT \|

	Source: `docs/research/TRACE_SOURCE_RECONNAISSANCE.md` (2026-05-26 subagent recon).

	## Decision

	Option (a) — Claude Code session JSONL at `~/.claude/projects/<encoded>/<sessionid>.jsonl`.

	Wins on every axis we care about for Spike 007:

	1. Acquisition cost: zero. 1,015 real sessions already on this machine
	from the user's daily Claude Code use. No download, no consent
	negotiation, no rate limiting, no schema change risk during ingestion
	development.

	2. Schema stability: empirically validated. The subagent ran a programmatic
	audit on 8 real sessions; record types are stable across all of them.
	Anthropic publishes user-facing docs for the format; four independent
	community projects (claude-code-cli-tools, claudeflow, etc.) ship
	working parsers including one with a JSON Schema validated against
	~50,000 real messages.

	3. Signal density: maximal. Every `tool_use` block is a candidate
	teacher-correction site. The 5 pre-selected sessions in the recon doc
	contain 6,762 tool_use messages (range 125 → 2,830 per session). That's
	100× the density of Spike 001's 50 synthetic states.

	4. License: clean. The trace files are user-owned files on the user's
	own machine. We don't redistribute them with the framework. The
	ingester code we write is MIT and ships in the framework. Anyone
	running the framework who wants real-trace ingestion uses their own
	local Claude Code sessions.

	## Consequences

	### Accepted

	- Spike 007 implements `TraceIngester.ingest(path: Path) -> Iterator[TraceState]`
	for the Claude Code JSONL format.
	- The TraceIngester ships as part of the package (Wave 10 packaging) under
	`composer_replication.ingestion.claude_code`.
	- The recon doc's 5 pre-selected real sessions become the smoke fixture
	for Spike 007's tests. We pin to a known set of session IDs so the test
	is deterministic locally; CI users substitute their own.
	- `ingestion/` directory pattern is established now to support adding
	ingesters for OpenHands and SWE-smith later if Spike 007 reveals
	signal-density gaps.

	### Open questions resolved by ADR-002

	1. Granularity — One `TraceState` per assistant turn (not per `tool_use`).
	A single assistant turn often emits multiple `tool_use` blocks for one
	reasoning step; treating each tool_use as a separate state would
	over-fragment the conversation. Discussion in TRACE_SOURCE_RECONNAISSANCE
	§5.

	2. `student_action` mapping — The literal text of the assistant turn
	(concatenated `text` blocks of the Claude message) becomes
	`student_action`. The teacher-replay channel asks N teachers to produce
	their version of "what should the assistant do here?" given the
	`messages` history; we then DPO-compare teacher consensus vs literal
	student text.

	3. Thinking blocks — Strip `thinking` blocks from the message history
	passed to teachers (teachers don't have access to Claude's reasoning
	trace). KEEP them in the `student_action` for the student's own
	reproduction loop, since that's the actual generation we'd be RL-training.

	4. System prompt — Inject a synthetic system prompt at message[0] of
	each `TraceState` describing "you are a coding agent" so teachers
	without their own coding-agent system prompt have a fair playing field.

	5. Subagent traces — Skip them in v0.1; only ingest top-level sessions.
	Subagent traces have a different structure (parent task ID etc.) that
	would complicate the v0.1 ingester.

	### Recon-flagged risk (not blocking)

	- Anthropic doesn't publish a versioned schema. The TraceIngester pins to
	known record-types as of 2026-05-26 and gracefully degrades on unknown
	types. If Anthropic ships a breaking change to the JSONL format, we'd
	need to bump a `schema_version` constant in the ingester. Acceptable
	ongoing maintenance burden.

	### Risk added 2026-05-26 by cross-model review (NOT BLOCKING but TO DOCUMENT)

	- Circularity / data-leakage in the teacher-replay channel. Claude
	Code traces are produced by Claude. Our default teacher pool
	(`DEFAULT_TEACHERS`) includes `anthropic/claude-opus-4.7`. Training a
	student on Claude's outputs while Claude is one of the teachers
	voting on what the student should do produces a biased disagreement
	signal: Claude's vote is correlated with the trace's existing
	`student_action` (which Claude originally produced). This biases the
	multi-teacher consensus toward the existing answer.
	- Mitigation: when ingesting Claude Code traces, the user should
	drop Claude from the teacher pool and use a non-Claude consensus
	(Opus 4.7 → GPT-5 + DeepSeek V4-Pro, or any non-Claude pair).
	Documented here; not yet enforced in code.
	- Open question for v0.2: should `ClaudeCodeIngester` automatically
	annotate the source-model field on each trace and `replay_trace`
	automatically exclude same-family teachers? Defer the design until
	the post-replication phase reveals whether the bias is observable.

	### Future ingesters

	Open the door for two more ingesters in v0.2:
	- `composer_replication.ingestion.openhands` — for users who run OpenHands
	- `composer_replication.ingestion.swe_smith` — for users who download the HF dataset

	Both follow the same `Iterator[TraceState]` contract.

	## Source

	`docs/research/TRACE_SOURCE_RECONNAISSANCE.md` (subagent recon, primary-sourced
	including direct inspection of the user's local sessions, 2026-05-26).