Instructions to use Codeseys/composer-replication-framework with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Codeseys/composer-replication-framework with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("Codeseys/composer-replication-framework", dtype="auto") - Notebooks
- Google Colab
- Kaggle
ADR-002 — Trace source for Spike 007 (real LLM-application traces)
Status: Accepted Date: 2026-05-26 Wave: Phase 4 (deep work loop)
Context
Spike 007 closes V5 of the vision validation: "real LLM-application traces."
Spike 001 used 50 hand-crafted synthetic states for the cost-floor measurement.
The framework's brief explicitly said real traces, so we owe Spike 007 a
primary-sourced ingestion path that converts a real, public, multi-turn agent
trace format into our existing TraceState TypedDict.
Existing schema (verified from spikes/005-integrated-trainer-skeleton/teacher_replay.py):
class TraceState(TypedDict):
state_id: str # unique within the trace
messages: list[dict] # OpenAI-style conversation up to + incl this step
student_action: str # what the student did at this step
(Earlier deep-work-loop notes called this TraceExample — that was a brain
glitch; the actual type is TraceState and there is no TraceExample.)
Options considered
| Option | Schema | Acquisition | Signal density | License |
|---|---|---|---|---|
| (a) Claude Code session JSONL | Documented + 4 reverse-engineered schemas | 1,015 local sessions zero-cost | per-step tool_use blocks = ideal teacher-correction sites |
User-owned local files; framework MIT |
| (b) Cline VS Code extension | No stable export schema | Would need custom extraction | Unknown until extracted | Apache 2.0 (extension), trace data user-owned |
| (c) OpenHands trajectories | Documented (v0/v1 in flux) | Need to run OpenHands or download leaderboard submissions | Strong | MIT |
| (d) Aider chat history | Markdown chat (lossy for tool calls) | Local only if user runs Aider | Weak — collapses tool structure | Apache 2.0 |
| (e) SWE-bench leaderboard trajs | Heterogeneous, free-format | Public download | Strong but uneven | Per-submission (mostly permissive) |
| (f) SWE-smith-trajectories (HF) | Messages-only, structure collapsed | HF dataset download | Strong but lossy | MIT |
Source: docs/research/TRACE_SOURCE_RECONNAISSANCE.md (2026-05-26 subagent recon).
Decision
Option (a) — Claude Code session JSONL at ~/.claude/projects/<encoded>/<sessionid>.jsonl.
Wins on every axis we care about for Spike 007:
Acquisition cost: zero. 1,015 real sessions already on this machine from the user's daily Claude Code use. No download, no consent negotiation, no rate limiting, no schema change risk during ingestion development.
Schema stability: empirically validated. The subagent ran a programmatic audit on 8 real sessions; record types are stable across all of them. Anthropic publishes user-facing docs for the format; four independent community projects (claude-code-cli-tools, claudeflow, etc.) ship working parsers including one with a JSON Schema validated against ~50,000 real messages.
Signal density: maximal. Every
tool_useblock is a candidate teacher-correction site. The 5 pre-selected sessions in the recon doc contain 6,762 tool_use messages (range 125 → 2,830 per session). That's 100× the density of Spike 001's 50 synthetic states.License: clean. The trace files are user-owned files on the user's own machine. We don't redistribute them with the framework. The ingester code we write is MIT and ships in the framework. Anyone running the framework who wants real-trace ingestion uses their own local Claude Code sessions.
Consequences
Accepted
- Spike 007 implements
TraceIngester.ingest(path: Path) -> Iterator[TraceState]for the Claude Code JSONL format. - The TraceIngester ships as part of the package (Wave 10 packaging) under
composer_replication.ingestion.claude_code. - The recon doc's 5 pre-selected real sessions become the smoke fixture for Spike 007's tests. We pin to a known set of session IDs so the test is deterministic locally; CI users substitute their own.
ingestion/directory pattern is established now to support adding ingesters for OpenHands and SWE-smith later if Spike 007 reveals signal-density gaps.
Open questions resolved by ADR-002
Granularity — One
TraceStateper assistant turn (not pertool_use). A single assistant turn often emits multipletool_useblocks for one reasoning step; treating each tool_use as a separate state would over-fragment the conversation. Discussion in TRACE_SOURCE_RECONNAISSANCE §5.student_actionmapping — The literal text of the assistant turn (concatenatedtextblocks of the Claude message) becomesstudent_action. The teacher-replay channel asks N teachers to produce their version of "what should the assistant do here?" given themessageshistory; we then DPO-compare teacher consensus vs literal student text.Thinking blocks — Strip
thinkingblocks from the message history passed to teachers (teachers don't have access to Claude's reasoning trace). KEEP them in thestudent_actionfor the student's own reproduction loop, since that's the actual generation we'd be RL-training.System prompt — Inject a synthetic system prompt at message[0] of each
TraceStatedescribing "you are a coding agent" so teachers without their own coding-agent system prompt have a fair playing field.Subagent traces — Skip them in v0.1; only ingest top-level sessions. Subagent traces have a different structure (parent task ID etc.) that would complicate the v0.1 ingester.
Recon-flagged risk (not blocking)
- Anthropic doesn't publish a versioned schema. The TraceIngester pins to
known record-types as of 2026-05-26 and gracefully degrades on unknown
types. If Anthropic ships a breaking change to the JSONL format, we'd
need to bump a
schema_versionconstant in the ingester. Acceptable ongoing maintenance burden.
Risk added 2026-05-26 by cross-model review (NOT BLOCKING but TO DOCUMENT)
- Circularity / data-leakage in the teacher-replay channel. Claude
Code traces are produced by Claude. Our default teacher pool
(
DEFAULT_TEACHERS) includesanthropic/claude-opus-4.7. Training a student on Claude's outputs while Claude is one of the teachers voting on what the student should do produces a biased disagreement signal: Claude's vote is correlated with the trace's existingstudent_action(which Claude originally produced). This biases the multi-teacher consensus toward the existing answer.- Mitigation: when ingesting Claude Code traces, the user should drop Claude from the teacher pool and use a non-Claude consensus (Opus 4.7 → GPT-5 + DeepSeek V4-Pro, or any non-Claude pair). Documented here; not yet enforced in code.
- Open question for v0.2: should
ClaudeCodeIngesterautomatically annotate the source-model field on each trace andreplay_traceautomatically exclude same-family teachers? Defer the design until the post-replication phase reveals whether the bias is observable.
Future ingesters
Open the door for two more ingesters in v0.2:
composer_replication.ingestion.openhands— for users who run OpenHandscomposer_replication.ingestion.swe_smith— for users who download the HF dataset
Both follow the same Iterator[TraceState] contract.
Source
docs/research/TRACE_SOURCE_RECONNAISSANCE.md (subagent recon, primary-sourced
including direct inspection of the user's local sessions, 2026-05-26).