TRACE_SOURCE_RECONNAISSANCE.md
Spike 007 trace-source audit, feeding ADR-002.
Status: DECIDED — recommend (a) Claude Code session JSONL (~/.claude/projects/<encoded>/<sessionId>.jsonl).
0. TL;DR
Of the six candidates audited, Claude Code session JSONL wins on every axis except "official Anthropic-published schema" (no such doc exists), and for that single weakness there is now a community-maintained reverse-engineered JSON Schema validated against ~50,000 messages from real sessions, plus three independent third-party schema specs. The user has 1,015 .jsonl sessions on this machine today; the eight largest sampled span 550 → 17,315 lines and contain 6,762 multi-turn tool_use messages. Acquisition cost is zero. Licensing is clean: the JSONL files are local user-owned data; the proprietary Claude Code binary is not redistributed by us.
The runners-up — OpenHands (well-documented but acquisition is non-trivial), SWE-bench trajectory submissions (heterogeneous schemas across submitters), Aider markdown chat history (lossy / unparseable for tool calls), and Cline (no public stable export format) — each lose on at least one of the four axes.
1. Context: TraceExample dataclass field reality
Important correction to the parent task description. The task brief said "TraceExample dataclass with fields state_text, action_taken, hint_text (optional), reward (float), teacher_id (str)". Reading the actual file at
/mnt/e/CS/HF/composer-replication-framework/spikes/005-integrated-trainer-skeleton/teacher_replay.py shows the existing types are different — there is no TraceExample class. The closest existing types are two TypedDicts used by replay_trace() and extract_dpo_pairs():
```python
from typing import TypedDict


class TraceState(TypedDict):
    state_id: str          # unique within the trace
    messages: list[dict]   # conversation up to and including this step's user prompt
    student_action: str    # what the student actually did at this step


class DPOPair(TypedDict):
    state_id: str
    state_messages: list[dict]
    chosen: str            # teacher-consensus action
    rejected: str          # student action
    n_teachers_agreeing: int
```
The mapping sketch in §6 below targets TraceState (the input to teacher replay), since that is the type a TraceIngester is upstream of. If Spike 007 also wants a unified TraceExample per the brief, the natural shape is TraceState ∪ {teacher_id: str | None, reward: float | None, hint_text: str | None} — flagged for ADR-002 to settle.
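As a minimal sketch of the `TraceState ∪ {…}` shape described above: the `TraceExample` class and the `to_trace_example` helper below are hypothetical and do not exist in the repository. They only illustrate one way the optional fields could be layered on top of `TraceState`, pending ADR-002.

```python
from __future__ import annotations

from typing import TypedDict


class TraceState(TypedDict):
    state_id: str
    messages: list[dict]
    student_action: str


class TraceExample(TraceState, total=False):
    # Hypothetical optional fields from the task brief; total=False makes
    # them optional while the inherited TraceState keys stay required.
    teacher_id: str | None
    reward: float | None
    hint_text: str | None


def to_trace_example(state: TraceState) -> TraceExample:
    """Lift an ingested TraceState into a TraceExample with empty annotations."""
    return TraceExample(**state, teacher_id=None, reward=None, hint_text=None)
```

Keeping `TraceExample` a subtype of `TraceState` means anything that accepts a `TraceState` (such as `replay_trace()`) could accept the richer record unchanged.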
2. Candidate audit summary
Scoring legend: + good, ~ mixed, - bad, on each of the four required axes.
| # | Candidate | Schema documented | Real ≥5 multi-turn traces | Hint-receptive signal density | License OK | Verdict |
|---|---|---|---|---|---|---|
| a | Claude Code JSONL (~/.claude/projects/) | ~ Anthropic publishes high-level format note; community schemas are detailed and validated | + 1,015 local sessions, 5+ trivially | + Per-step assistant.message.content[].tool_use blocks → discrete actions, ideal teacher-correction sites | + User-owned local files; framework MIT | CHOSEN |
| b | Cline VS Code extension | - No published stable export schema | ~ Requires running Cline + manual export | ~ Plausible if exported but unverified | ~ Cline source Apache-2.0 but trace format isn't a stable contract | reject |
| c | OpenHands trajectories | + Well-documented (events/, base_state.json, Pydantic Event models) | - Need to run OpenHands or download eval traces; not zero-cost | + ActionEvent/ObservationEvent split is conceptually ideal | + OpenHands MIT-licensed | strong runner-up |
| d | Aider chat history | ~ Format is "markdown, level-4 headings for user input" (fragile) | ~ Available if Aider was used | - Tool calls are flattened into prose; recovering structured actions is lossy | + Aider Apache-2.0 | reject |
| e | SWE-bench / Lite leaderboard trajs/ | - Each submitter chooses a free-form text format (md/json/yaml) | + ~hundreds of submissions on github.com/swe-bench/experiments | ~ Heterogeneous; structured ones (e.g. mini-swe-agent .traj.json) are good, others are essentially logs | + Public submissions with usage rights for research | reject as primary; usable as future cross-validation set |
| f | SWE-smith-trajectories on HF | + Standard OpenAI messages format, documented per dataset card | + 5,017 trajectories, 76,002 rows, public | + Single-attempt per-instance SWE-agent runs | + Apache-2.0 dataset license | strong runner-up; complement, not replacement |
The (f) row was discovered during audit (the parent task allowed "any other public source you find that is better"). It's a strong candidate but answers a different question: SWE-bench trajectories give us reproducible benchmark traces; Claude Code JSONL gives us the user's actual workflow. For Spike 007's purpose (verify the teacher-replay path works on a real, signal-dense trace at zero acquisition cost), (a) is the right primary; (f) is queued for a later cross-validation phase.
3. Chosen format spec — Claude Code session JSONL
3.1 Location and naming
- Root: `~/.claude/projects/` (overridable via `CLAUDE_CONFIG_DIR`). Source: https://code.claude.com/docs/en/sessions ("Transcripts are stored as JSONL at `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl`").
- Project-key encoding: working-directory absolute path with `/` and `\` and `:` replaced by `-`, with a leading `-`. (Hidden directories with a leading dot become double dashes.) Source: https://github.com/jamie-bitflight/claude_skills/blob/main/plugins/agentskill-kaizen/skills/transcript-analysis/references/session-log-schema.md §"Project key encoding".
- File: `<sessionId>.jsonl`. Subagent transcripts are `agent-<agentId>.jsonl`; a `SessionReader` should skip files starting with `agent-` when listing main sessions. Source: same `claude_skills` doc, §"Subagent File Location".
- Encoding: UTF-8, newline-delimited JSON. One JSON object per line. No `[`/`]` wrapping. Local cleanup default 30 days, configurable via `cleanupPeriodDays` in `~/.claude/settings.json`. Source: https://code.claude.com/docs/en/data-usage ("Local caching: Claude Code clients store session transcripts locally in plaintext under `~/.claude/projects/` for 30 days by default to enable session resumption.")
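The encoding and file-naming rules above can be sketched as follows. Both function names are hypothetical, and the hidden-directory handling is an assumption read off the rule text and the local directory names quoted in this document (for example `VIGOR--overstory`, which is presumed to come from a `.overstory` directory); it has not been checked against Claude Code's own implementation.

```python
import re
from pathlib import Path


def encode_project_key(cwd: str) -> str:
    """Encode an absolute working directory into a project-key directory name."""
    # A dot that starts a path component (hidden directory) becomes "-",
    # which later yields the "double dash" once the separator is replaced.
    hidden_fixed = re.sub(r"(?<=[/\\])\.", "-", cwd)
    # "/", "\" and ":" all become "-"; a POSIX absolute path therefore
    # starts with "-".
    return re.sub(r"[/\\:]", "-", hidden_fixed)


def list_main_sessions(projects_root: Path) -> list[Path]:
    """List main-session transcripts, skipping subagent agent-*.jsonl files."""
    return sorted(
        p for p in projects_root.glob("*/*.jsonl")
        if not p.name.startswith("agent-")
    )
```

For example, `encode_project_key("/mnt/e/CS/HF/eidolon")` gives `-mnt-e-CS-HF-eidolon`, which matches the directory name seen under `~/.claude/projects/` in §4.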
3.2 Common record fields
Every record (both user and assistant types) carries:
| field | type | meaning |
|---|---|---|
| `parentUuid` | string \| null | UUID of the parent record (null on the first record) |
| `uuid` | string | This record's UUID |
| `sessionId` | string | UUID of the session (matches filename) |
| `timestamp` | string (ISO-8601) | Wall-clock time of the record |
| `cwd` | string | Absolute working directory |
| `version` | string | Claude Code version (e.g. "2.1.143") |
| `gitBranch` | string | Empty string "" when not in a git repo |
| `isSidechain` | boolean | True for sub-agent (Task tool) chains |
| `userType` | string | "external" or similar |
| `type` | string | Discriminator (see §3.3) |
| `entrypoint` | string | e.g. "sdk-cli" |
Sources for these fields:
- https://github.com/KyleAMathews/claude-code-ui/blob/main/spec.md §"Type Definitions" → `BaseMessageEntry`
- https://github.com/jamie-bitflight/claude_skills/blob/main/plugins/agentskill-kaizen/skills/transcript-analysis/references/session-log-schema.md §"Top-Level Record Fields"
- https://github.com/moru-ai/agent-schemas/blob/main/claude-code/v2.1.1/session.schema.json (machine-validated against ~50,000 messages from 480 real sessions)
- Direct inspection (this doc): `head` of `~/.claude/projects/-mnt-e-CS-HF-eidolon/c6967343-51a3-4b1b-9472-a569e96114b1.jsonl` confirms presence of every field above.
3.3 Record types (type discriminator)
| `type` | Role |
|---|---|
| `user` | Both human prompts AND tool results (distinguished by `message.content[].type`) |
| `assistant` | Model output: text, thinking, and tool_use blocks |
| `system` | Hook summaries, stop notices |
| `summary` | Context-compaction markers |
| `attachment` | Hook stdout/stderr, e.g. SessionStart hook output |
| `queue-operation` | Prompt enqueue/dequeue events |
| `file-history-snapshot` | File-state tracking for undo |
| `last-prompt` | Bookkeeping for resume |
Source: https://github.com/KyleAMathews/claude-code-ui/blob/main/spec.md §"Entry Types"; corroborated by direct Counter inspection of one local session showing attachment, assistant, user, last-prompt, queue-operation types in expected proportions.
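The Counter inspection mentioned above can be reproduced with something like the sketch below. The function name `count_record_types` and the `<unparseable>` / `<missing>` bucket labels are illustrative choices, not part of any existing tool.

```python
import json
from collections import Counter
from pathlib import Path


def count_record_types(path: Path) -> Counter:
    """Tally the `type` discriminator of every record in a session JSONL."""
    counts: Counter = Counter()
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                # e.g. a truncated final line from an interrupted write
                counts["<unparseable>"] += 1
                continue
            if not isinstance(rec, dict):
                counts["<unparseable>"] += 1
                continue
            counts[rec.get("type", "<missing>")] += 1
    return counts
```

Running it over a session and comparing the keys against the table in this section is a cheap way to spot record types that a newer Claude Code version may have introduced.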
3.4 The two record types we care about
Assistant record carrying a tool call (the "student action")
Real example, redacted from ~/.claude/projects/-mnt-e-CS-github-VIGOR--overstory-worktrees-builder-doc-adapter-skeleton/39df59f0-674c-413a-b333-cdac0cea9db7.jsonl:
```json
{
  "type": "assistant",
  "uuid": "24a16a51-3133-4ba5-9d23-472864286154",
  "parentUuid": "1b11c3b3-832b-4473-a944-b61a1f3f2594",
  "sessionId": "39df59f0-…",
  "timestamp": "2026-05-16T04:52:21.947Z",
  "message": {
    "role": "assistant",
    "model": "claude-opus-4-7",
    "content": [
      {
        "type": "tool_use",
        "id": "toolu_bdrk_012HC2dggmSgtVAtWWzwikZq",
        "name": "Bash",
        "input": {
          "command": "ov mail check --agent builder-doc-adapter-skeleton 2>&1 | head -200",
          "description": "Check builder agent inbox"
        }
      }
    ],
    "stop_reason": "tool_use",
    "usage": { "input_tokens": 6, "cache_creation_input_tokens": 48287, "output_tokens": 1021, ... }
  }
}
```
The student's action at this step = the JSON of message.content[i] where content[i].type == "tool_use" (or, if multiple tool_use blocks, the array of them; or if pure-text reply, the content[i].text of the text block).
User record carrying a tool result (the "observation")
```jsonc
{
  "type": "user",
  "uuid": "b9f9414b-…",
  "parentUuid": "24a16a51-…",            // matches the assistant uuid above
  "sessionId": "39df59f0-…",
  "timestamp": "2026-05-16T04:52:23.229Z",
  "message": {
    "role": "user",
    "content": [
      {
        "tool_use_id": "toolu_bdrk_012HC2dggmSgtVAtWWzwikZq",
        "type": "tool_result",
        "content": " No new messages",
        "is_error": false
      }
    ]
  },
  "toolUseResult": {                      // duplicate, structured form
    "stdout": " No new messages",
    "stderr": "",
    "interrupted": false,
    "isImage": false,
    "noOutputExpected": false
  },
  "sourceToolAssistantUUID": "24a16a51-…" // back-pointer to the assistant uuid
}
```
User records carrying actual human prompts have message.content as a list with {"type":"text","text":"..."} blocks (or, in older logs, message.content as a plain string).
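To make the action/observation linkage concrete, here is a small sketch that joins each `tool_use` block to its `tool_result` via `id` / `tool_use_id`, as in the two records above. The helper names `iter_blocks` and `pair_tool_calls` are hypothetical, and the code assumes all records of a session are already loaded into a list.

```python
from __future__ import annotations

from collections.abc import Iterator


def iter_blocks(rec: dict, block_type: str) -> Iterator[dict]:
    """Yield content blocks of one type from a user/assistant record."""
    message = rec.get("message")
    if not isinstance(message, dict):
        return
    content = message.get("content")
    if not isinstance(content, list):
        # Older logs store a human prompt as a plain string: no blocks.
        return
    for block in content:
        if isinstance(block, dict) and block.get("type") == block_type:
            yield block


def pair_tool_calls(records: list[dict]) -> list[tuple[dict, dict | None]]:
    """Pair every tool_use block with its tool_result (None if missing)."""
    results: dict[str, dict] = {}
    for rec in records:
        if rec.get("type") == "user":
            for block in iter_blocks(rec, "tool_result"):
                results[block.get("tool_use_id")] = block
    pairs: list[tuple[dict, dict | None]] = []
    for rec in records:
        if rec.get("type") == "assistant":
            for block in iter_blocks(rec, "tool_use"):
                pairs.append((block, results.get(block.get("id"))))
    return pairs
```

A `None` in the second position marks a tool call whose result never made it into the log (for instance an interrupted session), which is worth counting before replay.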
3.5 Schema stability
- Anthropic's official documentation acknowledges the location and "each line is a JSON object for a message, tool use, or metadata entry" but does not publish a versioned schema.
- Practical stability: moru-ai/agent-schemas tracked v2.0.76 → v2.1.1; only one new field of note (`toolUseResult`). Schema pins `additionalProperties: true` for forward compatibility. This level of stability is sufficient for Spike 007 (a research spike, not a long-lived product API).
- Mitigation: pin to a specific Claude Code `version` field range and version-gate the ingester (e.g. accept `2.1.x`, warn on others).
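The version gate in the mitigation above might look like the following sketch. The `TESTED_PREFIX` constant and the `check_version` name are assumptions made for illustration; whether to warn or hard-fail is left open (see §7), so both behaviours are exposed through a `strict` flag.

```python
import warnings

# Assumption: the range we consider tested is any 2.1.x release.
TESTED_PREFIX = "2.1."


def check_version(rec: dict, *, strict: bool = False) -> bool:
    """Return True if the record's Claude Code version is in the tested range.

    Outside the range: warn (default) or raise ValueError when strict=True.
    """
    version = rec.get("version")
    if isinstance(version, str) and version.startswith(TESTED_PREFIX):
        return True
    msg = f"untested Claude Code version {version!r}; expected {TESTED_PREFIX}x"
    if strict:
        raise ValueError(msg)
    warnings.warn(msg, stacklevel=2)
    return False
```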
3.6 Licensing
- The Claude Code binary is proprietary (Anthropic Commercial Terms of Service, https://github.com/anthropics/claude-code/blob/1e95326e12183286fc6cbd828c8a86a0d8e03c62/LICENSE.md).
- The session JSONL files are local user data generated on the user's machine during ordinary use. Anthropic's data-usage doc explicitly calls them "local caching … session transcripts locally in plaintext" — they belong to the user.
- Our framework is MIT-licensed and we are not redistributing the Claude Code binary or any third-party trace files. We are reading the user's own local logs (analogous to processing one's own `.bash_history`).
- We MUST NOT publish raw trace files in our repo without the user's consent (PII risk: cwd, gitBranch, file contents). The framework should ship only the ingester, plus a tiny synthetic-fixture trace for unit tests.
4. Acquiring the 5 real example traces
Zero acquisition cost. All five live on this machine right now.
Discovery command (used during this audit):
```shell
find ~/.claude/projects -name "*.jsonl" 2>/dev/null
# → 1015 files
```
Five concrete pre-selected sessions, each multi-turn (≥ 100 tool_use messages), each from a distinct project, each ≥ 50 KB:
| # | Tool-use msgs | User msgs | Asst msgs | Total lines | Path |
|---|---|---|---|---|---|
| 1 | 2,830 | 3,199 | 4,325 | 17,315 | /home/codeseys/.claude/projects/-mnt-e-CS-HF-eidolon/c6967343-51a3-4b1b-9472-a569e96114b1.jsonl |
| 2 | 1,350 | 1,407 | 2,016 | 7,673 | /home/codeseys/.claude/projects/-mnt-e-CS-github-agent-manager/c42b68ea-d410-455e-bc71-92ec6c4adce9.jsonl |
| 3 | 984 | 1,032 | 1,549 | 5,783 | /home/codeseys/.claude/projects/-mnt-e-CS-HF-streaming-speech-to-speech/73c9925c-d5e5-48fc-a97b-a58687c2fb3c.jsonl |
| 4 | 717 | 759 | 1,142 | 4,036 | /home/codeseys/.claude/projects/-mnt-e-CS-github/6ac8e20f-98ec-4279-9957-e68862a90c5e.jsonl |
| 5 | 125 | 126 | 197 | 629 | /home/codeseys/.claude/projects/-mnt-e-CS-github-VIGOR--overstory-worktrees-builder-iteration-checkpoint/e4a34e2b-40c6-49ce-b253-912a43224aae.jsonl |
(All five inspected programmatically during this audit — counts above are real, not estimates.)
For users on other machines: find ~/.claude/projects -name '*.jsonl' -size +50k | head will surface candidates. For repository CI we will commit a small (~5 KB) synthetic fixture conforming to the schema, never any of the user's real traces.
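One possible shape for the synthetic CI fixture is sketched below: a three-record session (human prompt, assistant `tool_use`, user `tool_result`) carrying the common fields from §3.2. Every value is invented, including the `toolu_synthetic_0001` id, the `/tmp/synthetic-project` cwd, and the `2.1.0` version string; the function name `write_synthetic_fixture` is hypothetical.

```python
from __future__ import annotations

import json
import uuid
from pathlib import Path


def _base(rec_type: str, session_id: str, parent: str | None) -> dict:
    """Common fields shared by every record (all values synthetic)."""
    return {
        "type": rec_type,
        "uuid": str(uuid.uuid4()),
        "parentUuid": parent,
        "sessionId": session_id,
        "timestamp": "2026-01-01T00:00:00.000Z",
        "cwd": "/tmp/synthetic-project",
        "version": "2.1.0",
        "gitBranch": "",
        "isSidechain": False,
        "userType": "external",
    }


def write_synthetic_fixture(directory: Path) -> Path:
    """Write <sessionId>.jsonl with a prompt, a tool call, and its result."""
    sid = str(uuid.uuid4())
    prompt = _base("user", sid, None)
    prompt["message"] = {"role": "user",
                         "content": [{"type": "text", "text": "List the files."}]}
    call = _base("assistant", sid, prompt["uuid"])
    call["message"] = {
        "role": "assistant",
        "content": [{"type": "tool_use", "id": "toolu_synthetic_0001",
                     "name": "Bash", "input": {"command": "ls"}}],
        "stop_reason": "tool_use",
    }
    result = _base("user", sid, call["uuid"])
    result["message"] = {
        "role": "user",
        "content": [{"type": "tool_result", "tool_use_id": "toolu_synthetic_0001",
                     "content": "README.md", "is_error": False}],
    }
    path = directory / f"{sid}.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for rec in (prompt, call, result):
            f.write(json.dumps(rec) + "\n")
    return path
```

Because nothing in the fixture derives from a real session, it can be committed without the PII concerns raised in §3.6.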
5. Decision-relevant tradeoffs vs runners-up
Why we are NOT picking OpenHands trajectories (c)
- Pro: cleanest schema we audited: Pydantic `Event`/`ActionEvent`/`ObservationEvent` models, source: https://docs.openhands.dev/sdk/arch/events, source code: https://github.com/OpenHands/OpenHands/blob/3ec999e8/openhands/events/serialization/event.py. Tool-call structure is more normalized than Claude Code's (explicit Action/Observation typing).
- Con: zero-acquisition is false here. Persistence dir defaults to `workspace/conversations/` and only exists if the user has run OpenHands locally. Public eval trajectories are spread across the eval/ folder rather than a clean public bucket.
- Decisive: Spike 001's economic floor was measured on 50 synthetic states. Spike 007's purpose is to verify ingestion + replay on real traces that already exist. (a) gives that today; (c) requires standing up OpenHands first, plus the storage format split between v0 (per-event JSON files) and v1 (timestamped files) per https://github.com/All-Hands-AI/OpenHands/issues/8701, which is a flux risk.
- Future use: if the framework ever ships "trace ingester adapters" plural, OpenHands is the second adapter to write, since its event-typed model is conceptually superior.
Why we are NOT picking SWE-bench leaderboard trajectories (e)
- Pro: hundreds of submissions on https://github.com/swe-bench/experiments, with required `trajs/` folders.
- Con: leaderboard rules say "The reasoning trace can be represented with any text based file format (e.g. md, json, yaml)" (source: https://github.com/swe-bench/experiments README). Each submitter picks their own. Building a generic ingester is a per-submission engineering project, not a single adapter. SWE-agent uses one shape (`{"action", "observation", "response"}` arrays, confirmed via https://huggingface.co/datasets/JetBrains-Research/swe-traj-complete); mini-swe-agent uses `.traj.json` with OpenAI messages format (https://huggingface.co/datasets/tarsur385/swebench-verified-trajectories).
- Decisive: heterogeneous schema = fragile ingester = wrong choice for first spike.
Why we are NOT picking Aider (d)
- The `chat_history_file` is markdown (`.aider.chat.history.md`), per https://aider.chat/docs/config/dotenv.html. Source code at https://github.com/Aider-AI/aider/blob/bdb4d9ff/aider/history.py shows it's literally `f.write(text)` of formatted prose with `####` for user input.
- Decisive: tool calls in Aider are applied as edits, not preserved as discrete structured actions in the markdown log. Reconstructing "the action the student took at step k" is lossy. The `.aider.llm.history` log is closer to what we want but is opt-in and not always present.
Why we are NOT picking Cline (b)
- No public commitment to a stable export schema. Cline's storage is internal to the VS Code extension (workspace state DB + per-task JSON in extension storage). Searching for "Cline trace export schema" yields no Anthropic-style spec doc. Workable in principle, but reverse-engineering an extension's storage is not the right ground for a 1-week spike.
Why we are NOT picking SWE-smith-trajectories (f)
- This is the strongest external dataset we found and should be Spike 007's stretch goal / Spike 008's primary: 5,017 fine-tuning trajectories from SWE-agent + Claude 3.7 Sonnet, 4.22 GB on HuggingFace, OpenAI messages format. Source: https://huggingface.co/datasets/SWE-bench/SWE-smith-trajectories.
- Why not first: the messages-only format collapses tool calls and tool results into the OpenAI chat-completions wire format with text-encoded tool blocks. That works for SFT but is less signal-dense for the teacher-correction spike than Claude Code's `tool_use` blocks, because each tool call's `name` and `input` fields are structurally separated in Claude Code's format, making "did the teacher pick a different tool?" a one-line check.
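A sketch of that check is shown below. It assumes both actions are serialized the way the §6 sketch's `_serialize_action` does it (a JSON list of `{"name", "input"}` objects for tool calls, plain text otherwise); the helper names `tool_names` and `picked_different_tool` are hypothetical.

```python
import json


def tool_names(serialized_action: str) -> list[str]:
    """Extract tool names from a serialized action; [] for text-only actions."""
    try:
        parsed = json.loads(serialized_action)
    except json.JSONDecodeError:
        return []  # plain-text reply, not a tool call
    if not isinstance(parsed, list):
        return []
    return [item.get("name") for item in parsed if isinstance(item, dict)]


def picked_different_tool(student_action: str, teacher_action: str) -> bool:
    """True when the teacher chose a different tool (or tool sequence)."""
    return tool_names(student_action) != tool_names(teacher_action)
```

Comparing names only, and ignoring `input`, separates "wrong tool" disagreements from "right tool, different arguments" disagreements, which may deserve different treatment when building DPO pairs.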
6. TraceIngester sketch
Realised in v0.1 (Wave 17 update): The realised ingester ships at `composer_replication/ingestion/claude_code.py` exporting `ClaudeCodeIngester`, with the spike at `spikes/007-real-trace-ingestion/claude_code_ingester.py`. The public production surface is:

```python
from pathlib import Path
from composer_replication.ingestion.claude_code import ClaudeCodeIngester

ingester = ClaudeCodeIngester(skip_sidechain=True, strip_thinking=True)
for trace_state in ingester.ingest(Path("~/.claude/projects/.../session.jsonl").expanduser()):
    # trace_state matches the TraceState TypedDict from §1
    ...
stats = ingester.last_stats  # IngestionStats: turn counts, skip reasons
```

The shipped `ClaudeCodeIngester` differs from the pre-spike sketch below in:

- Class name: `ClaudeCodeIngester` (not `TraceIngester`)
- Module path: `composer_replication.ingestion.claude_code` (not `spikes/007-trace-ingester/trace_ingester.py`)
- The constructor takes config kwargs (`system_prompt`, `skip_sidechain`, `strip_thinking`, `max_history_tokens`); paths are passed to `.ingest(Path)` per call instead of being held by the ingester
- The yielded type is `TraceState` (matches §1)

The pre-spike sketch below is preserved as historical proposal context.
Drop-in adapter for spike-005's replay_trace(). Targets TraceState (the actual existing TypedDict; see §1).
```python
# spikes/007-trace-ingester/trace_ingester.py
from __future__ import annotations

import json
from collections.abc import Iterator
from pathlib import Path

# Re-use the existing TypedDicts from spike-005:
# from spikes.005_integrated_trainer_skeleton.teacher_replay import TraceState

# A "step" in the trace is each assistant record that ends in tool_use. The
# state visible to the model at that step = all messages strictly before it,
# in OpenAI/Anthropic chat format. The student_action = the tool_use payload(s).


def _record_to_chat_message(rec: dict, *, strip_thinking: bool = True) -> dict | None:
    """Turn one Claude Code JSONL record into an OpenAI/Anthropic chat-message
    dict, or return None for non-conversational records (queue-operation,
    attachment, file-history-snapshot, system, last-prompt, summary)."""
    t = rec.get("type")
    if t not in ("user", "assistant"):
        return None
    msg = rec.get("message")
    if not isinstance(msg, dict):
        return None
    role = msg.get("role")
    content = msg.get("content")
    if role not in ("user", "assistant") or content is None:
        return None
    # Strip thinking blocks: they are not portable across teacher models and
    # should not influence the teacher's decision at replay time.
    if strip_thinking and isinstance(content, list):
        content = [c for c in content
                   if not (isinstance(c, dict) and c.get("type") == "thinking")]
    return {"role": role, "content": content}


def _serialize_action(content_blocks: list[dict]) -> str:
    """Canonicalize the student's action at a step.

    For tool_use steps: JSON-encode the (name, input) pairs.
    For text-only steps: return the concatenated text.
    """
    tool_uses = [b for b in content_blocks
                 if isinstance(b, dict) and b.get("type") == "tool_use"]
    if tool_uses:
        return json.dumps(
            [{"name": tu.get("name"), "input": tu.get("input")} for tu in tool_uses],
            sort_keys=True,
        )
    texts = [b.get("text", "") for b in content_blocks
             if isinstance(b, dict) and b.get("type") == "text"]
    return "\n".join(t for t in texts if t)


class TraceIngester:
    """Reads a Claude Code session JSONL and yields TraceState records.

    One TraceState is emitted per assistant record. The `messages` field is the
    full prior conversation (system + alternating user/assistant) up to but not
    including the current assistant turn; `student_action` is the canonicalized
    serialization of that turn's content blocks.
    """

    def __init__(self, *, skip_thinking: bool = True, min_action_chars: int = 1) -> None:
        self.skip_thinking = skip_thinking
        self.min_action_chars = min_action_chars

    def ingest(self, path: str | Path) -> Iterator[dict]:  # yields TraceState
        path = Path(path)
        prior_messages: list[dict] = []
        session_id_for_state = path.stem  # filename = session UUID
        with path.open("r", encoding="utf-8") as f:
            for line_idx, line in enumerate(f):
                line = line.strip()
                if not line:
                    continue
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # tolerate truncated last-line writes
                # Honour the constructor flag (the original sketch stored it
                # but always stripped thinking blocks).
                chat_msg = _record_to_chat_message(rec, strip_thinking=self.skip_thinking)
                if chat_msg is None:
                    continue
                if chat_msg["role"] == "assistant":
                    # Emit a TraceState representing "before this turn".
                    blocks = chat_msg["content"] if isinstance(chat_msg["content"], list) else []
                    student_action = _serialize_action(blocks)
                    if len(student_action) >= self.min_action_chars:
                        yield {
                            "state_id": f"{session_id_for_state}:{rec.get('uuid', line_idx)}",
                            "messages": list(prior_messages),  # snapshot
                            "student_action": student_action,
                        }
                # Append to history regardless (so subsequent turns see it).
                prior_messages.append(chat_msg)
```
Notes:
- We skip `thinking` blocks because (1) they're Anthropic-specific and (2) feeding them to other-vendor teachers (GPT/DeepSeek) leaks reasoning the teacher should produce on its own. This matches the philosophy used in spike-005's `_normalize_action`.
- We do NOT inject a system prompt: Claude Code's initial system prompt is not in the JSONL (it's set at SDK init and visible only via `attachment` records). Downstream callers may want to prepend a synthetic system message for teacher fairness. Open question for ADR-002.
- `state_id = f"{sessionId}:{recordUuid}"` is globally unique and stable across re-ingest.
- Failures (unparseable lines, missing fields) are tolerated silently. A counters-based sibling method `ingest_with_stats(path)` is a small follow-up.
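The counters-based follow-up could start from something like the sketch below. It is a standalone scan rather than the proposed `ingest_with_stats` method itself: the `IngestStats` dataclass, the `scan_with_stats` name, and the skip-reason labels are all hypothetical, and it counts every well-formed assistant record as a would-be state without applying the `min_action_chars` filter.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IngestStats:
    lines_read: int = 0
    states_emitted: int = 0
    skipped: Counter = field(default_factory=Counter)  # reason -> count


def scan_with_stats(path: Path) -> IngestStats:
    """Classify every line the ingester would silently skip."""
    stats = IngestStats()
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            stats.lines_read += 1
            line = line.strip()
            if not line:
                stats.skipped["blank"] += 1
                continue
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                stats.skipped["unparseable"] += 1
                continue
            if not isinstance(rec, dict) or rec.get("type") not in ("user", "assistant"):
                stats.skipped["non_conversational"] += 1
                continue
            if not isinstance(rec.get("message"), dict):
                stats.skipped["missing_message"] += 1
                continue
            if rec["type"] == "assistant":
                stats.states_emitted += 1
    return stats
```

Surfacing these counts makes it visible when, say, a schema change causes most records to fall into `missing_message` instead of failing quietly.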
6.1 Smoke-test plan (for Spike 007 itself)
```python
ingester = TraceIngester()
states = list(ingester.ingest("/home/codeseys/.claude/projects/-mnt-e-CS-github-VIGOR--overstory-worktrees-builder-iteration-checkpoint/e4a34e2b-40c6-49ce-b253-912a43224aae.jsonl"))
# Expect roughly 197 states (matches asst-message count counted in §4).
# Then teacher-replay on the first 5 states, confirm cost is in the
# spike-001 ballpark ($0.05–$0.20 for 5 states × 3 teachers).
```
Spike 001 baseline to beat: $0.98/trace mean (50-state synthetic), $0.30/trace projected with VOI gating. On real states a ~5–20× cost increase is plausible due to longer message histories (10k+ tokens vs synthetic ~300 tokens), so a relevant economic check for Spike 007 is: if the first 5 states cost > $5 (i.e. > $1/state), the VOI gate from Spike 001 is required before scaling. Flag this finding in the spike write-up.
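The economic check above reduces to a threshold on mean per-state cost. A minimal sketch, with a hypothetical function name and the $1/state threshold taken from the paragraph above:

```python
def voi_gate_required(state_costs_usd: list[float], *,
                      max_cost_per_state: float = 1.0) -> bool:
    """True if the mean replay cost per state exceeds the threshold.

    With the default threshold, 5 states costing more than $5 in total
    trigger the gate, matching the rule of thumb stated above.
    """
    if not state_costs_usd:
        raise ValueError("need at least one measured state cost")
    mean_cost = sum(state_costs_usd) / len(state_costs_usd)
    return mean_cost > max_cost_per_state
```

For instance, five states at $0.02 each (the synthetic ballpark of $0.98 over 50 states) stay well under the gate, while five states at $1.50 each trip it.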
7. Open questions for ADR-002
- Do we promote `TraceState` to a top-level `TraceExample` dataclass, with optional `teacher_id`, `reward`, `hint_text`? Or keep `TraceState` as ingester output and `DPOPair` as trainer input, treating the brief's "TraceExample" as conceptual?
- Should `TraceIngester.ingest()` emit one record per assistant turn (current sketch) or per assistant `tool_use` block within a turn? Some Claude Code records have multiple tool_use blocks in one assistant message.
- Synthetic system prompt at replay time: yes/no? If yes, what content?
- Trace-version pinning: hard-fail or warn when `version` field falls outside a known-tested range?
- Subagent transcripts (`agent-*.jsonl`): include or skip? They are denser per-turn but their parent context is the orchestrator, not the user, which changes the teacher-replay semantics.
8. References (primary sources only)
Anthropic / Claude Code official:
- https://code.claude.com/docs/en/sessions: session storage location and "JSONL, one JSON per line"
- https://code.claude.com/docs/en/data-usage: "local caching … session transcripts locally in plaintext under `~/.claude/projects/` for 30 days by default"
- https://code.claude.com/docs/en/legal-and-compliance: Commercial Terms vs Consumer Terms applicability
- https://github.com/anthropics/claude-code/blob/1e95326e12183286fc6cbd828c8a86a0d8e03c62/LICENSE.md: proprietary license
Community schemas (reverse-engineered from real session data):
- https://github.com/moru-ai/agent-schemas/blob/main/claude-code/v2.1.1/session.schema.json: JSON Schema Draft 2020-12, validated against ~50,000 messages from 480 sessions
- https://github.com/KyleAMathews/claude-code-ui/blob/main/spec.md §"Claude Code Session Log Format": Entry types and TypeScript discriminated union
- https://github.com/jamie-bitflight/claude_skills/blob/main/plugins/agentskill-kaizen/skills/transcript-analysis/references/session-log-schema.md: top-level fields, project-key encoding, subagent file location
- https://github.com/dagster-io/erk/blob/master/docs/learned/sessions/layout.md: directory structure, plan-mode `slug` field
- https://github.com/pedropaulovc/claude-code-types: TypeScript type definitions from session logs
Runners-up reference points:
- OpenHands events: https://docs.openhands.dev/sdk/arch/events, https://docs.openhands.dev/sdk/guides/convo-persistence, https://github.com/OpenHands/OpenHands/blob/3ec999e8/openhands/events/serialization/event.py, https://github.com/All-Hands-AI/OpenHands/issues/8701
- SWE-bench experiments: https://github.com/swe-bench/experiments
- SWE-smith trajectories on HF: https://huggingface.co/datasets/SWE-bench/SWE-smith-trajectories
- mini-swe-agent traj.json: https://huggingface.co/datasets/tarsur385/swebench-verified-trajectories
- swe-traj-complete (SWE-agent format example): https://huggingface.co/datasets/JetBrains-Research/swe-traj-complete
- Aider history file format: https://aider.chat/docs/config/dotenv.html, https://github.com/Aider-AI/aider/blob/bdb4d9ff/aider/history.py, https://github.com/paul-gauthier/aider/blob/main/aider/io.py
Internal references:
- `spikes/005-integrated-trainer-skeleton/teacher_replay.py`: `TraceState`, `DPOPair`, `replay_trace`, `extract_dpo_pairs` (read in full during this audit; see §1 for actual field list)
- Spike 001 economic floor: $0.98/trace mean ungated, $0.30/trace projected with VOI gating