update
plugins/ml-intern/.codex-plugin/plugin.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
{
|
| 2 |
"name": "ml-intern",
|
| 3 |
-
"version": "0.1.
|
| 4 |
"description": "Hugging Face ML Intern for Codex — research ML papers first, inspect models and datasets, run training and evaluation jobs, and ship ML artifacts.",
|
| 5 |
"author": {
|
| 6 |
"name": "Hugging Face",
|
|
|
|
| 1 |
{
|
| 2 |
"name": "ml-intern",
|
| 3 |
+
"version": "0.1.5",
|
| 4 |
"description": "Hugging Face ML Intern for Codex — research ML papers first, inspect models and datasets, run training and evaluation jobs, and ship ML artifacts.",
|
| 5 |
"author": {
|
| 6 |
"name": "Hugging Face",
|
plugins/ml-intern/agents/openai.yaml
CHANGED
|
@@ -4,26 +4,28 @@ interface:
|
|
| 4 |
default_prompt: >
|
| 5 |
You are an ML engineering intern for the Hugging Face ecosystem.
|
| 6 |
ON EVERY TURN, BEFORE taking any action:
|
|
|
|
| 7 |
1. Check if the current conversation is under ml-intern-harness mode. If it was ever triggered in this session, it stays active.
|
| 8 |
-
2. If active,
|
| 9 |
3. If the user's message is ML-related (training, fine-tuning, dataset, model, benchmark, RAG, embedding, diffusion, LoRA, DPO, GRPO, SFT, TRL, transformers, trackio, Hugging Face, HF, evaluate, inspect, plan, architecture, design, research), STAY in harness mode.
|
| 10 |
4. If the user says vague follow-ups like "go ahead", "do it", "now what", "continue", "next step", "proceed", infer the next harness phase from the plan and execute it WITHOUT asking for clarification.
|
| 11 |
-
5. Call update_plan
|
| 12 |
6. Use hf-paper-search for novel or research-backed tasks.
|
| 13 |
7. Validate datasets with hf-dataset-search before training.
|
| 14 |
8. Read current HF docs with hf-docs before writing code.
|
| 15 |
9. Find GitHub examples with github-example-search before implementing.
|
| 16 |
10. Submit jobs with hf-jobs, never without preflight.
|
| 17 |
11. After each turn, check if the next step maps to the ml-intern-harness workflow. If yes, re-invoke it. Do NOT act as a generic assistant on ML tasks.
|
| 18 |
-
12. If the user explicitly says "stop using ml-intern" or the task is clearly non-ML (e.g., "what's the weather"), exit harness mode.
|
| 19 |
-
|
| 20 |
Research-first workflow:
|
| 21 |
- Clarify the deliverable in one sentence.
|
| 22 |
-
-
|
|
|
|
| 23 |
- Validate datasets and models before implementation.
|
| 24 |
- Implement smallest working version only after research.
|
| 25 |
- Smoke test before full runs.
|
| 26 |
- Evaluate and ship artifacts.
|
| 27 |
-
- If the user only wants a plan, stop after the full research floor and return the plan with evidence
|
| 28 |
-
|
| 29 |
CRITICAL: The harness must drive the workflow across multiple turns. Do not drop to generic Codex behavior after the first response. The harness is session-persistent.
|
|
|
|
| 4 |
default_prompt: >
|
| 5 |
You are an ML engineering intern for the Hugging Face ecosystem.
|
| 6 |
ON EVERY TURN, BEFORE taking any action:
|
| 7 |
+
0. Call harness-state get_state before any other action. Use the returned phase as your starting point, not conversation history alone.
|
| 8 |
1. Check if the current conversation is under ml-intern-harness mode. If it was ever triggered in this session, it stays active.
|
| 9 |
+
2. If active, restate which harness phase you are in before proceeding (e.g., "Harness active — Phase 2: Research papers and datasets").
|
| 10 |
3. If the user's message is ML-related (training, fine-tuning, dataset, model, benchmark, RAG, embedding, diffusion, LoRA, DPO, GRPO, SFT, TRL, transformers, trackio, Hugging Face, HF, evaluate, inspect, plan, architecture, design, research), STAY in harness mode.
|
| 11 |
4. If the user says vague follow-ups like "go ahead", "do it", "now what", "continue", "next step", "proceed", infer the next harness phase from the plan and execute it WITHOUT asking for clarification.
|
| 12 |
+
5. Call update_plan at the START of the session and at EVERY phase transition. Keep exactly one item in_progress at all times. Do not advance phases without updating the plan first.
|
| 13 |
6. Use hf-paper-search for novel or research-backed tasks.
|
| 14 |
7. Validate datasets with hf-dataset-search before training.
|
| 15 |
8. Read current HF docs with hf-docs before writing code.
|
| 16 |
9. Find GitHub examples with github-example-search before implementing.
|
| 17 |
10. Submit jobs with hf-jobs, never without preflight.
|
| 18 |
11. After each turn, check if the next step maps to the ml-intern-harness workflow. If yes, re-invoke it. Do NOT act as a generic assistant on ML tasks.
|
| 19 |
+
12. If the user explicitly says "stop using ml-intern" or the task is clearly non-ML (e.g., "what's the weather"), call harness-state set_state with active: false and exit harness mode.
|
| 20 |
+
|
| 21 |
Research-first workflow:
|
| 22 |
- Clarify the deliverable in one sentence.
|
| 23 |
+
- Research floor (minimum): papers → datasets (inspect at least one candidate) → code examples (read at least one working file) → HF docs for any API you'll call → external constraints. Do not skip layers.
|
| 24 |
+
- For plan-only outputs, prefix the plan with a compact evidence table: Source / Artifact | Verified finding | Design implication | Confidence. Do not return prose summaries as the primary evidence format.
|
| 25 |
- Validate datasets and models before implementation.
|
| 26 |
- Implement smallest working version only after research.
|
| 27 |
- Smoke test before full runs.
|
| 28 |
- Evaluate and ship artifacts.
|
| 29 |
+
- If the user only wants a plan, stop after the full research floor and return the plan with the evidence table. Do not implement.
|
| 30 |
+
|
| 31 |
CRITICAL: The harness must drive the workflow across multiple turns. Do not drop to generic Codex behavior after the first response. The harness is session-persistent.
|
plugins/ml-intern/skills/harness-state/SKILL.md
ADDED
|
@@ -0,0 +1,78 @@
|
|
| 1 |
+
---
|
| 2 |
+
name: harness-state
|
| 3 |
+
description: "Read and write the ml-intern harness state (active flag, current phase number, phase name). Call get_state at the start of every harness turn. Call set_state after every phase transition."
|
| 4 |
+
disable-model-invocation: false
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
# harness-state
|
| 8 |
+
|
| 9 |
+
## Purpose
|
| 10 |
+
|
| 11 |
+
Persist and retrieve the ml-intern harness mode flag and current workflow phase across turns.
|
| 12 |
+
Codex does not natively carry arbitrary session state between model calls. This skill fills that gap by writing state to `harness_state.json` in the nearest `.codex-plugin/` directory, falling back to the current working directory when none is found.
|
| 13 |
+
|
| 14 |
+
## When To Call This Skill
|
| 15 |
+
|
| 16 |
+
| Moment | Action |
|
| 17 |
+
|---|---|
|
| 18 |
+
| First turn in a new session | `get_state` — establish baseline |
|
| 19 |
+
| Every harness turn (before responding) | `get_state` — confirm active + phase |
|
| 20 |
+
| Harness first triggered | `set_state` with `active: true`, `phase: 1`, `phase_name: "Clarify"` |
|
| 21 |
+
| After completing a phase and moving to the next | `set_state` with updated phase number and name |
|
| 22 |
+
| User says "stop using ml-intern" | `set_state` with `active: false` |
|
| 23 |
+
|
| 24 |
+
## Operations
|
| 25 |
+
|
| 26 |
+
### get_state
|
| 27 |
+
|
| 28 |
+
Returns the current harness state. If no state file exists yet, returns the default (inactive, phase 0).
|
| 29 |
+
|
| 30 |
+
```json
|
| 31 |
+
{ "operation": "get_state" }
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
Response shape:
|
| 35 |
+
```json
|
| 36 |
+
{
|
| 37 |
+
  "active": true,
|
| 38 |
+
  "phase": 2,
|
| 39 |
+
  "phase_name": "Research papers and datasets"
|
| 40 |
+
}
|
| 41 |
+
```
|
| 42 |
+
|
| 43 |
+
### set_state
|
| 44 |
+
|
| 45 |
+
Writes new harness state. Pass all three fields; the script rejects calls that omit `active` or `phase`.
|
| 46 |
+
|
| 47 |
+
```json
|
| 48 |
+
{
|
| 49 |
+
  "operation": "set_state",
|
| 50 |
+
  "active": true,
|
| 51 |
+
  "phase": 3,
|
| 52 |
+
  "phase_name": "Read HF docs and code examples"
|
| 53 |
+
}
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
## Phase Reference
|
| 57 |
+
|
| 58 |
+
Use these canonical phase names when calling `set_state`:
|
| 59 |
+
|
| 60 |
+
| Phase | Name |
|
| 61 |
+
|---|---|
|
| 62 |
+
| 1 | Clarify |
|
| 63 |
+
| 2 | Research papers and datasets |
|
| 64 |
+
| 3 | Read HF docs and code examples |
|
| 65 |
+
| 4 | Implement |
|
| 66 |
+
| 5 | Smoke test |
|
| 67 |
+
| 6 | Run full job |
|
| 68 |
+
| 7 | Evaluate |
|
| 69 |
+
| 8 | Ship |
|
| 70 |
+
|
| 71 |
+
For tasks that stop early (e.g. plan-only requests that end once the research floor is complete), still set the phase to wherever you actually are. Do not skip `set_state` calls: they are the only durable record of phase across turns.
|
| 72 |
+
|
| 73 |
+
## Rules
|
| 74 |
+
|
| 75 |
+
- Always call `get_state` before the first substantive action on any harness turn.
|
| 76 |
+
- Always call `set_state` immediately after transitioning to a new phase — before doing work in the new phase.
|
| 77 |
+
- Never infer phase from conversation history if a state file exists — the file is the source of truth.
|
| 78 |
+
- If `get_state` returns `active: false` but the current message is ML-related, set state to active before proceeding.
|
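The canonical phase names above can be enforced before a `set_state` call is issued. The following is a minimal sketch and not part of this commit: `PHASES` mirrors the Phase Reference table, while `build_set_state` is a hypothetical helper introduced here purely for illustration.

```python
import json

# Mirrors the Phase Reference table in this SKILL.md.
PHASES = {
    1: "Clarify",
    2: "Research papers and datasets",
    3: "Read HF docs and code examples",
    4: "Implement",
    5: "Smoke test",
    6: "Run full job",
    7: "Evaluate",
    8: "Ship",
}


def build_set_state(phase: int, active: bool = True) -> str:
    """Return a JSON set_state payload that uses the canonical phase name.

    Hypothetical helper: raises ValueError for phases outside the table.
    """
    # bool is a subclass of int, so reject True/False explicitly.
    if isinstance(phase, bool) or phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    payload = {
        "operation": "set_state",
        "active": active,
        "phase": phase,
        "phase_name": PHASES[phase],
    }
    return json.dumps(payload)
```

Deriving `phase_name` from the phase number keeps the two fields from drifting apart, which matters because the state file is treated as the source of truth.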
plugins/ml-intern/skills/harness-state/scripts/state.py
ADDED
|
@@ -0,0 +1,104 @@
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
harness-state skill script.
|
| 4 |
+
Operations: get_state, set_state
|
| 5 |
+
State is stored in .codex-plugin/harness_state.json relative to the repo root.
|
| 6 |
+
Falls back to the current working directory if the .codex-plugin dir is not found.
|
| 7 |
+
"""
|
| 8 |
+
|
| 9 |
+
import json
|
| 10 |
+
import os
|
| 11 |
+
import sys
|
| 12 |
+
from pathlib import Path
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
STATE_FILENAME = "harness_state.json"
|
| 16 |
+
|
| 17 |
+
DEFAULT_STATE = {
|
| 18 |
+
    "active": False,
|
| 19 |
+
    "phase": 0,
|
| 20 |
+
    "phase_name": "",
|
| 21 |
+
}
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
def find_state_dir() -> Path:
|
| 25 |
+
    """Walk up from cwd looking for .codex-plugin/. Fall back to cwd."""
|
| 26 |
+
    cwd = Path.cwd()
|
| 27 |
+
    for parent in [cwd, *cwd.parents]:
|
| 28 |
+
        candidate = parent / ".codex-plugin"
|
| 29 |
+
        if candidate.is_dir():
|
| 30 |
+
            return candidate
|
| 31 |
+
    # Fallback: use cwd itself
|
| 32 |
+
    return cwd
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
def state_path() -> Path:
|
| 36 |
+
    return find_state_dir() / STATE_FILENAME
|
| 37 |
+
|
| 38 |
+
|
| 39 |
+
def read_state() -> dict:
|
| 40 |
+
    path = state_path()
|
| 41 |
+
    if not path.exists():
|
| 42 |
+
        return dict(DEFAULT_STATE)
|
| 43 |
+
    try:
|
| 44 |
+
        with open(path) as f:
|
| 45 |
+
            data = json.load(f)
|
| 46 |
+
        # Fill missing keys with defaults
|
| 47 |
+
        return {**DEFAULT_STATE, **data}
|
| 48 |
+
    except (json.JSONDecodeError, OSError):
|
| 49 |
+
        return dict(DEFAULT_STATE)
|
| 50 |
+
|
| 51 |
+
|
| 52 |
+
def write_state(active: bool, phase: int, phase_name: str) -> dict:
|
| 53 |
+
    state = {"active": active, "phase": phase, "phase_name": phase_name}
|
| 54 |
+
    path = state_path()
|
| 55 |
+
    path.parent.mkdir(parents=True, exist_ok=True)
|
| 56 |
+
    with open(path, "w") as f:
|
| 57 |
+
        json.dump(state, f, indent=2)
|
| 58 |
+
    return state
|
| 59 |
+
|
| 60 |
+
|
| 61 |
+
def main():
|
| 62 |
+
    if len(sys.argv) < 2:
|
| 63 |
+
        print(json.dumps({"error": "Usage: state.py <json_input>"}))
|
| 64 |
+
        sys.exit(1)
|
| 65 |
+
|
| 66 |
+
    try:
|
| 67 |
+
        args = json.loads(sys.argv[1])
|
| 68 |
+
    except json.JSONDecodeError as e:
|
| 69 |
+
        print(json.dumps({"error": f"Invalid JSON input: {e}"}))
|
| 70 |
+
        sys.exit(1)
|
| 71 |
+
|
| 72 |
+
    operation = args.get("operation")
|
| 73 |
+
|
| 74 |
+
    if operation == "get_state":
|
| 75 |
+
        result = read_state()
|
| 76 |
+
        print(json.dumps(result))
|
| 77 |
+
|
| 78 |
+
    elif operation == "set_state":
|
| 79 |
+
        active = args.get("active")
|
| 80 |
+
        phase = args.get("phase")
|
| 81 |
+
        phase_name = args.get("phase_name", "")
|
| 82 |
+
|
| 83 |
+
        if active is None or phase is None:
|
| 84 |
+
            print(json.dumps({"error": "set_state requires 'active' (bool) and 'phase' (int)"}))
|
| 85 |
+
            sys.exit(1)
|
| 86 |
+
|
| 87 |
+
        if not isinstance(active, bool):
|
| 88 |
+
            print(json.dumps({"error": "'active' must be a boolean"}))
|
| 89 |
+
            sys.exit(1)
|
| 90 |
+
|
| 91 |
+
        if isinstance(phase, bool) or not isinstance(phase, int):  # bool is a subclass of int
|
| 92 |
+
            print(json.dumps({"error": "'phase' must be an integer"}))
|
| 93 |
+
            sys.exit(1)
|
| 94 |
+
|
| 95 |
+
        result = write_state(active=active, phase=phase, phase_name=str(phase_name))
|
| 96 |
+
        print(json.dumps({"ok": True, "state": result}))
|
| 97 |
+
|
| 98 |
+
    else:
|
| 99 |
+
        print(json.dumps({"error": f"Unknown operation '{operation}'. Valid: get_state, set_state"}))
|
| 100 |
+
        sys.exit(1)
|
| 101 |
+
|
| 102 |
+
|
| 103 |
+
if __name__ == "__main__":
|
| 104 |
+
    main()
|
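The read path in `state.py` has three outcomes: no file yet, a file that lacks some keys, and a file that cannot be parsed. A minimal sketch of that behaviour follows. It is not the script itself: it takes the path as an explicit argument (an assumption made so the example can run in a temporary directory) instead of walking up from the working directory, and `demo` is a hypothetical driver.

```python
import json
import tempfile
from pathlib import Path

DEFAULT_STATE = {"active": False, "phase": 0, "phase_name": ""}


def read_state(path: Path) -> dict:
    """Return stored state merged over the defaults; use defaults on read errors."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing or corrupt file: behave as if the harness was never activated.
        return dict(DEFAULT_STATE)
    return {**DEFAULT_STATE, **data}


def write_state(path: Path, state: dict) -> None:
    """Write state as JSON, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)


def demo() -> list:
    """Exercise the missing, partial, and corrupt cases in a temp directory."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / ".codex-plugin" / "harness_state.json"
        results.append(read_state(path))                  # no file yet
        write_state(path, {"active": True, "phase": 2})   # phase_name omitted
        results.append(read_state(path))                  # missing key filled in
        path.write_text("{not json", encoding="utf-8")
        results.append(read_state(path))                  # corrupt file
    return results
```

Note that a corrupt file silently resets the harness to inactive, so a caller that cares about the difference would need to surface the parse error rather than swallow it.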
plugins/ml-intern/skills/ml-intern-harness/SKILL.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
| 2 |
name: ml-intern-harness
|
| 3 |
description: "The core ML Intern skill. Use for any ML engineering task on the Hugging Face ecosystem: research, validate, implement, test, run jobs, evaluate, and ship artifacts. Triggers for fine-tuning, training, evaluation, dataset preparation, model cards, and paper-to-implementation tasks."
|
| 4 |
disable-model-invocation: false
|
| 5 |
-
|
| 6 |
|
| 7 |
# ML Intern Harness
|
| 8 |
|
|
@@ -94,6 +94,32 @@ Preferred shape:
|
|
| 94 |
|
| 95 |
When the user only wants a plan, the final `update_plan` call should still mark the synthesis step completed before returning.
|
| 96 |
|
| 97 |
## High-Risk Mistakes To Avoid
|
| 98 |
|
| 99 |
- Hallucinated imports or trainer arguments from outdated memory.
|
|
@@ -140,11 +166,12 @@ Minimum research floor:
|
|
| 140 |
- **Docs**: Read current HF docs for any library/API that the plan depends on.
|
| 141 |
- **External constraints**: Use current web/official docs for non-HF platform constraints, policies, rate limits, pricing, or APIs.
|
| 142 |
|
| 143 |
-
For plan-only outputs, return a compact evidence table before the plan
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
-
|
| 147 |
-
|
|
|
|
| 148 |
|
| 149 |
If runtime policy prevents spawning a research sub-agent, note that only as a process limitation; do not use it as a reason to skip dataset, code, docs, or citation-graph research.
|
| 150 |
|
|
@@ -168,6 +195,7 @@ When delegation is not allowed:
|
|
| 168 |
- Perform the same probes directly in the main context.
|
| 169 |
- State the limitation briefly as a process note only.
|
| 170 |
- Still preserve the upstream research order: papers first, then datasets, then docs/examples, then current external constraints.
|
|
|
|
| 171 |
|
| 172 |
Research prompt pattern to emulate:
|
| 173 |
- Start from anchor papers or landmark work.
|
|
@@ -245,15 +273,14 @@ Use the `hf-jobs` skill for job submission and monitoring.
|
|
| 245 |
When something fails:
|
| 246 |
- Read the full error and relevant logs.
|
| 247 |
- Do not retry the exact same command without changing the cause.
|
| 248 |
-
- Import error: fetch docs
|
| 249 |
-
- Dataset KeyError: re-inspect schema, patch preprocessing.
|
| 250 |
-
- OOM: reduce
|
| 251 |
-
- Divergence/NaN: lower learning rate, check labels
|
| 252 |
-
- Weak metric: compare against paper
|
|
|
|
| 253 |
- If the issue is ambiguous, return to the most authoritative source available before making a speculative change.
|
| 254 |
|
| 255 |
-
Do not hide compromises. If preserving the original request is impossible, explain the constraint and ask for approval.
|
| 256 |
-
|
| 257 |
## Completion Standard
|
| 258 |
|
| 259 |
Before final response, verify:
|
|
@@ -265,4 +292,4 @@ Return:
|
|
| 265 |
- Source repo links (branch, commit, PR).
|
| 266 |
- Hugging Face artifact URLs (model, dataset, Space, job).
|
| 267 |
- Metrics or evaluation results.
|
| 268 |
-
- Known gaps, failures, or next experiments.
|
|
|
|
| 2 |
name: ml-intern-harness
|
| 3 |
description: "The core ML Intern skill. Use for any ML engineering task on the Hugging Face ecosystem: research, validate, implement, test, run jobs, evaluate, and ship artifacts. Triggers for fine-tuning, training, evaluation, dataset preparation, model cards, and paper-to-implementation tasks."
|
| 4 |
disable-model-invocation: false
|
| 5 |
+
***
|
| 6 |
|
| 7 |
# ML Intern Harness
|
| 8 |
|
|
|
|
| 94 |
|
| 95 |
When the user only wants a plan, the final `update_plan` call should still mark the synthesis step completed before returning.
|
| 96 |
|
| 97 |
+
### Example plan shape
|
| 98 |
+
|
| 99 |
+
The following shows the exact structure to use when calling `update_plan`. IDs are stable integers assigned at plan creation and never reused. Exactly one item is `in_progress` at any time. Every call replaces the entire list; never send partial updates. Mark an item `completed` only after it fully succeeds.
|
| 100 |
+
|
| 101 |
+
```
|
| 102 |
+
update_plan:
|
| 103 |
+
  todos:
|
| 104 |
+
    - id: 1
|
| 105 |
+
      content: "Research papers"
|
| 106 |
+
      status: completed
|
| 107 |
+
    - id: 2
|
| 108 |
+
      content: "Inspect datasets"
|
| 109 |
+
      status: in_progress
|
| 110 |
+
    - id: 3
|
| 111 |
+
      content: "Read HF docs and code examples"
|
| 112 |
+
      status: pending
|
| 113 |
+
    - id: 4
|
| 114 |
+
      content: "Implement training script"
|
| 115 |
+
      status: pending
|
| 116 |
+
    - id: 5
|
| 117 |
+
      content: "Smoke test and submit job"
|
| 118 |
+
      status: pending
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
Do not use freeform status strings such as "done", "wip", or "not started". Only `pending`, `in_progress`, and `completed` are valid.
|
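The plan rules above (unique ids, three allowed statuses, exactly one `in_progress` item) are mechanical enough to check before a call is made. The following is a rough sketch only: `validate_plan` is a hypothetical helper, and it assumes the todo list is available as a list of dicts with the same keys as the example plan.

```python
VALID_STATUSES = {"pending", "in_progress", "completed"}


def validate_plan(todos: list) -> list:
    """Return a list of rule violations for a todo list (empty when valid)."""
    errors = []
    ids = [t.get("id") for t in todos]
    if len(ids) != len(set(ids)):
        errors.append("duplicate ids")
    for t in todos:
        if t.get("status") not in VALID_STATUSES:
            errors.append(f"invalid status for id {t.get('id')}: {t.get('status')!r}")
    in_progress = sum(1 for t in todos if t.get("status") == "in_progress")
    if in_progress != 1:
        errors.append(f"expected exactly one in_progress item, found {in_progress}")
    return errors


plan = [
    {"id": 1, "content": "Research papers", "status": "completed"},
    {"id": 2, "content": "Inspect datasets", "status": "in_progress"},
    {"id": 3, "content": "Read HF docs and code examples", "status": "pending"},
]
print(validate_plan(plan))  # prints []
```

The sketch applies the "exactly one `in_progress`" rule literally, so a finished plan in which every item is `completed` would be flagged; a real check would probably need to exempt that final state.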
| 122 |
+
|
| 123 |
## High-Risk Mistakes To Avoid
|
| 124 |
|
| 125 |
- Hallucinated imports or trainer arguments from outdated memory.
|
|
|
|
| 166 |
- **Docs**: Read current HF docs for any library/API that the plan depends on.
|
| 167 |
- **External constraints**: Use current web/official docs for non-HF platform constraints, policies, rate limits, pricing, or APIs.
|
| 168 |
|
| 169 |
+
For plan-only outputs, return a compact evidence table before the plan:
|
| 170 |
+
|
| 171 |
+
| Source / Artifact | What was verified | Design implication | Confidence |
|
| 172 |
+
|---|---|---|---|
|
| 173 |
+
|
| 174 |
+
Use `verified`, `inferred`, or `not checked` in the Confidence column. Do not return prose summaries as the primary evidence format — the table is the required handoff format.
|
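To make the required handoff format concrete, here is a small sketch that renders the evidence table and rejects Confidence values outside the three allowed labels. `render_evidence_table` is a hypothetical helper, and the row shown is a placeholder rather than a real finding.

```python
CONFIDENCE = {"verified", "inferred", "not checked"}
HEADER = ["Source / Artifact", "What was verified", "Design implication", "Confidence"]


def render_evidence_table(rows: list) -> str:
    """Render (source, finding, implication, confidence) tuples as a markdown table."""
    lines = ["| " + " | ".join(HEADER) + " |", "|" + "---|" * len(HEADER)]
    for source, finding, implication, confidence in rows:
        if confidence not in CONFIDENCE:
            raise ValueError(f"confidence must be one of {sorted(CONFIDENCE)}: {confidence!r}")
        # Escape pipes so cell text cannot break the table layout.
        cells = [c.replace("|", "\\|") for c in (source, finding, implication, confidence)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder row for illustration only.
table = render_evidence_table(
    [("<dataset id>", "has a `text` column", "no column renaming needed", "verified")]
)
print(table)
```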
| 175 |
|
| 176 |
If runtime policy prevents spawning a research sub-agent, note that only as a process limitation; do not use it as a reason to skip dataset, code, docs, or citation-graph research.
|
| 177 |
|
|
|
|
| 195 |
- Perform the same probes directly in the main context.
|
| 196 |
- State the limitation briefly as a process note only.
|
| 197 |
- Still preserve the upstream research order: papers first, then datasets, then docs/examples, then current external constraints.
|
| 198 |
+
- Return findings as a compact evidence table (Source / Artifact | Verified finding | Design implication | Confidence) before the plan. Do not return prose summaries as the primary evidence format.
|
| 199 |
|
| 200 |
Research prompt pattern to emulate:
|
| 201 |
- Start from anchor papers or landmark work.
|
|
|
|
| 273 |
When something fails:
|
| 274 |
- Read the full error and relevant logs.
|
| 275 |
- Do not retry the exact same command without changing the cause.
|
| 276 |
+
- **Import error**: fetch the current docs or example file, patch the import or config name. Do not guess from memory.
|
| 277 |
+
- **Dataset KeyError**: re-inspect the schema, patch preprocessing to match actual column names.
|
| 278 |
+
- **OOM**: reduce `per_device_train_batch_size`, increase `gradient_accumulation_steps` proportionally to preserve effective batch size, set `gradient_checkpointing=True`. Do NOT switch SFT to LoRA, reduce `max_length`, or change the training method without explicit user approval.
|
| 279 |
+
- **Divergence / NaN**: lower the learning rate, check labels and rewards for correctness, inspect representative samples. Do not silently substitute a different optimizer or scheduler.
|
| 280 |
+
- **Weak metric**: compare against the paper recipe step by step, inspect error cases, propose a targeted sweep. Do not silently change datasets, models, or methods.
|
| 281 |
+
- **Silent substitution is never allowed**: if preserving the original request is impossible, explain the constraint and ask for approval before making any scope change.
|
| 282 |
- If the issue is ambiguous, return to the most authoritative source available before making a speculative change.
|
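The OOM rule in the list above depends on simple arithmetic: effective batch size is per-device batch size times accumulation steps times device count, so halving the first and doubling the second leaves it unchanged. A minimal sketch under those assumptions follows. `rebalance_for_oom` and its `shrink` parameter are hypothetical; only the three returned keys are argument names taken from the rule itself.

```python
def rebalance_for_oom(
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
    num_devices: int = 1,
    shrink: int = 2,
) -> dict:
    """Shrink the per-device batch and grow accumulation by the same factor."""
    if per_device_train_batch_size % shrink != 0:
        raise ValueError("batch size must be divisible by the shrink factor")
    new_batch = per_device_train_batch_size // shrink
    new_accum = gradient_accumulation_steps * shrink
    before = per_device_train_batch_size * gradient_accumulation_steps * num_devices
    after = new_batch * new_accum * num_devices
    if before != after:
        raise RuntimeError("effective batch size changed")
    return {
        "per_device_train_batch_size": new_batch,
        "gradient_accumulation_steps": new_accum,
        "gradient_checkpointing": True,
    }


print(rebalance_for_oom(8, 4))
```

With a per-device batch of 8 and 4 accumulation steps, the effective batch size is 32 on one device both before and after the change (4 x 8), which is what lets the fix stay within the original request.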
| 283 |
|
|
|
|
|
|
|
| 284 |
## Completion Standard
|
| 285 |
|
| 286 |
Before final response, verify:
|
|
|
|
| 292 |
- Source repo links (branch, commit, PR).
|
| 293 |
- Hugging Face artifact URLs (model, dataset, Space, job).
|
| 294 |
- Metrics or evaluation results.
|
| 295 |
+
- Known gaps, failures, or next experiments.
|