MSGEncrypted committed
Commit aac5f23 · 1 Parent(s): 1e52a1f

Feat/sprint last 2hours (#22)

* common fix wip

* server app wip fix

* language stuff

* readme

* server app fix

* eval language

* wip language app lesson

* wip fix

* test modal

* test modal and fix

* voice fallback

* readme wip

---------

Co-authored-by: MSGhais <msghais135@gmail.com>

.env.example CHANGED
@@ -64,14 +64,16 @@ ACTIVE_MODEL=minicpm5-1b
64
 
65
  # --- EchoCoach / Language lessons (voice stack) ---
66
  # VOICE_PRESETS_PATH=./voice_models.yaml
67
- # Recommended for Cohere Labs partner demo (GPU Space):
 
 
 
 
68
  # ECHOCOACH_ASR_PRESET=cohere-transcribe
69
  # ECHOCOACH_COACH_MODEL=tiny-aya-global
70
- # Comma-separated preset keys from models.yaml if primary coach fails to load:
71
- # ECHOCOACH_COACH_FALLBACK=minicpm5-1b
72
  # ECHOCOACH_TTS_PRESET=piper-multilingual
73
  # ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
74
- # Dev fallback (CPU):
75
  # ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
76
  # ECHOCOACH_COACH_MODEL=minicpm5-1b
77
  # ECHOCOACH_MAX_SECONDS=30
@@ -79,7 +81,5 @@ ACTIVE_MODEL=minicpm5-1b
79
  # ECHOCOACH_VOICE_PROFILE=pipeline # pipeline (default) or omni for MiniCPM-o attempt
80
  # ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
81
  # PIPER_VOICES_DIR=~/.local/share/piper/voices
82
- # For Cohere Transcribe ASR: huggingface-cli login + accept model terms, then:
83
- # ECHOCOACH_ASR_PRESET=cohere-transcribe
84
 
85
  BASE=openbmb/MiniCPM5-1B
 
64
 
65
  # --- EchoCoach / Language lessons (voice stack) ---
66
  # VOICE_PRESETS_PATH=./voice_models.yaml
67
+ # Default (Cohere-free): Whisper ASR + OpenBMB language-lesson LoRA coach
68
+ # ECHOCOACH_ASR_PRESET=whisper-cpp-base
69
+ # ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
70
+ # ECHOCOACH_COACH_FALLBACK=minicpm5-1b-language-lesson-lora,minicpm5-1b
71
+ # Optional Cohere Labs partner demo (GPU Space + HF gated models):
72
  # ECHOCOACH_ASR_PRESET=cohere-transcribe
73
  # ECHOCOACH_COACH_MODEL=tiny-aya-global
 
 
74
  # ECHOCOACH_TTS_PRESET=piper-multilingual
75
  # ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
76
+ # Dev fallback (CPU, no LoRA):
77
  # ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
78
  # ECHOCOACH_COACH_MODEL=minicpm5-1b
79
  # ECHOCOACH_MAX_SECONDS=30
 
81
  # ECHOCOACH_VOICE_PROFILE=pipeline # pipeline (default) or omni for MiniCPM-o attempt
82
  # ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
83
  # PIPER_VOICES_DIR=~/.local/share/piper/voices
 
 
84
 
85
  BASE=openbmb/MiniCPM5-1B
README.md CHANGED
@@ -31,8 +31,13 @@ See **[USAGE.md](USAGE.md)** for local run, Gradio SDK / ZeroGPU Space deploymen
31
 
32
  **Demo video:** [https://www.youtube.com/watch?v=bwtOiZvJ-7k](https://www.youtube.com/watch?v=bwtOiZvJ-7k)
33
 
 
 
34
  **X post:** [https://x.com/MSG_Encrypted/status/2066570320861921748](https://x.com/MSG_Encrypted/status/2066570320861921748)
35
 
 
 
 
36
  ## Prerequisites
37
 
38
  - [uv](https://docs.astral.sh/uv/)
@@ -175,6 +180,7 @@ A root `Dockerfile` is kept for a later **Docker SDK** deploy (flip README to `s
175
 
176
  - Space live under build-small-hackathon
177
  - Demo video: [YouTube](https://www.youtube.com/watch?v=bwtOiZvJ-7k) — real user enters topic → download `.pptx` → show agent trace
 
178
  - Social post published: [X](https://x.com/MSG_Encrypted/status/2066570320861921748)
179
  - Submission by **June 15, 2026**
180
 
 
31
 
32
  **Demo video:** [https://www.youtube.com/watch?v=bwtOiZvJ-7k](https://www.youtube.com/watch?v=bwtOiZvJ-7k)
33
 
34
+ **Blog post:** [Small Models, Bounded Jobs](https://huggingface.co/blog/build-small-hackathon/lessonagent-opennotebook) — Hugging Face Build Small Hackathon write-up
35
+
36
  **X post:** [https://x.com/MSG_Encrypted/status/2066570320861921748](https://x.com/MSG_Encrypted/status/2066570320861921748)
37
 
38
+
39
+ **GitHub:** [https://github.com/MSghais/small-model-hackathon/](https://github.com/MSghais/small-model-hackathon/)
40
+
41
  ## Prerequisites
42
 
43
  - [uv](https://docs.astral.sh/uv/)
 
180
 
181
  - Space live under build-small-hackathon
182
  - Demo video: [YouTube](https://www.youtube.com/watch?v=bwtOiZvJ-7k) — real user enters topic → download `.pptx` → show agent trace
183
+ - Blog post: [Small Models, Bounded Jobs](https://huggingface.co/blog/build-small-hackathon/lessonagent-opennotebook)
184
  - Social post published: [X](https://x.com/MSG_Encrypted/status/2066570320861921748)
185
  - Submission by **June 15, 2026**
186
 
USAGE.md CHANGED
@@ -2,7 +2,7 @@
2
 
3
  How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
4
 
5
- The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (Cohere Transcribe + Tiny Aya), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
6
 
7
  ## Prerequisites
8
 
@@ -146,10 +146,10 @@ Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:
146
 
147
  | Variable | Default | Description |
148
  | -------- | ------- | ----------- |
149
- | `ECHOCOACH_ASR_PRESET` | `cohere-transcribe` | ASR preset key (Space demo); use `whisper-cpp-tiny` on CPU dev |
150
  | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
151
  | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
152
- | `ECHOCOACH_COACH_MODEL` | `tiny-aya-global` | Text coach preset (Tiny Aya; from `models.yaml`) |
153
  | `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
154
  | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
155
 
@@ -169,15 +169,25 @@ The **Language lessons** tab is the primary voice learning experience: one page
169
  | ----- | ------ |
170
  | Type a question | Chat bubble in target language |
171
  | Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
172
- | **Other (text only)** language code | Tiny Aya written lesson (no Piper voice for unsupported codes) |
173
 
174
- **Stack (Cohere Labs partner demo):** [Cohere Transcribe](https://huggingface.co/CohereLabs/c4ai-transcribe-v2) (14 voice langs) → [Tiny Aya Global / regional](https://huggingface.co/CohereLabs/tiny-aya-global) (70+ text langs) → Piper or VibeVoice Realtime for speech out.
175
 
176
- Set Space secrets (GPU recommended):
177
 
178
  ```bash
179
- ECHOCOACH_ASR_PRESET=cohere-transcribe
180
- ECHOCOACH_COACH_MODEL=tiny-aya-global
181
  ECHOCOACH_TTS_PRESET=piper-multilingual
182
  ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
183
  ```
 
2
 
3
  How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
4
 
5
+ The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
6
 
7
  ## Prerequisites
8
 
 
146
 
147
  | Variable | Default | Description |
148
  | -------- | ------- | ----------- |
149
+ | `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo |
150
  | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
151
  | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
152
+ | `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) |
153
  | `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
154
  | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
155
 
 
169
  | ----- | ------ |
170
  | Type a question | Chat bubble in target language |
171
  | Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
172
+ | **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |
173
 
174
+ **Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out.
175
 
176
+ Rebuild training JSONL from Hugging Face sources:
177
 
178
  ```bash
179
+ uv run python research/data/build_language_lesson_chat.py
180
+ modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
181
+ ```
182
+
183
+ Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).
184
+
185
+ Default `.env` / Space secrets:
186
+
187
+ ```bash
188
+ ECHOCOACH_ASR_PRESET=whisper-cpp-base
189
+ ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
190
+ ECHOCOACH_COACH_FALLBACK=minicpm5-1b
191
  ECHOCOACH_TTS_PRESET=piper-multilingual
192
  ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
193
  ```
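A minimal sketch of how the variables in the table above might be read with their documented defaults. The real loader is `echocoach.config.get_echo_coach_config`, whose internals are not part of this diff, so `VoiceEnv` and `read_voice_env` are hypothetical names used only for illustration. The fallback default follows the USAGE.md table (`minicpm5-1b`); `.env.example` suggests a longer chain.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEnv:
    """Hypothetical container; the project's real config class is not shown here."""

    asr_preset: str
    coach_model: str
    coach_fallback: tuple[str, ...]
    tts_preset: str
    max_seconds: int


def read_voice_env(env: Mapping[str, str] | None = None) -> VoiceEnv:
    """Read the ECHOCOACH_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_fallback = env.get("ECHOCOACH_COACH_FALLBACK", "minicpm5-1b")
    # The fallback value is comma-separated; drop empty entries and stray spaces.
    fallback = tuple(key.strip() for key in raw_fallback.split(",") if key.strip())
    return VoiceEnv(
        asr_preset=env.get("ECHOCOACH_ASR_PRESET", "whisper-cpp-base"),
        coach_model=env.get("ECHOCOACH_COACH_MODEL", "minicpm5-1b-language-lesson-hub"),
        coach_fallback=fallback,
        tts_preset=env.get("ECHOCOACH_TTS_PRESET", "piper-multilingual"),
        max_seconds=int(env.get("ECHOCOACH_MAX_SECONDS", "30")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test without touching the process environment.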
apps/gradio-space/src/gradio_space/api/studio.py CHANGED
@@ -10,7 +10,7 @@ import gradio as gr
10
 
11
  from echocoach.config import get_echo_coach_config
12
  from echocoach.pipeline import run_echo_coach
13
- from echocoach.prompts import TeacherVoiceMode, resolve_aya_preset
14
  from echocoach.recording import (
15
  ServerRecordingError,
16
  recording_backend_status,
@@ -187,7 +187,7 @@ def _coach_model_key(
187
  elif coach_variant and coach_variant not in ("auto", ""):
188
  key = coach_variant.strip()
189
  else:
190
- key = resolve_aya_preset(language, coach_variant)
191
  if key in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto"):
192
  key = "tiny-aya-global"
193
  return key
 
10
 
11
  from echocoach.config import get_echo_coach_config
12
  from echocoach.pipeline import run_echo_coach
13
+ from echocoach.prompts import TeacherVoiceMode
14
  from echocoach.recording import (
15
  ServerRecordingError,
16
  recording_backend_status,
 
187
  elif coach_variant and coach_variant not in ("auto", ""):
188
  key = coach_variant.strip()
189
  else:
190
+ key = _echo_config.coach_model
191
  if key in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto"):
192
  key = "tiny-aya-global"
193
  return key
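The effect of this hunk can be sketched in isolation: an explicit coach variant still wins, but the default now comes from configuration instead of `resolve_aya_preset`. The function below is a hypothetical stand-in that covers only the branches visible in the diff (the first `if` branch of `_coach_model_key` is outside the hunk and is not reproduced), and `default_key` stands in for `_echo_config.coach_model`.

```python
from __future__ import annotations

# Regional Tiny Aya aliases that the hunk collapses to the global preset.
_REGIONAL_ALIASES = ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto")


def resolve_coach_key(coach_variant: str | None, default_key: str) -> str:
    """Simplified, hypothetical version of the visible part of `_coach_model_key`."""
    if coach_variant and coach_variant not in ("auto", ""):
        # An explicit UI choice takes precedence over the configured default.
        key = coach_variant.strip()
    else:
        # New behavior in this commit: use the configured coach model.
        key = default_key
    if key in _REGIONAL_ALIASES:
        key = "tiny-aya-global"
    return key
```

With the new default, `resolve_coach_key(None, "minicpm5-1b-language-lesson-hub")` yields the OpenBMB LoRA preset, so Tiny Aya is selected only when it is requested or configured explicitly.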
models.yaml CHANGED
@@ -94,6 +94,20 @@ models:
94
  adapter_path: MSGEncrypted/minicpm5-1b-math-lora
95
  trust_remote_code: true
96

97
  tiny-aya-global:
98
  label: Tiny Aya Global 3.3B (multilingual coach)
99
  backend: transformers
 
94
  adapter_path: MSGEncrypted/minicpm5-1b-math-lora
95
  trust_remote_code: true
96
 
97
+ minicpm5-1b-language-lesson-hub:
98
+ label: MiniCPM5 1B language lesson LoRA (FR/AR, Hub)
99
+ backend: transformers
100
+ model_id: openbmb/MiniCPM5-1B
101
+ adapter_path: MSGEncrypted/minicpm5-1b-language-lesson-lora
102
+ trust_remote_code: true
103
+
104
+ minicpm5-1b-language-lesson-lora:
105
+ label: MiniCPM5 1B language lesson LoRA (FR/AR, local)
106
+ backend: transformers
107
+ model_id: openbmb/MiniCPM5-1B
108
+ adapter_path: ./models/finetuned/language-lesson-lora
109
+ trust_remote_code: true
110
+
111
  tiny-aya-global:
112
  label: Tiny Aya Global 3.3B (multilingual coach)
113
  backend: transformers
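The two new presets differ only in `adapter_path` (Hub repo vs. local directory), which is what makes the `ECHOCOACH_COACH_FALLBACK` chain useful. Below is a rough sketch of walking such a chain. The `PRESETS` dict mirrors the entries added above; the `minicpm5-1b` entry is assumed from `BASE=openbmb/MiniCPM5-1B` in `.env.example`. `first_loadable` and the `can_load` predicate are hypothetical, since the project's actual loading code is not part of this diff.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Mapping
from typing import Any

# Mirrors the presets in this hunk; the plain "minicpm5-1b" entry is an assumption.
PRESETS: dict[str, dict[str, Any]] = {
    "minicpm5-1b-language-lesson-hub": {
        "model_id": "openbmb/MiniCPM5-1B",
        "adapter_path": "MSGEncrypted/minicpm5-1b-language-lesson-lora",
    },
    "minicpm5-1b-language-lesson-lora": {
        "model_id": "openbmb/MiniCPM5-1B",
        "adapter_path": "./models/finetuned/language-lesson-lora",
    },
    "minicpm5-1b": {"model_id": "openbmb/MiniCPM5-1B"},
}


def first_loadable(
    primary: str,
    fallbacks: Iterable[str],
    presets: Mapping[str, Mapping[str, Any]],
    can_load: Callable[[Mapping[str, Any]], bool],
) -> str | None:
    """Return the first preset key (primary, then fallbacks) that can_load accepts."""
    for key in (primary, *fallbacks):
        preset = presets.get(key)
        if preset is not None and can_load(preset):
            return key
    return None
```

For example, if the Hub adapter cannot be fetched, a chain of `minicpm5-1b-language-lesson-lora,minicpm5-1b` would try the local adapter next and then the base model.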
research/data/build_language_lesson_chat.py ADDED
@@ -0,0 +1,378 @@
+ #!/usr/bin/env python3
+ """Build TeacherVoice-shaped FR/AR chat JSONL from Hugging Face sources + seeds.
+
+ Exports:
+   research/data/language-lesson-fr.jsonl
+   research/data/language-lesson-ar.jsonl
+   research/data/language-lesson-eval-fr.jsonl (5% holdout)
+   research/data/language-lesson-eval-ar.jsonl
+
+ Usage:
+   uv run python research/data/build_language_lesson_chat.py
+   uv run python research/data/build_language_lesson_chat.py --max-per-source 500 --skip-hub
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ import random
+ import re
+ import sys
+ from collections.abc import Iterator
+ from pathlib import Path
+ from typing import Any, Literal
+
+ import yaml
+
+ _REPO_ROOT = Path(__file__).resolve().parents[2]
+ _DATA_DIR = Path(__file__).resolve().parent
+ if str(_REPO_ROOT) not in sys.path:
+     sys.path.insert(0, str(_REPO_ROOT))
+
+ from echocoach.prompts import (  # noqa: E402
+     system_prompt_for_mode,
+     topic_context_block,
+ )
+ from echocoach.teacher_voice import _VOICE_USER_SUFFIX  # noqa: E402
+
+ VoiceMode = Literal["explain", "lesson"]
+
+ MIN_ASSISTANT_CHARS = 40
+ MAX_ASSISTANT_CHARS = 600
+ EVAL_HOLDOUT_RATIO = 0.05
+
+ DEFAULT_FR_SOURCES = (
+     "angeluriot/french_instruct",
+     "CohereLabs/aya_dataset",
+     "pinzhenchen/alpaca-cleaned-fr",
+ )
+ DEFAULT_AR_SOURCES = (
+     "arbml/CIDAR",
+     "ClusterlabAi/InstAr-500k",
+     "CohereLabs/aya_dataset",
+ )
+
+ SOURCE_CAPS: dict[str, dict[str, int]] = {
+     "angeluriot/french_instruct": {"fr": 8000},
+     "CohereLabs/aya_dataset": {"fr": 3000, "ar": 3000},
+     "pinzhenchen/alpaca-cleaned-fr": {"fr": 2000},
+     "arbml/CIDAR": {"ar": 8000},
+     "ClusterlabAi/InstAr-500k": {"ar": 5000},
+ }
+
+ _INSTAR_GOOD_TASKS = frozenset(
+     {
+         "Open QA",
+         "Extraction and Explanation",
+         "Summarization",
+         "Classification",
+     }
+ )
+
+ _CODE_MARKERS = re.compile(r"```|^\s*def |^\s*class |^\s*import ", re.MULTILINE)
+ _JSON_START = re.compile(r"^\s*[\{\[]")
+
+
+ def _assistant_ok(text: str) -> bool:
+     text = (text or "").strip()
+     if len(text) < MIN_ASSISTANT_CHARS or len(text) > MAX_ASSISTANT_CHARS:
+         return False
+     if _JSON_START.match(text):
+         return False
+     if _CODE_MARKERS.search(text):
+         return False
+     if text.count("\n") > 8:
+         return False
+     return True
+
+
+ def _pick_mode(rng: random.Random, *, topic: str | None) -> VoiceMode:
+     if topic and rng.random() < 0.4:
+         return "lesson"
+     return "explain" if rng.random() < 0.6 else "lesson"
+
+
+ def _wrap_row(
+     *,
+     language: str,
+     mode: VoiceMode,
+     user_text: str,
+     assistant_text: str,
+     topic: str | None = None,
+ ) -> dict[str, Any]:
+     system = system_prompt_for_mode(mode, language=language)
+     topic_line = topic_context_block(topic, mode)
+     if topic_line:
+         system = f"{system}\n\n{topic_line}"
+     user_body = f"{user_text.strip()}\n\n{_VOICE_USER_SUFFIX}"
+     return {
+         "messages": [
+             {"role": "system", "content": system},
+             {"role": "user", "content": user_body},
+             {"role": "assistant", "content": assistant_text.strip()},
+         ]
+     }
+
+
+ def _load_seeds(path: Path) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+     if not path.is_file():
+         return [], []
+     raw = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
+     fr_rows: list[dict[str, Any]] = []
+     ar_rows: list[dict[str, Any]] = []
+     for lang, key in (("fr", "fr"), ("ar", "ar")):
+         for item in raw.get(key, []):
+             mode = item.get("mode", "explain")
+             topic = item.get("topic")
+             if topic in (None, "null", ""):
+                 topic = None
+             row = _wrap_row(
+                 language=lang,
+                 mode=mode,  # type: ignore[arg-type]
+                 user_text=str(item["user"]),
+                 assistant_text=str(item["assistant"]),
+                 topic=str(topic) if topic else None,
+             )
+             (fr_rows if key == "fr" else ar_rows).append(row)
+     return fr_rows, ar_rows
+
+
+ def _iter_french_instruct(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+     from datasets import load_dataset
+
+     ds = load_dataset("angeluriot/french_instruct", split="train", streaming=True)
+     count = 0
+     for row in ds:
+         messages = row.get("messages") or row.get("conversation")
+         if not messages:
+             continue
+         user_text = ""
+         assistant_text = ""
+         for msg in messages:
+             role = (msg.get("role") or msg.get("from") or "").lower()
+             content = (msg.get("content") or msg.get("value") or "").strip()
+             if role in ("user", "human"):
+                 user_text = content
+             elif role in ("assistant", "gpt", "bot") and content:
+                 assistant_text = content
+         if user_text and _assistant_ok(assistant_text):
+             yield user_text, assistant_text, None
+             count += 1
+             if count >= max_rows:
+                 break
+
+
+ def _iter_aya(language_code: str, max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+     from datasets import load_dataset
+
+     ds = load_dataset("CohereLabs/aya_dataset", split="train")
+     count = 0
+     for row in ds:
+         if row.get("language") != language_code:
+             continue
+         user_text = (row.get("inputs") or "").strip()
+         assistant_text = (row.get("targets") or "").strip()
+         if user_text and _assistant_ok(assistant_text):
+             yield user_text, assistant_text, None
+             count += 1
+             if count >= max_rows:
+                 break
+
+
+ def _iter_alpaca_fr(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+     from datasets import load_dataset
+
+     ds = load_dataset("pinzhenchen/alpaca-cleaned-fr", split="train")
+     count = 0
+     for row in ds:
+         instruction = (row.get("instruction") or "").strip()
+         inp = (row.get("input") or "").strip()
+         output = (row.get("output") or "").strip()
+         user_text = f"{instruction}\n{inp}".strip() if inp else instruction
+         if user_text and _assistant_ok(output):
+             yield user_text, output, None
+             count += 1
+             if count >= max_rows:
+                 break
+
+
+ def _iter_cidar(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+     from datasets import load_dataset
+
+     ds = load_dataset("arbml/CIDAR", split="train")
+     count = 0
+     for row in ds:
+         instruction = (row.get("instruction") or "").strip()
+         inp = (row.get("input") or "").strip()
+         output = (row.get("output") or "").strip()
+         user_text = f"{instruction}\n{inp}".strip() if inp else instruction
+         topic = instruction[:80] if instruction else None
+         if user_text and _assistant_ok(output):
+             yield user_text, output, topic
+             count += 1
+             if count >= max_rows:
+                 break
+
+
+ def _iter_instar(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+     from datasets import load_dataset
+
+     ds = load_dataset("ClusterlabAi/InstAr-500k", split="train", streaming=True)
+     count = 0
+     for row in ds:
+         task = row.get("task") or ""
+         if task not in _INSTAR_GOOD_TASKS:
+             continue
+         instruction = (row.get("instruction") or "").strip()
+         output = (row.get("output") or "").strip()
+         topic = (row.get("topic") or "").strip() or None
+         if instruction and _assistant_ok(output):
+             yield instruction, output, topic
+             count += 1
+             if count >= max_rows:
+                 break
+
+
+ _SOURCE_LOADERS: dict[str, dict[str, Any]] = {
+     "angeluriot/french_instruct": {"fr": _iter_french_instruct},
+     "CohereLabs/aya_dataset": {
+         "fr": lambda n: _iter_aya("fra", n),
+         "ar": lambda n: _iter_aya("arb", n),
+     },
+     "pinzhenchen/alpaca-cleaned-fr": {"fr": _iter_alpaca_fr},
+     "arbml/CIDAR": {"ar": _iter_cidar},
+     "ClusterlabAi/InstAr-500k": {"ar": _iter_instar},
+ }
+
+
+ def _collect_from_source(
+     source: str,
+     language: str,
+     max_rows: int,
+     rng: random.Random,
+ ) -> list[dict[str, Any]]:
+     loaders = _SOURCE_LOADERS.get(source, {})
+     loader = loaders.get(language)
+     if loader is None:
+         print(f" skip {source} (no loader for {language})")
+         return []
+     rows: list[dict[str, Any]] = []
+     try:
+         for user_text, assistant_text, topic in loader(max_rows):
+             mode = _pick_mode(rng, topic=topic)
+             rows.append(
+                 _wrap_row(
+                     language=language,
+                     mode=mode,
+                     user_text=user_text,
+                     assistant_text=assistant_text,
+                     topic=topic,
+                 )
+             )
+     except Exception as exc:
+         print(f" warning: {source} failed for {language}: {exc}")
+     return rows
+
+
+ def _split_eval(
+     rows: list[dict[str, Any]], rng: random.Random
+ ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+     if len(rows) < 20:
+         return rows, []
+     shuffled = rows.copy()
+     rng.shuffle(shuffled)
+     n_eval = max(1, int(len(shuffled) * EVAL_HOLDOUT_RATIO))
+     return shuffled[n_eval:], shuffled[:n_eval]
+
+
+ def _write_jsonl(path: Path, rows: list[dict[str, Any]]) -> None:
+     path.parent.mkdir(parents=True, exist_ok=True)
+     with path.open("w", encoding="utf-8") as fh:
+         for row in rows:
+             fh.write(json.dumps(row, ensure_ascii=False) + "\n")
+
+
+ def build_language_datasets(
+     *,
+     french_sources: tuple[str, ...],
+     arabic_sources: tuple[str, ...],
+     max_per_source: int,
+     seeds_path: Path,
+     skip_hub: bool,
+     seed: int,
+ ) -> None:
+     rng = random.Random(seed)
+     fr_rows, ar_rows = _load_seeds(seeds_path)
+     print(f"Loaded {len(fr_rows)} FR + {len(ar_rows)} AR seed rows from {seeds_path.name}")
+
+     if not skip_hub:
+         for source in french_sources:
+             cap = min(max_per_source, SOURCE_CAPS.get(source, {}).get("fr", max_per_source))
+             print(f"Fetching FR from {source} (cap={cap})...")
+             fr_rows.extend(_collect_from_source(source, "fr", cap, rng))
+         for source in arabic_sources:
+             cap = min(max_per_source, SOURCE_CAPS.get(source, {}).get("ar", max_per_source))
+             print(f"Fetching AR from {source} (cap={cap})...")
+             ar_rows.extend(_collect_from_source(source, "ar", cap, rng))
+
+     fr_train, fr_eval = _split_eval(fr_rows, rng)
+     ar_train, ar_eval = _split_eval(ar_rows, rng)
+
+     out_fr = _DATA_DIR / "language-lesson-fr.jsonl"
+     out_ar = _DATA_DIR / "language-lesson-ar.jsonl"
+     eval_fr = _DATA_DIR / "language-lesson-eval-fr.jsonl"
+     eval_ar = _DATA_DIR / "language-lesson-eval-ar.jsonl"
+
+     _write_jsonl(out_fr, fr_train)
+     _write_jsonl(out_ar, ar_train)
+     _write_jsonl(eval_fr, fr_eval)
+     _write_jsonl(eval_ar, ar_eval)
+
+     print(
+         f"Wrote FR train={len(fr_train)} eval={len(fr_eval)} -> {out_fr.name}, {eval_fr.name}\n"
+         f"Wrote AR train={len(ar_train)} eval={len(ar_eval)} -> {out_ar.name}, {eval_ar.name}"
+     )
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser(description=__doc__)
+     parser.add_argument(
+         "--french-sources",
+         default=",".join(DEFAULT_FR_SOURCES),
+         help="Comma-separated Hugging Face dataset ids for French",
+     )
+     parser.add_argument(
+         "--arabic-sources",
+         default=",".join(DEFAULT_AR_SOURCES),
+         help="Comma-separated Hugging Face dataset ids for Arabic",
+     )
+     parser.add_argument("--max-per-source", type=int, default=5000)
+     parser.add_argument(
+         "--custom-seeds",
+         type=Path,
+         default=_DATA_DIR / "language-lesson-seeds.yaml",
+     )
+     parser.add_argument(
+         "--skip-hub",
+         action="store_true",
+         help="Only write seed rows (offline / smoke)",
+     )
+     parser.add_argument("--seed", type=int, default=42)
+     args = parser.parse_args()
+
+     french_sources = tuple(s.strip() for s in args.french_sources.split(",") if s.strip())
+     arabic_sources = tuple(s.strip() for s in args.arabic_sources.split(",") if s.strip())
+
+     build_language_datasets(
+         french_sources=french_sources,
+         arabic_sources=arabic_sources,
+         max_per_source=args.max_per_source,
+         seeds_path=args.custom_seeds,
+         skip_hub=args.skip_hub,
+         seed=args.seed,
+     )
+
+
+ if __name__ == "__main__":
+     main()
research/data/language-lesson-ar.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
research/data/language-lesson-eval-ar.jsonl ADDED
@@ -0,0 +1,10 @@
1
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: أنشئ قائمة بالصفات التي تصف الشخص الشجاع."}, {"role": "user", "content": "أنشئ قائمة بالصفات التي تصف الشخص الشجاع.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "شجاع، بطل، جريء، قوي، شجاع، حازم، جريء، شجاع، جريء، عنيد، شجاع، قوي القلب، شجاع، شجاع، لا يخشى، شجاع، لا يرتعب، مفعم بالحيوية، مجازف."}]}
2
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: حل مشكلة خدمة العملاء مع عميل غير راضٍ. العميل غير راضٍ بسبب تأخر وصول طلبه."}, {"role": "user", "content": "حل مشكلة خدمة العملاء مع عميل غير راضٍ. العميل غير راضٍ بسبب تأخر وصول طلبه.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "أعتذر عن تأخر وصول طلبكم عن الموعد المتوقع. فريقنا يولي اهتمامًا كبيرًا لرضا العملاء، لذلك أود أن أحقق في هذا الأمر بشكل أكبر. هل يمكنكم تزويدي ببعض التفاصيل الإضافية، مثل رقم الطلب وتاريخ التسليم المتوقع؟ سأعمل على حل هذه المشكلة بأسرع وقت ممكن."}]}
3
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nFocus topic: اذكر ثلاثة اعتراضات يمكن أن يكون لدى العميل بشأن شراء منتج"}, {"role": "user", "content": "اذكر ثلاثة اعتراضات يمكن أن يكون لدى العميل بشأن شراء منتج\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "بعض الاعتراضات المحتملة التي يمكن أن يكون لدى العميل بشأن شراء منتج تشمل التكلفة وميزات المنتج وأوقات التسليم أو الشحن. يمكن أن تكون التكلفة عاملاً إذا شعر العميل بأن المنتج لا يستحق السعر. قد تكون ميزات المنتج مصدر قلق إذا لم يفي المنتج بحاجات أو توقعات العميل. وأخيرًا، قد تكون أوقات التسليم أو الشحن مشكلة إذا كان العميل بحاجة إلى المنتج بسرعة وكان وقت الانتظار طويلاً."}]}
4
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nFocus topic: ما هو إعراب مهما؟"}, {"role": "user", "content": "ما هو إعراب مهما؟\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "​مَهْمَا: اسم شرط مبني على السكون، يعرب مبتدأ وخبره جملة الشرط والجواب، وهو يجزم فعلين مضارعين.ومن أمثلته قول الله تعالى: \"وَقالُوا مَهْمَا تَأْتِنَا بِهِ مِنْ آيَةٍ لِتَسْحَرَنا بِهَا فَمَا نَحْنُ لَكَ بِمُؤْمِنِينَ\".وقول زهير بن أبي سلمى:وَمَهْمَا تَكُنْ عِنْدَ امْرِئٍ مِنْ خَلِيقَةٍ * وَإِنْ خالَهَا تَخْفَى عَلَى النَّاسِ تُعْلَمِ"}]}
5
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: ما إعراب الجملة التي بين القوسين؟ إن (ما قلته الصواب)."}, {"role": "user", "content": "ما إعراب الجملة التي بين القوسين؟ إن (ما قلته الصواب).\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "​إنَّ ما قُلْتَهُ الصَّوابُ. ما: اسم موصول مبني على السكون في محل نصب اسم \"إنّ\".قُلْتَهُ: فعل ماض مبني على السكون لاتصاله بضمير رفع متحرك، والتاء ضمير متصل في محل رفع فاعل، والهاء ضمير متصل في محل نصب مفعول به. الصَّوابُ: خبر \"إنّ\" مرفوع بالضمة.ويمكن أن تعرب الجملة الموصولية \"ما قلته\" كلها اسم \"إنّ\"، و\"الصواب\" خبرها."}]}
6
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: أنشئ قائمة من ثلاثة إجراءات يمكن اتخاذها لتقليل كمية النفايات المنتجة."}, {"role": "user", "content": "أنشئ قائمة من ثلاثة إجراءات يمكن اتخاذها لتقليل كمية النفايات المنتجة.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "1. استخدم المنتجات القابلة لإعادة الاستخدام بدلاً من العناصر القابلة للتصرف.\r\n2. ممارسة التسميد.\r\n3. إعادة استخدام وإعادة تدوير المنتجات في كل مرة تكون ذلك ممكنًا."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: طوّر تشبيهًا إبداعيًا يصف التطور."}, {"role": "user", "content": "طوّر تشبيهًا إبداعيًا يصف التطور.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "يمكن تشبيه التطور بلعبة عملاقة من الحجر والورق والمقص. الأنواع تحاول باستمرار تفوق بعضها البعض مع التكيف مع بيئتها المتغيرة باستمرار. يستخدم بعض المخلوقات دروعها القوية للحماية، تمامًا مثل الحجر الذي يتغلب على خصومه. يستخدم آخرون التمويه للبقاء مختبئين والتطور بصمت، مشابهًا للورقة التي تفوز بالمفاجأة. وما زالت بعض الأنواع تتفوق في الخفة والحركة، وتنجح في التغلب على خصومها بنفس البراعة التي يستخدمها زوج من المقصات."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nFocus topic: ما إعراب جملة\" ما هذا الشيء\""}, {"role": "user", "content": "ما إعراب جملة\" ما هذا الشيء\"\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "​ما: اسم استفهام في محل رفع مبتدأ.هذا: اسم إشارة في محل رفع خبر.الشيءُ: اسم مرفوع بدل من هذا."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "What causes seasons on Earth?\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "الفصول تحدث لأن محور الأرض مائل أثناء دورانه حول الشمس. عندما تميل نصف الكرة الشمالي نحو الشمس يكون الصيف، وعندما يبتعد يكون الشتاء. هذا الميل لا يتغيّر طوال السنة، لكن موقعنا بالنسبة للشمس يتغيّر."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: اصنع قائمة من 5 مزايا للعيش في الريف"}, {"role": "user", "content": "اصنع قائمة من 5 مزايا للعيش في الريف\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "1. هواء نظيف: العيش في الريف يعني أنك لن تتعرض لتلوث الهواء الناجم عن المركبات وغيرها من المصادر.\r\n2. مناظر خضراء: استمتع بالهواء النقي والمساحات الواسعة والمناظر الجميلة.\r\n3. الإسكان بأسعار معقولة: غالبًا ما يكون الإسكان في الريف أكثر تكلفة من العيش في المدينة.\r\n4. أقل ضغط نفسي: استمتع بوتيرة الحياة الأكثر استرخاءً في الريف.\r\n5. مجتمع أقرب: غالبًا ما تكون المجتمعات الريفية متماسكة، مما يسمح بإقامة علاقات أكثر مع الجيران ذوي المعاني."}]}
research/data/language-lesson-eval-fr.jsonl ADDED
@@ -0,0 +1,10 @@
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Analyser le texte donné pour son ton.\nLe monde a été fortement touché par la pandémie de COVID-19 et il a radicalement changé nos vies.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Le ton du texte est grave et sombre. L'utilisation de termes tels que \"grand impacté\", \"modifié\" et \"pandémie\" suggère la gravité et la gravité de la situation, et véhicule une lourdeur et un souci."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Corriger toute faute d'orthographe ou de grammaire dans la phrase donnée.\nElle a fait du croquis des excersies autour du lac.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Elle a fait du saut d'exercice autour du lac."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Reformuler la phrase suivante au passé.\nElle écrit un roman inspiré de son grand-père.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Elle a écrit un roman inspiré par son grand-père."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Nommez trois fruits riches en vitamine C.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Trois fruits riches en vitamine C sont les oranges, les kiwis et les fraises."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Trouver la racine réelle de l'équation \n sqrt(x) est égal à 10.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "En soustrayant sqrt(x) des deux côtés et en racontant, nous obtenons \n x 2 x x x x x x x x x x x x x x x x.\nPar conséquent, 20 ptr(x) x, donc écailles, 98/20, 49/10. Par conséquent, x (49/10)2 x 2401/100, ou x 24,01."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Quelle est la première règle de la narration ?\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "La première règle de la narration est « Montre, ne le dites pas ». Cela signifie qu'une bonne histoire est une histoire où le public est immergé dans l'expérience à travers des détails, des actions et un dialogue vifs, plutôt que d'être simplement raconté sur les événements. En montrant ce qui se passe, le public peut mieux comprendre les personnages et suivre l'intrigue, ce qui rend l'histoire plus attrayante et mémorable."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Écrivez quelques phrases sur une figure historique donnée.\nAmelia Earhart\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Amelia Earhart a été une aviatrice pionnière et la première femme à voler en solo à travers l'océan Atlantique. Elle a battu de nombreux records et inspiré une génération de pilotes féminins. Sa disparition en 1937 lors d'une tentative de voler autour du monde reste l'un des plus grands mystères non résolus de l'histoire."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Générer une question qui pourrait suivre la réponse ci-dessous.\nJ'ai vu un film la semaine dernière.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Quel film as-tu vu la semaine dernière ?"}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Regardez le point d'ébullition de l'eau.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Le point d'ébullition de l'eau est de 100 degrés Celsius ou de 212 degrés Fahrenheit à la pression atmosphérique standard."}]}
+ {"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "La variable x est définie comme étant « 4x x 2 ans x 10 ». Trouver la valeur de x.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Étant donné l'équation 4x x 2y x 10, il y a une infinité de solutions pour la valeur de x, en fonction de la valeur de y. En d'autres termes, la valeur de x dépend de la valeur de y.\n\nCependant, il est encore possible d'exprimer x en termes de y. Pour ce faire, nous pouvons réarranger l'équation pour résoudre x:\n\n4x 2 ans et 10\n4x 10 - 2 ans\nx x (10 - 2 ans)/4\n\nDonc, selon l'équation, la valeur de x est (10 - 2y)/4. Sans connaître la valeur spécifique de y, il n'est pas possible de déterminer la valeur exacte de x."}]}
research/data/language-lesson-fr.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
research/data/language-lesson-seeds.yaml ADDED
@@ -0,0 +1,75 @@
+ # Hand-crafted TeacherVoice examples — exact inference prompt shape.
+ # Loaded by build_language_lesson_chat.py (always included; not held out for eval).
+
+ fr:
+   - mode: explain
+     topic: Photosynthèse
+     user: Qu'est-ce que la photosynthèse ?
+     assistant: >
+       La photosynthèse est le processus par lequel les plantes fabriquent leur
+       nourriture grâce à la lumière du soleil. Elles utilisent l'eau, le dioxyde
+       de carbone et l'énergie lumineuse pour produire du sucre et libérer de
+       l'oxygène. C'est ainsi que les forêts nous fournissent l'air que nous respirons.
+   - mode: lesson
+     topic: Les fractions
+     user: Comment introduire les fractions en classe de CM1 ?
+     assistant: >
+       Commence par montrer une pizza coupée en parts égales pour illustrer une moitié
+       et un quart. Demande aux élèves de nommer des fractions autour d'eux, puis
+       propose un exercice où ils comparent deux fractions simples avec un dessin.
+   - mode: explain
+     topic: null
+     user: Explain gravity in simple terms.
+     assistant: >
+       La gravité est la force qui attire les objets les uns vers les autres.
+       Sur Terre, elle nous maintient au sol et fait tomber les pommes des arbres.
+       Plus un objet est massif, plus son attraction gravitationnelle est forte.
+   - mode: lesson
+     topic: Le cycle de l'eau
+     user: Planifie une leçon courte sur le cycle de l'eau pour des élèves de 10 ans.
+     assistant: >
+       Objectif : comprendre évaporation, condensation et précipitation. Montre un
+       schéma simple, puis fais l'expérience de l'eau qui condense sur un verre froid.
+       Termine par un dessin étiqueté que chaque élève ramène à la maison.
+   - mode: explain
+     topic: L'électricité
+     user: Comment fonctionne un circuit électrique simple ?
+     assistant: >
+       Un circuit simple relie une pile, des fils et une ampoule en boucle fermée.
+       Le courant circule de la pile vers l'ampoule, qui transforme l'énergie en lumière.
+       Si tu coupes un fil, la boucle s'ouvre et l'ampoule s'éteint.
+
+ ar:
+   - mode: explain
+     topic: البناء الضوئي
+     user: ما هي عملية البناء الضوئي؟
+     assistant: >
+       البناء الضوئي هو الطريقة التي تصنع بها النباتات غذاءها باستخدام ضوء الشمس.
+       تمتص الأوراق الماء وثاني أكسيد الكربون وتُنتج سكراً وطلق الأكسجين في الهواء.
+       بدون هذه العملية لما كانت الحياة على الأرض ممكنة كما نعرفها اليوم.
+   - mode: lesson
+     topic: الكسور
+     user: كيف أقدّم درساً عن الكسور للصف الخامس؟
+     assistant: >
+       ابدأ برسم دائرة مقسّمة إلى أجزاء متساوية وسمِّ نصفاً وربعاً. اطلب من التلاميذ
+       إيجاد أمثلة من حياتهم اليومية، ثم قارنوا بين كسرين بسيطين باستخدام رسومات.
+   - mode: explain
+     topic: null
+     user: What causes seasons on Earth?
+     assistant: >
+       الفصول تحدث لأن محور الأرض مائل أثناء دورانه حول الشمس. عندما تميل نصف الكرة
+       الشمالي نحو الشمس يكون الصيف، وعندما يبتعد يكون الشتاء. هذا الميل لا يتغيّر
+       طوال السنة، لكن موقعنا بالنسبة للشمس يتغيّر.
+   - mode: lesson
+     topic: دورة الماء
+     user: خطّط درساً قصيراً عن دورة الماء لطلاب في سن العاشرة.
+     assistant: >
+       الهدف أن يفهم التلاميذ التبخّر والتكاثف والهطول. اعرض رسمًا مبسّطاً، ثم
+       جرّب تجربة تكاثف الماء على كوب بارد. اختم برسم مُوسوم يحمله كل تلميذ.
+   - mode: explain
+     topic: الكهرباء
+     user: كيف يعمل دائرة كهربائية بسيطة؟
+     assistant: >
+       الدائرة البسيطة تربط بطارية وأسلاكاً ومصباحاً في حلقة مغلقة. ينتقل التيار
+       من البطارية إلى المصباح فيتحوّل إلى ضوء. إذا قُطع أحد الأسلاك تنكسر الحلقة
+       وينطفئ المصباح فوراً.
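Review note: the seed file says its entries follow the "exact inference prompt shape", and that shape is visible in the eval JSONL rows in this commit (system persona, a `Target language:` paragraph, an optional `Lesson topic:` / `Focus topic:` line, and a user turn ending in the voice suffix). The sketch below shows how one seed entry could be expanded into such a row. It is only an illustration: `seed_to_messages`, `TOPIC_HEADINGS`, and the shortened persona string are hypothetical names and values, and the real logic lives in `build_language_lesson_chat.py`, which is not part of this hunk.

```python
# Hypothetical sketch: names below are illustrative, not the repo's API.
VOICE_SUFFIX = (
    "Reply now in 2-4 complete spoken sentences only. "
    "No planning, outlines, sentence labels, or meta commentary."
)
LANGUAGE_LABELS = {"fr": "French", "ar": "Arabic"}
# Headings as they appear in the eval rows: lesson rows use "Lesson topic",
# explain rows use "Focus topic".
TOPIC_HEADINGS = {"lesson": "Lesson topic", "explain": "Focus topic"}


def seed_to_messages(seed: dict, *, language: str, persona: str) -> dict:
    """Turn one seed entry into a chat row shaped like the eval JSONL rows."""
    label = LANGUAGE_LABELS[language]
    system = (
        f"{persona}\n\n"
        f"Target language: {label} ({language}). Reply ONLY in {label}. "
        "If the student writes or speaks in another language, "
        "match their language instead."
    )
    if seed.get("topic"):  # seeds with `topic: null` get no topic line
        system += f"\n\n{TOPIC_HEADINGS[seed['mode']]}: {seed['topic']}"
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": f"{seed['user']}\n\n{VOICE_SUFFIX}"},
            # YAML folded scalars (`>`) end with a newline, so strip it.
            {"role": "assistant", "content": seed["assistant"].strip()},
        ]
    }


seed = {
    "mode": "explain",
    "topic": "Photosynthèse",
    "user": "Qu'est-ce que la photosynthèse ?",
    "assistant": (
        "La photosynthèse est le processus par lequel les plantes fabriquent "
        "leur nourriture grâce à la lumière du soleil.\n"
    ),
}
row = seed_to_messages(
    seed,
    language="fr",
    persona="You are TeacherVoice, a friendly tutor.",  # shortened placeholder
)
for message in row["messages"]:
    print(message["role"], "->", message["content"][:60].replace("\n", " "))
```

A row built this way should satisfy the format checks in `language_lesson_smoke.py` for system text and voice suffix, provided the full persona string is used.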
research/evals/language_lesson_smoke.py ADDED
@@ -0,0 +1,92 @@
+ #!/usr/bin/env python3
+ """Smoke-check language-lesson eval JSONL for TeacherVoice format."""
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ import re
+ import sys
+ from pathlib import Path
+
+ _REPO = Path(__file__).resolve().parents[2]
+ _DATA = _REPO / "research" / "data"
+
+ _JSON_LEAK = re.compile(r"^\s*[\{\[]|```")
+ _ARABIC = re.compile(r"[\u0600-\u06FF]")
+ _FRENCH_MARKERS = re.compile(
+     r"[\u00C0-\u024F]|"
+     r"\b(le|la|les|un|une|des|est|sont|pour|dans|avec|que|qui|comment|pourquoi|c'est|ce)\b",
+     re.IGNORECASE,
+ )
+ _VOICE_SUFFIX = "Reply now in 2-4 complete spoken sentences only"
+
+
+ def _load_rows(path: Path) -> list[dict]:
+     rows: list[dict] = []
+     with path.open(encoding="utf-8") as fh:
+         for line in fh:
+             line = line.strip()
+             if line:
+                 rows.append(json.loads(line))
+     return rows
+
+
+ def _score_row(row: dict, *, language: str) -> list[str]:
+     issues: list[str] = []
+     messages = row.get("messages") or []
+     if len(messages) < 3:
+         issues.append("missing messages")
+         return issues
+     system = messages[0].get("content", "")
+     user = messages[-2].get("content", "")
+     assistant = messages[-1].get("content", "")
+
+     if "TeacherVoice" not in system:
+         issues.append("system missing TeacherVoice")
+     label = "French" if language == "fr" else "Arabic"
+     if f"Target language: {label}" not in system:
+         issues.append(f"system missing target language {label}")
+     if _VOICE_SUFFIX not in user:
+         issues.append("user missing voice suffix")
+     if not (40 <= len(assistant) <= 600):
+         issues.append(f"assistant length {len(assistant)} out of range")
+     if _JSON_LEAK.search(assistant):
+         issues.append("assistant looks like JSON/code")
+     if language == "ar" and not _ARABIC.search(assistant):
+         issues.append("assistant missing Arabic script")
+     if language == "fr" and not _FRENCH_MARKERS.search(assistant):
+         issues.append("assistant missing French markers")
+     return issues
+
+
+ def run_eval(*, language: str) -> int:
+     path = _DATA / f"language-lesson-eval-{language}.jsonl"
+     if not path.is_file():
+         print(f"skip {path.name} (not found)")
+         return 0
+     rows = _load_rows(path)
+     if not rows:
+         print(f"skip {path.name} (empty)")
+         return 0
+     bad = 0
+     for index, row in enumerate(rows):
+         issues = _score_row(row, language=language)
+         if issues:
+             bad += 1
+             print(f"  row {index}: {', '.join(issues)}")
+     ok = len(rows) - bad
+     print(f"{language.upper()} eval: {ok}/{len(rows)} passed")
+     return 0 if bad == 0 else 1
+
+
+ def main() -> int:
+     parser = argparse.ArgumentParser(description=__doc__)
+     parser.add_argument("--language", choices=("fr", "ar", "both"), default="both")
+     args = parser.parse_args()
+     codes = ("fr", "ar") if args.language == "both" else (args.language,)
+     return max(run_eval(language=code) for code in codes)
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
research/modal/_common.py CHANGED
@@ -145,9 +145,49 @@ _FINETUNE_FLAGS: dict[str, str] = {
      "lora_dropout": "--lora_dropout",
      "lora_targets": "--lora_targets",
      "val_split": "--val_split",
+     "device": "--device",
  }


+ def split_csv(value: str | None) -> list[str] | None:
+     if not value:
+         return None
+     items = [item.strip() for item in value.split(",") if item.strip()]
+     return items or None
+
+
+ def parse_json_object(value: str | None, *, flag: str) -> dict[str, Any]:
+     if not value:
+         return {}
+     try:
+         parsed = json.loads(value)
+     except json.JSONDecodeError as exc:
+         raise SystemExit(f"{flag} must be a JSON object: {exc}") from exc
+     if not isinstance(parsed, dict):
+         raise SystemExit(f"{flag} must be a JSON object")
+     return parsed
+
+
+ def job_plan_rows(jobs: list[dict[str, Any]]) -> list[dict[str, Any]]:
+     """Compact, printable description of selected jobs and their eval profile."""
+     rows = []
+     for job in jobs:
+         rows.append(
+             {
+                 "name": job.get("name"),
+                 "category": job.get("category"),
+                 "usecase": job.get("usecase") or job.get("use_case"),
+                 "profile": job.get("eval_profile", "compare_study"),
+                 "dataset": "mix" if job.get("mix") else job.get("dataset"),
+                 "mode": job.get("mode", "lora"),
+                 "max_steps": job.get("max_steps"),
+                 "max_samples": job.get("max_samples"),
+                 "publish": bool(job.get("publish")),
+             }
+         )
+     return rows
+
+
  def build_finetune_cmd(job: dict[str, Any], out_dir: str) -> list[str]:
      cmd = [
          "uv",
@@ -206,6 +246,13 @@ def build_lm_eval_cmd(
      model_path: str | None = None,
      adapter_path: str | None = None,
      compare_to: str | None = None,
+     tasks: list[str] | None = None,
+     limit: int | None = None,
+     num_fewshot: int | None = None,
+     batch_size: str | None = None,
+     device: str | None = None,
+     dtype: str | None = None,
+     seed: int | None = None,
  ) -> list[str]:
      cmd = [
          "uv",
@@ -228,14 +275,58 @@ def build_lm_eval_cmd(
          cmd.extend(["--adapter", adapter_path])
      if compare_to:
          cmd.extend(["--compare-to", compare_to])
+     if tasks:
+         cmd.append("--tasks")
+         cmd.extend(tasks)
+     if limit is not None:
+         cmd.extend(["--limit", str(int(limit))])
+     if num_fewshot is not None:
+         cmd.extend(["--num-fewshot", str(int(num_fewshot))])
+     if batch_size:
+         cmd.extend(["--batch-size", str(batch_size)])
+     if device:
+         cmd.extend(["--device", str(device)])
+     if dtype:
+         cmd.extend(["--dtype", str(dtype)])
+     if seed is not None:
+         cmd.extend(["--seed", str(int(seed))])
      return cmd


+ def _matches_job_filters(
+     job: dict[str, Any],
+     *,
+     sector: str | None = None,
+     usecase: str | None = None,
+     profiles: list[str] | None = None,
+ ) -> bool:
+     if sector and job.get("sector", job.get("category")) != sector:
+         return False
+     if usecase:
+         values = {
+             job.get("usecase"),
+             job.get("use_case"),
+             job.get("category"),
+             job.get("name"),
+         }
+         values.update(job.get("tags") or [])
+         if usecase not in values:
+             return False
+     if profiles and job.get("eval_profile", "compare_study") not in profiles:
+         return False
+     return True
+
+
  def prepare_jobs(
      *,
      job: str | None = None,
      category: str | None = None,
+     sector: str | None = None,
+     usecase: str | None = None,
+     profiles: list[str] | None = None,
      max_steps: int | None = None,
+     max_samples: int | None = None,
+     finetune_overrides: dict[str, Any] | None = None,
  ) -> tuple[dict[str, Any], list[dict[str, Any]]]:
      spec = load_experiments()
      defaults = spec.get("defaults", {})
@@ -251,12 +342,41 @@ def prepare_jobs(
          jobs = [j for j in jobs if j.get("category") == category]
          if not jobs:
              raise SystemExit(f"No jobs with category {category!r}")
+     if sector or usecase or profiles:
+         jobs = [
+             j
+             for j in jobs
+             if _matches_job_filters(
+                 j,
+                 sector=sector,
+                 usecase=usecase,
+                 profiles=profiles,
+             )
+         ]
+         if not jobs:
+             filters = {
+                 "sector": sector,
+                 "usecase": usecase,
+                 "profiles": profiles,
+             }
+             raise SystemExit(f"No jobs matched filters: {filters}")

      prepared: list[dict[str, Any]] = []
      for raw in jobs:
          merged = apply_defaults(raw, defaults)
          if max_steps is not None:
              merged["max_steps"] = max_steps
+         if max_samples is not None:
+             merged["max_samples"] = max_samples
+         if finetune_overrides:
+             args = {**(merged.get("args") or {})}
+             for key, value in finetune_overrides.items():
+                 if key in _FINETUNE_FLAGS:
+                     args[key] = value
+                 else:
+                     merged[key] = value
+             if args:
+                 merged["args"] = args
          prepared.append(merged)
      return defaults, prepared

@@ -291,7 +411,15 @@ def primary_metric(task_metrics: dict[str, Any]) -> tuple[str, float] | None:
      return None


- def baseline_is_cached(experiment_name: str, config_path: str) -> bool:
+ def baseline_is_cached(
+     experiment_name: str,
+     config_path: str,
+     *,
+     tasks: list[str] | None = None,
+     limit: int | None = None,
+     num_fewshot: int | None = None,
+     seed: int | None = None,
+ ) -> bool:
      """True if a baseline results.json exists AND its run_meta still matches the
      profile config's tasks/limit/num_fewshot. Config changes (e.g. new guard
      tasks or a higher limit) therefore correctly force a fresh baseline."""
@@ -309,11 +437,20 @@ def baseline_is_cached(experiment_name: str, config_path: str) -> bool:
          cfg = yaml.safe_load(cfg_file.read_text()) or {}
      except Exception:
          return False
-     return (
-         sorted(meta.get("tasks") or []) == sorted(cfg.get("tasks") or [])
-         and meta.get("limit") == cfg.get("limit")
-         and meta.get("num_fewshot") == cfg.get("num_fewshot", 0)
+     expected_tasks = tasks or cfg.get("tasks") or []
+     expected_limit = limit if limit is not None else cfg.get("limit")
+     expected_fewshot = (
+         num_fewshot if num_fewshot is not None else cfg.get("num_fewshot", 0)
+     )
+     expected_seed = seed if seed is not None else cfg.get("seed")
+     same = (
+         sorted(meta.get("tasks") or []) == sorted(expected_tasks)
+         and meta.get("limit") == expected_limit
+         and meta.get("num_fewshot") == expected_fewshot
      )
+     if expected_seed is not None:
+         same = same and meta.get("seed") == expected_seed
+     return same


  def evaluate_gate(
research/modal/experiments.yaml CHANGED
@@ -190,3 +190,44 @@ finetune:
190
  max_samples: 200
191
  description: General instruction tuning baseline (Hub, local-only)
192
  eval_profile: instructions
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
190
  max_samples: 200
191
  description: General instruction tuning baseline (Hub, local-only)
192
  eval_profile: instructions
193
+
194
+ # --- language lessons: FR/AR TeacherVoice coach (Cohere-free stack) ---
195
+ - name: language-lesson-lora
196
+ category: language
197
+ max_steps: 200
198
+ mix:
199
+ - dataset: research/data/language-lesson-fr.jsonl
200
+ format: chat
201
+ weight: 12
202
+ - dataset: research/data/language-lesson-ar.jsonl
203
+ format: chat
204
+ weight: 12
205
+ - dataset: research/data/science-tutor-chat.jsonl
206
+ format: chat
207
+ weight: 4
208
+ - dataset: tatsu-lab/alpaca
209
+ format: alpaca
210
+ dataset_split: "train[:400]"
211
+ max_samples: 400
212
+ weight: 1
213
+ args:
214
+ lora_r: 32
215
+ lora_alpha: 64
216
+ neftune_noise_alpha: 5
217
+ early_stopping_patience: 2
218
+ val_split: 0.05
219
+ description: >
220
+ FR/AR TeacherVoice LoRA from language-lesson-fr/ar.jsonl (Hub-built via
221
+ build_language_lesson_chat.py) + English replay
222
+ eval_profile: understanding
223
+ goals:
224
+ task: boolq
225
+ min_improve: 0.0
226
+ guard_tasks:
227
+ - task: hellaswag
228
+ max_regress: 0.03
229
+ publish:
230
+ hub_repo: MSGEncrypted/minicpm5-1b-language-lesson-lora
231
+ mirror_repos:
232
+ - build-small-hackathon/minicpm5-1b-language-lesson-lora
233
+ private: false
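How the `goals` entries gate a run is decided by `evaluate_gate` in `_common.py`, whose body this diff does not show. As a rough illustration only, the sketch below assumes the most literal reading of the field names: the target task must improve by at least `min_improve`, and each guard task may drop by at most `max_regress`. `gate_passes` and the score dictionaries are hypothetical, and the scores are made up.

```python
from __future__ import annotations


def gate_passes(
    baseline: dict[str, float],
    tuned: dict[str, float],
    *,
    task: str,
    min_improve: float,
    guard_tasks: list[dict[str, float | str]],
) -> bool:
    """Hypothetical gate: target must improve, guards must not regress too far."""
    # Assumed meaning of min_improve: required gain on the target task.
    if tuned[task] - baseline[task] < min_improve:
        return False
    # Assumed meaning of max_regress: largest tolerated drop on a guard task.
    for guard in guard_tasks:
        name = str(guard["task"])
        drop = baseline[name] - tuned[name]
        if drop > float(guard["max_regress"]):
            return False
    return True


# Made-up scores, purely for illustration.
baseline = {"boolq": 0.62, "hellaswag": 0.50}
tuned = {"boolq": 0.64, "hellaswag": 0.48}

ok = gate_passes(
    baseline,
    tuned,
    task="boolq",
    min_improve=0.0,
    guard_tasks=[{"task": "hellaswag", "max_regress": 0.03}],
)
print(ok)
```

Under this reading, `min_improve: 0.0` means the adapter only has to avoid getting worse on `boolq`, while `hellaswag` acts as a check against general-ability loss from the FR/AR mix.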
research/modal/finetune_app.py CHANGED
@@ -35,7 +35,7 @@ for _candidate in (Path(__file__).resolve().parent, Path("/repo/research/modal")
35
  if _candidate.is_dir() and str(_candidate) not in sys.path:
36
  sys.path.insert(0, str(_candidate))
37
 
38
- from _common import (
39
  BASE_MODEL_ID,
40
  FINETUNE_VOL_PATH,
41
  HF_CACHE_PATH,
@@ -50,8 +50,10 @@ from _common import (
50
  hf_secret,
51
  image,
52
  job_gpu,
53
- load_experiments,
 
54
  prepare_jobs,
 
55
  publish_adapter_files,
56
  pull_artifacts,
57
  reload_volumes,
@@ -107,6 +109,13 @@ def run_lm_eval(
107
  model_path: str | None = None,
108
  adapter_path: str | None = None,
109
  compare_to: str | None = None,
 
 
 
 
 
 
 
110
  ) -> dict[str, Any]:
111
  """Run slm-lm-eval on base model or finetuned checkpoint."""
112
  reload_volumes()
@@ -128,6 +137,13 @@ def run_lm_eval(
128
  model_path=model_path,
129
  adapter_path=adapter_path,
130
  compare_to=compare_to,
 
 
 
 
 
 
 
131
  )
132
  print("Running:", " ".join(cmd))
133
  proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
@@ -146,6 +162,13 @@ def run_lm_eval(
146
  "model_path": model_path,
147
  "adapter_path": adapter_path,
148
  "compare_to": compare_to,
 
 
 
 
 
 
 
149
  "results_json": str(results_json),
150
  "summary_md": str(summary_md),
151
  "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
@@ -213,9 +236,23 @@ def main(
213
  parallel: bool = False,
214
  job: str | None = None,
215
  category: str | None = None,
 
 
 
216
  max_steps: int | None = None,
 
 
217
  publish: bool = True,
218
  pull: bool = True,
 
 
 
 
 
 
 
 
 
219
  ):
220
  """
221
  Skill-matrix pipeline: per-profile baselines -> train -> eval -> gate -> publish -> pull.
@@ -227,21 +264,43 @@ def main(
227
  modal run research/modal/finetune_app.py --eval-only --job math-lora
228
  modal run research/modal/finetune_app.py --no-publish --no-pull
229
  """
230
- defaults, prepared = prepare_jobs(job=job, category=category, max_steps=max_steps)
231
  if not prepared:
232
  raise SystemExit("No matching jobs; check --job/--category and experiments.yaml")
233
  preset = defaults.get("preset", "minicpm5-1b")
 
 
 
 
234
 
235
- profiles = sorted({j.get("eval_profile", "compare_study") for j in prepared})
236
 
237
  baselines_ok: dict[str, bool] = {}
238
- if not eval_only:
239
- print(f"--- baselines ({', '.join(profiles)}) ---")
240
- for profile in profiles:
241
  result = run_lm_eval.remote(
242
  experiment_name=f"{preset}__baseline__{profile}",
243
  config=config_for_profile(profile),
244
  preset=preset,
 
 
 
 
 
 
 
245
  )
246
  print(json.dumps(result, indent=2))
247
  baselines_ok[profile] = bool(result.get("ok"))
@@ -284,6 +343,13 @@ def main(
284
  model_path=BASE_MODEL_ID,
285
  adapter_path=adapter_path,
286
  compare_to=compare_to,
 
 
 
 
 
 
 
287
  )
288
  print(json.dumps(eval_result, indent=2))
289
 
@@ -291,6 +357,7 @@ def main(
291
  "name": job_name,
292
  "category": j.get("category"),
293
  "profile": profile,
 
294
  }
295
 
296
  gate_result: dict[str, Any] | None = None
 
35
  if _candidate.is_dir() and str(_candidate) not in sys.path:
36
  sys.path.insert(0, str(_candidate))
37
 
38
+ from _common import ( # noqa: E402
39
  BASE_MODEL_ID,
40
  FINETUNE_VOL_PATH,
41
  HF_CACHE_PATH,
 
50
  hf_secret,
51
  image,
52
  job_gpu,
53
+ job_plan_rows,
54
+ parse_json_object,
55
  prepare_jobs,
56
+ split_csv,
57
  publish_adapter_files,
58
  pull_artifacts,
59
  reload_volumes,
 
109
  model_path: str | None = None,
110
  adapter_path: str | None = None,
111
  compare_to: str | None = None,
112
+ tasks: list[str] | None = None,
113
+ limit: int | None = None,
114
+ num_fewshot: int | None = None,
115
+ batch_size: str | None = None,
116
+ device: str | None = None,
117
+ dtype: str | None = None,
118
+ seed: int | None = None,
119
  ) -> dict[str, Any]:
120
  """Run slm-lm-eval on base model or finetuned checkpoint."""
121
  reload_volumes()
 
137
  model_path=model_path,
138
  adapter_path=adapter_path,
139
  compare_to=compare_to,
140
+ tasks=tasks,
141
+ limit=limit,
142
+ num_fewshot=num_fewshot,
143
+ batch_size=batch_size,
144
+ device=device,
145
+ dtype=dtype,
146
+ seed=seed,
147
  )
148
  print("Running:", " ".join(cmd))
149
  proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
 
162
  "model_path": model_path,
163
  "adapter_path": adapter_path,
164
  "compare_to": compare_to,
165
+ "tasks": tasks,
166
+ "limit": limit,
167
+ "num_fewshot": num_fewshot,
168
+ "batch_size": batch_size,
169
+ "device": device,
170
+ "dtype": dtype,
171
+ "seed": seed,
172
  "results_json": str(results_json),
173
  "summary_md": str(summary_md),
174
  "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
 
236
  parallel: bool = False,
237
  job: str | None = None,
238
  category: str | None = None,
239
+ sector: str | None = None,
240
+ usecase: str | None = None,
241
+ profiles: str | None = None,
242
  max_steps: int | None = None,
243
+ max_samples: int | None = None,
244
+ finetune_args_json: str | None = None,
245
  publish: bool = True,
246
  pull: bool = True,
247
+ plan: bool = False,
248
+ skip_baseline: bool = False,
249
+ eval_tasks: str | None = None,
250
+ eval_limit: int | None = None,
251
+ eval_num_fewshot: int | None = None,
252
+ eval_batch_size: str | None = None,
253
+ eval_device: str | None = None,
254
+ eval_dtype: str | None = None,
255
+ eval_seed: int | None = None,
256
  ):
257
  """
258
  Skill-matrix pipeline: per-profile baselines -> train -> eval -> gate -> publish -> pull.
 
264
  modal run research/modal/finetune_app.py --eval-only --job math-lora
265
  modal run research/modal/finetune_app.py --no-publish --no-pull
266
  """
267
+ defaults, prepared = prepare_jobs(
268
+ job=job,
269
+ category=category,
270
+ sector=sector,
271
+ usecase=usecase,
272
+ profiles=split_csv(profiles),
273
+ max_steps=max_steps,
274
+ max_samples=max_samples,
275
+ finetune_overrides=parse_json_object(
276
+ finetune_args_json, flag="--finetune-args-json"
277
+ ),
278
+ )
279
  if not prepared:
280
  raise SystemExit("No matching jobs; check --job/--category and experiments.yaml")
281
  preset = defaults.get("preset", "minicpm5-1b")
282
+ plan_rows = job_plan_rows(prepared)
283
+ if plan:
284
+ print(json.dumps({"preset": preset, "jobs": plan_rows}, indent=2))
285
+ return
286
 
287
+ profile_names = sorted({j.get("eval_profile", "compare_study") for j in prepared})
288
 
289
  baselines_ok: dict[str, bool] = {}
290
+ if not eval_only and not skip_baseline:
291
+ print(f"--- baselines ({', '.join(profile_names)}) ---")
292
+ for profile in profile_names:
293
  result = run_lm_eval.remote(
294
  experiment_name=f"{preset}__baseline__{profile}",
295
  config=config_for_profile(profile),
296
  preset=preset,
297
+ tasks=split_csv(eval_tasks),
298
+ limit=eval_limit,
299
+ num_fewshot=eval_num_fewshot,
300
+ batch_size=eval_batch_size,
301
+ device=eval_device,
302
+ dtype=eval_dtype,
303
+ seed=eval_seed,
304
  )
305
  print(json.dumps(result, indent=2))
306
  baselines_ok[profile] = bool(result.get("ok"))
 
343
  model_path=BASE_MODEL_ID,
344
  adapter_path=adapter_path,
345
  compare_to=compare_to,
346
+ tasks=split_csv(eval_tasks),
347
+ limit=eval_limit,
348
+ num_fewshot=eval_num_fewshot,
349
+ batch_size=eval_batch_size,
350
+ device=eval_device,
351
+ dtype=eval_dtype,
352
+ seed=eval_seed,
353
  )
354
  print(json.dumps(eval_result, indent=2))
355
 
 
357
  "name": job_name,
358
  "category": j.get("category"),
359
  "profile": profile,
360
+ "plan": next((p for p in plan_rows if p["name"] == job_name), None),
361
  }
362
 
363
  gate_result: dict[str, Any] | None = None
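`main()` above hands the raw `--finetune-args-json` string to `parse_json_object` before passing the result to `prepare_jobs` as `finetune_overrides`. The helper's body lives in `_common.py` and is not part of this hunk, so the following is only a sketch of what such a parser could look like. The name `parse_json_object_sketch`, the empty-string handling, and the error messages are assumptions.

```python
from __future__ import annotations

import json
from typing import Any


def parse_json_object_sketch(raw: str | None, *, flag: str) -> dict[str, Any] | None:
    """Hypothetical stand-in: decode a CLI string into a dict, or None if unset."""
    # Assumption: an omitted or blank flag means "no overrides".
    if raw is None or not raw.strip():
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SystemExit(f"{flag} is not valid JSON: {exc}") from exc
    # Only a JSON object maps cleanly onto keyword-style finetune overrides.
    if not isinstance(value, dict):
        raise SystemExit(f"{flag} must be a JSON object, got {type(value).__name__}")
    return value


overrides = parse_json_object_sketch(
    '{"lr": 1e-4, "lora_r": 8}', flag="--finetune-args-json"
)
print(overrides)
```

Carrying the flag name into the error message keeps a malformed value traceable to the exact CLI option, which matters once several JSON-valued flags exist.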
research/modal/server_app.py CHANGED
@@ -43,7 +43,7 @@ for _candidate in (Path(__file__).resolve().parent, Path("/repo/research/modal")
43
  if _candidate.is_dir() and str(_candidate) not in sys.path:
44
  sys.path.insert(0, str(_candidate))
45
 
46
- from _common import (
47
  BASE_MODEL_ID,
48
  DEFAULT_GPU,
49
  DEFAULT_KEEPALIVE_HOURS,
@@ -52,7 +52,6 @@ from _common import (
52
  FINETUNE_VOL_PATH,
53
  HF_CACHE_PATH,
54
  LM_EVAL_OUTPUT,
55
- apply_defaults,
56
  baseline_is_cached,
57
  build_finetune_cmd,
58
  build_lm_eval_cmd,
@@ -63,8 +62,10 @@ from _common import (
63
  hf_cache_vol,
64
  hf_secret,
65
  image,
66
- load_experiments,
 
67
  prepare_jobs,
 
68
  publish_adapter_files,
69
  pull_artifacts,
70
  reload_volumes,
@@ -165,6 +166,13 @@ class GpuWorker:
165
  model_path: str | None = None,
166
  adapter_path: str | None = None,
167
  compare_to: str | None = None,
 
 
 
 
 
 
 
168
  ) -> dict[str, Any]:
169
  """Run slm-lm-eval on base model or finetuned checkpoint."""
170
  # Pick up adapters committed by another container (e.g. a separate
@@ -187,6 +195,13 @@ class GpuWorker:
187
  model_path=model_path,
188
  adapter_path=adapter_path,
189
  compare_to=compare_to,
 
 
 
 
 
 
 
190
  )
191
  print("Running:", " ".join(cmd))
192
  proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
@@ -205,6 +220,13 @@ class GpuWorker:
205
  "model_path": model_path,
206
  "adapter_path": adapter_path,
207
  "compare_to": compare_to,
 
 
 
 
 
 
 
208
  "results_json": str(results_json),
209
  "summary_md": str(summary_md),
210
  "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
@@ -252,43 +274,63 @@ class GpuWorker:
252
  *,
253
  job_names: list[str] | None = None,
254
  category: str | None = None,
 
 
 
255
  max_steps: int | None = None,
 
 
256
  train: bool = True,
257
  eval_only: bool = False,
 
 
 
 
 
 
 
 
258
  publish: bool = True,
 
259
  ) -> dict[str, Any]:
260
  """Per-profile baselines -> finetune -> eval -> gate -> publish (same container)."""
261
- spec = load_experiments()
262
- defaults = spec.get("defaults", {})
263
- jobs = spec.get("finetune", [])
264
-
 
 
 
 
 
 
265
  if job_names:
266
- jobs = [j for j in jobs if j.get("name") in job_names]
267
- if not jobs:
 
268
  raise ValueError(f"No matching jobs in experiments.yaml: {job_names}")
269
- if category:
270
- jobs = [j for j in jobs if j.get("category") == category]
271
- if not jobs:
272
- raise ValueError(f"No jobs with category {category!r}")
273
- if not jobs:
274
- raise ValueError("No jobs matched job_names/category")
275
 
276
  preset = defaults.get("preset", "minicpm5-1b")
277
- prepared: list[dict[str, Any]] = []
278
- for raw in jobs:
279
- merged = apply_defaults(raw, defaults)
280
- if max_steps is not None:
281
- merged["max_steps"] = max_steps
282
- prepared.append(merged)
283
-
284
- profiles = sorted({j.get("eval_profile", "compare_study") for j in prepared})
285
 
286
  baselines_ok: dict[str, bool] = {}
287
- if not eval_only:
288
- for profile in profiles:
289
  exp = f"{preset}__baseline__{profile}"
290
  cfg_path = config_for_profile(profile)
291
- if baseline_is_cached(exp, cfg_path):
 
 
 
 
 
 
 
292
  print(f"baseline {exp}: reusing cached results (config unchanged)")
293
  baselines_ok[profile] = True
294
  continue
@@ -296,6 +338,13 @@ class GpuWorker:
296
  experiment_name=exp,
297
  config=cfg_path,
298
  preset=preset,
 
 
 
 
 
 
 
299
  )
300
  baselines_ok[profile] = bool(result.get("ok"))
301
 
@@ -325,12 +374,20 @@ class GpuWorker:
325
  model_path=BASE_MODEL_ID,
326
  adapter_path=adapter_path,
327
  compare_to=compare_to,
 
 
 
 
 
 
 
328
  )
329
 
330
  row: dict[str, Any] = {
331
  "name": job_name,
332
  "category": j.get("category"),
333
  "profile": profile,
 
334
  "eval": eval_result,
335
  }
336
 
@@ -374,13 +431,27 @@ def main(
374
  cmd: str | None = None,
375
  job: str | None = None,
376
  category: str | None = None,
 
 
 
377
  max_steps: int | None = None,
 
 
378
  eval_only: bool = False,
379
  pipeline: bool = False,
380
  publish: bool = True,
381
  publish_only: bool = False,
382
  pull: bool = True,
383
  ping: bool = False,
 
 
 
 
 
 
 
 
 
384
  ):
385
  """
386
  GPU worker CLI.
@@ -395,11 +466,25 @@ def main(
395
  modal run research/modal/server_app.py
396
  modal run research/modal/server_app.py --pipeline --job math-lora --max-steps 20
397
  modal run research/modal/server_app.py --pipeline --category science --no-publish
 
 
398
  modal run research/modal/server_app.py --eval-only --job math-lora
399
  modal run research/modal/server_app.py --publish-only --job math-lora
400
  modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
401
  """
402
- has_task = bool(cmd or job or category or eval_only or pipeline or publish_only or ping)
403
  if has_task:
404
  serve = False
405
 
@@ -451,18 +536,37 @@ def main(
451
  print(json.dumps(result, indent=2))
452
  return
453
 
454
- if pipeline or job or category or eval_only:
455
  job_names = [job] if job else None
456
  result = worker.run_pipeline.remote(
457
  job_names=job_names,
458
  category=category,
 
 
 
459
  max_steps=max_steps,
 
 
 
 
460
  train=not eval_only,
461
  eval_only=eval_only,
 
 
 
 
 
 
 
 
462
  publish=publish,
 
463
  )
464
  print(json.dumps(result, indent=2))
465
 
 
 
 
466
  if pull:
467
  for row in result.get("jobs", []):
468
  pull_artifacts(row["name"], f"{row['name']}__{row['profile']}")
 
43
  if _candidate.is_dir() and str(_candidate) not in sys.path:
44
  sys.path.insert(0, str(_candidate))
45
 
46
+ from _common import ( # noqa: E402
47
  BASE_MODEL_ID,
48
  DEFAULT_GPU,
49
  DEFAULT_KEEPALIVE_HOURS,
 
52
  FINETUNE_VOL_PATH,
53
  HF_CACHE_PATH,
54
  LM_EVAL_OUTPUT,
 
55
  baseline_is_cached,
56
  build_finetune_cmd,
57
  build_lm_eval_cmd,
 
62
  hf_cache_vol,
63
  hf_secret,
64
  image,
65
+ job_plan_rows,
66
+ parse_json_object,
67
  prepare_jobs,
68
+ split_csv,
69
  publish_adapter_files,
70
  pull_artifacts,
71
  reload_volumes,
 
166
  model_path: str | None = None,
167
  adapter_path: str | None = None,
168
  compare_to: str | None = None,
169
+ tasks: list[str] | None = None,
170
+ limit: int | None = None,
171
+ num_fewshot: int | None = None,
172
+ batch_size: str | None = None,
173
+ device: str | None = None,
174
+ dtype: str | None = None,
175
+ seed: int | None = None,
176
  ) -> dict[str, Any]:
177
  """Run slm-lm-eval on base model or finetuned checkpoint."""
178
  # Pick up adapters committed by another container (e.g. a separate
 
195
  model_path=model_path,
196
  adapter_path=adapter_path,
197
  compare_to=compare_to,
198
+ tasks=tasks,
199
+ limit=limit,
200
+ num_fewshot=num_fewshot,
201
+ batch_size=batch_size,
202
+ device=device,
203
+ dtype=dtype,
204
+ seed=seed,
205
  )
206
  print("Running:", " ".join(cmd))
207
  proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
 
220
  "model_path": model_path,
221
  "adapter_path": adapter_path,
222
  "compare_to": compare_to,
223
+ "tasks": tasks,
224
+ "limit": limit,
225
+ "num_fewshot": num_fewshot,
226
+ "batch_size": batch_size,
227
+ "device": device,
228
+ "dtype": dtype,
229
+ "seed": seed,
230
  "results_json": str(results_json),
231
  "summary_md": str(summary_md),
232
  "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
 
274
  *,
275
  job_names: list[str] | None = None,
276
  category: str | None = None,
277
+ sector: str | None = None,
278
+ usecase: str | None = None,
279
+ profiles: list[str] | None = None,
280
  max_steps: int | None = None,
281
+ max_samples: int | None = None,
282
+ finetune_overrides: dict[str, Any] | None = None,
283
  train: bool = True,
284
  eval_only: bool = False,
285
+ eval_tasks: list[str] | None = None,
286
+ eval_limit: int | None = None,
287
+ eval_num_fewshot: int | None = None,
288
+ eval_batch_size: str | None = None,
289
+ eval_device: str | None = None,
290
+ eval_dtype: str | None = None,
291
+ eval_seed: int | None = None,
292
+ skip_baseline: bool = False,
293
  publish: bool = True,
294
+ plan_only: bool = False,
295
  ) -> dict[str, Any]:
296
  """Per-profile baselines -> finetune -> eval -> gate -> publish (same container)."""
297
+ defaults, prepared = prepare_jobs(
298
+ job=None,
299
+ category=category,
300
+ sector=sector,
301
+ usecase=usecase,
302
+ profiles=profiles,
303
+ max_steps=max_steps,
304
+ max_samples=max_samples,
305
+ finetune_overrides=finetune_overrides,
306
+ )
307
  if job_names:
308
+ wanted = set(job_names)
309
+ prepared = [j for j in prepared if j.get("name") in wanted]
310
+ if not prepared:
311
  raise ValueError(f"No matching jobs in experiments.yaml: {job_names}")
312
+ if not prepared:
313
+ raise ValueError("No jobs matched the requested filters")
 
 
 
 
314
 
315
  preset = defaults.get("preset", "minicpm5-1b")
316
+ profile_names = sorted({j.get("eval_profile", "compare_study") for j in prepared})
317
+ plan = job_plan_rows(prepared)
318
+ if plan_only:
319
+ return {"preset": preset, "jobs": plan}
 
 
 
 
320
 
321
  baselines_ok: dict[str, bool] = {}
322
+ if not eval_only and not skip_baseline:
323
+ for profile in profile_names:
324
  exp = f"{preset}__baseline__{profile}"
325
  cfg_path = config_for_profile(profile)
326
+ if baseline_is_cached(
327
+ exp,
328
+ cfg_path,
329
+ tasks=eval_tasks,
330
+ limit=eval_limit,
331
+ num_fewshot=eval_num_fewshot,
332
+ seed=eval_seed,
333
+ ):
334
  print(f"baseline {exp}: reusing cached results (config unchanged)")
335
  baselines_ok[profile] = True
336
  continue
 
338
  experiment_name=exp,
339
  config=cfg_path,
340
  preset=preset,
341
+ tasks=eval_tasks,
342
+ limit=eval_limit,
343
+ num_fewshot=eval_num_fewshot,
344
+ batch_size=eval_batch_size,
345
+ device=eval_device,
346
+ dtype=eval_dtype,
347
+ seed=eval_seed,
348
  )
349
  baselines_ok[profile] = bool(result.get("ok"))
350
 
 
374
  model_path=BASE_MODEL_ID,
375
  adapter_path=adapter_path,
376
  compare_to=compare_to,
377
+ tasks=eval_tasks,
378
+ limit=eval_limit,
379
+ num_fewshot=eval_num_fewshot,
380
+ batch_size=eval_batch_size,
381
+ device=eval_device,
382
+ dtype=eval_dtype,
383
+ seed=eval_seed,
384
  )
385
 
386
  row: dict[str, Any] = {
387
  "name": job_name,
388
  "category": j.get("category"),
389
  "profile": profile,
390
+ "plan": next((p for p in plan if p["name"] == job_name), None),
391
  "eval": eval_result,
392
  }
393
 
 
431
  cmd: str | None = None,
432
  job: str | None = None,
433
  category: str | None = None,
434
+ sector: str | None = None,
435
+ usecase: str | None = None,
436
+ profiles: str | None = None,
437
  max_steps: int | None = None,
438
+ max_samples: int | None = None,
439
+ finetune_args_json: str | None = None,
440
  eval_only: bool = False,
441
  pipeline: bool = False,
442
  publish: bool = True,
443
  publish_only: bool = False,
444
  pull: bool = True,
445
  ping: bool = False,
446
+ plan: bool = False,
447
+ skip_baseline: bool = False,
448
+ eval_tasks: str | None = None,
449
+ eval_limit: int | None = None,
450
+ eval_num_fewshot: int | None = None,
451
+ eval_batch_size: str | None = None,
452
+ eval_device: str | None = None,
453
+ eval_dtype: str | None = None,
454
+ eval_seed: int | None = None,
455
  ):
456
  """
457
  GPU worker CLI.
 
466
  modal run research/modal/server_app.py
467
  modal run research/modal/server_app.py --pipeline --job math-lora --max-steps 20
468
  modal run research/modal/server_app.py --pipeline --category science --no-publish
469
+ modal run research/modal/server_app.py --pipeline --sector science --eval-limit 25
470
+ modal run research/modal/server_app.py --plan --profiles math,science
471
  modal run research/modal/server_app.py --eval-only --job math-lora
472
  modal run research/modal/server_app.py --publish-only --job math-lora
473
  modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
474
  """
475
+ has_task = bool(
476
+ cmd
477
+ or job
478
+ or category
479
+ or sector
480
+ or usecase
481
+ or profiles
482
+ or eval_only
483
+ or pipeline
484
+ or publish_only
485
+ or ping
486
+ or plan
487
+ )
488
  if has_task:
489
  serve = False
490
 
 
536
  print(json.dumps(result, indent=2))
537
  return
538
 
539
+ if pipeline or job or category or sector or usecase or profiles or eval_only or plan:
540
  job_names = [job] if job else None
541
  result = worker.run_pipeline.remote(
542
  job_names=job_names,
543
  category=category,
544
+ sector=sector,
545
+ usecase=usecase,
546
+ profiles=split_csv(profiles),
547
  max_steps=max_steps,
548
+ max_samples=max_samples,
549
+ finetune_overrides=parse_json_object(
550
+ finetune_args_json, flag="--finetune-args-json"
551
+ ),
552
  train=not eval_only,
553
  eval_only=eval_only,
554
+ eval_tasks=split_csv(eval_tasks),
555
+ eval_limit=eval_limit,
556
+ eval_num_fewshot=eval_num_fewshot,
557
+ eval_batch_size=eval_batch_size,
558
+ eval_device=eval_device,
559
+ eval_dtype=eval_dtype,
560
+ eval_seed=eval_seed,
561
+ skip_baseline=skip_baseline,
562
  publish=publish,
563
+ plan_only=plan,
564
  )
565
  print(json.dumps(result, indent=2))
566
 
567
+ if plan:
568
+ return
569
+
570
  if pull:
571
  for row in result.get("jobs", []):
572
  pull_artifacts(row["name"], f"{row['name']}__{row['profile']}")
research/modal/tests/test_modal_common.py ADDED
@@ -0,0 +1,76 @@
+ from pathlib import Path
+
+ import sys
+
+ sys.path.insert(0, str(Path(__file__).resolve().parents[3]))
+
+ from research.modal._common import (  # noqa: E402
+     build_finetune_cmd,
+     build_lm_eval_cmd,
+     prepare_jobs,
+     split_csv,
+ )
+
+
+ def test_build_lm_eval_cmd_accepts_runtime_overrides():
+     cmd = build_lm_eval_cmd(
+         experiment_name="exp",
+         config="cfg.yaml",
+         preset="minicpm5-1b",
+         tasks=["arc_easy", "hellaswag"],
+         limit=5,
+         num_fewshot=1,
+         batch_size="2",
+         device="cuda",
+         dtype="float16",
+         seed=7,
+     )
+
+     assert cmd[-15:] == [
+         "--tasks",
+         "arc_easy",
+         "hellaswag",
+         "--limit",
+         "5",
+         "--num-fewshot",
+         "1",
+         "--batch-size",
+         "2",
+         "--device",
+         "cuda",
+         "--dtype",
+         "float16",
+         "--seed",
+         "7",
+     ]
+
+
+ def test_prepare_jobs_filters_and_applies_finetune_overrides():
+     _, jobs = prepare_jobs(
+         sector="math",
+         profiles=["math"],
+         max_steps=3,
+         max_samples=11,
+         finetune_overrides={"lr": 1e-4, "lora_r": 8, "dataset_split": "train[:11]"},
+     )
+
+     assert [job["name"] for job in jobs] == ["math-lora"]
+     job = jobs[0]
+     assert job["max_steps"] == 3
+     assert job["max_samples"] == 11
+     assert job["dataset_split"] == "train[:11]"
+     assert job["args"]["lr"] == 1e-4
+     assert job["args"]["lora_r"] == 8
+
+     cmd = build_finetune_cmd(job, "/tmp/out")
+     assert "--max_steps" in cmd
+     assert cmd[cmd.index("--max_steps") + 1] == "3"
+     assert "--lr" in cmd
+     assert cmd[cmd.index("--lr") + 1] == "0.0001"
+     assert "--lora_r" in cmd
+     assert cmd[cmd.index("--lora_r") + 1] == "8"
+
+
+ def test_split_csv_trims_empty_values():
+     assert split_csv(" math, science ,,code ") == ["math", "science", "code"]
+     assert split_csv(None) is None
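These tests pin down two behaviors without showing the code behind them: `split_csv` trims and drops empty items, and the eval overrides appear at the end of the command in a fixed flag order. A minimal sketch consistent with those assertions follows. It is not the `_common.py` implementation; `eval_override_args` is a hypothetical helper name, and returning `[]` for an empty string is an assumption the tests do not cover.

```python
from __future__ import annotations


def split_csv(value: str | None) -> list[str] | None:
    """Sketch: split on commas, trim whitespace, drop empty items."""
    if value is None:
        return None
    items = [part.strip() for part in value.split(",")]
    return [item for item in items if item]


def eval_override_args(
    *,
    tasks: list[str] | None = None,
    limit: int | None = None,
    num_fewshot: int | None = None,
    batch_size: str | None = None,
    device: str | None = None,
    dtype: str | None = None,
    seed: int | None = None,
) -> list[str]:
    """Hypothetical helper: emit only the flags whose override is set."""
    args: list[str] = []
    # Tasks are passed as separate argv items after a single --tasks flag.
    if tasks:
        args += ["--tasks", *tasks]
    # Flag order follows the tail asserted in the test above.
    for flag, value in (
        ("--limit", limit),
        ("--num-fewshot", num_fewshot),
        ("--batch-size", batch_size),
        ("--device", device),
        ("--dtype", dtype),
        ("--seed", seed),
    ):
        if value is not None:
            args += [flag, str(value)]
    return args


print(split_csv(" math, science ,,code "))
print(eval_override_args(tasks=split_csv("arc_easy,hellaswag"), limit=5, seed=7))
```

Checking `value is not None` rather than truthiness keeps legitimate zero values such as `--num-fewshot 0` on the command line.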
voice_models.yaml CHANGED
@@ -2,12 +2,13 @@
2
  # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
3
 
4
  defaults:
5
- asr_preset: cohere-transcribe
6
  tts_preset: piper-multilingual
7
  # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
8
  realtime_tts_preset: vibevoice-realtime-0.5b
9
- coach_model: tiny-aya-global
10
  coach_fallbacks:
 
11
  - minicpm5-1b
12
  max_seconds: 30
13
 
 
2
  # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
3
 
4
  defaults:
5
+ asr_preset: whisper-cpp-base
6
  tts_preset: piper-multilingual
7
  # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
8
  realtime_tts_preset: vibevoice-realtime-0.5b
9
+ coach_model: minicpm5-1b-language-lesson-hub
10
  coach_fallbacks:
11
+ - minicpm5-1b-language-lesson-lora
12
  - minicpm5-1b
13
  max_seconds: 30
14
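The new defaults give the coach an ordered chain: the Hub-hosted language-lesson preset first, then the local LoRA preset, then the plain base model. The loading code is not part of this diff, so the sketch below only illustrates the ordering idea. `load_with_fallbacks` and `fake_loader` are hypothetical, and the simulated failure is invented for the example.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def load_with_fallbacks(
    primary: str,
    fallbacks: list[str],
    loader: Callable[[str], T],
) -> tuple[str, T]:
    """Hypothetical: try each preset key in order, return the first that loads."""
    errors: list[str] = []
    for key in [primary, *fallbacks]:
        try:
            return key, loader(key)
        except Exception as exc:  # a real loader would catch narrower errors
            errors.append(f"{key}: {exc}")
    raise RuntimeError("no coach preset could be loaded: " + "; ".join(errors))


def fake_loader(key: str) -> str:
    # Stand-in loader: pretend the Hub-hosted preset is unavailable.
    if key == "minicpm5-1b-language-lesson-hub":
        raise OSError("adapter not reachable")
    return f"<model {key}>"


chosen, model = load_with_fallbacks(
    "minicpm5-1b-language-lesson-hub",
    ["minicpm5-1b-language-lesson-lora", "minicpm5-1b"],
    fake_loader,
)
print(chosen)
```

Returning the chosen key alongside the model lets the caller log which preset actually served the lesson, which is useful when the primary silently falls through.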