Space: MSGEncrypted/lesson-agent-dev


msg encrypted ai

MSGEncrypted committed 8 days ago

Commit

aac5f23

1 Parent(s): 1e52a1f

Feat/sprint last 2hours (#22)


* common fix wip

* server app wip fix

* language stuff

* readme

* server app fix

* eval language

* wip language app lesson

* wip fix

* test modal

* test modal and fix

* voice fallback

* readme wip

---------

Co-authored-by: MSGhais <msghais135@gmail.com>

Files changed (18)

.env.example +6 -6
README.md +6 -0
USAGE.md +18 -8
apps/gradio-space/src/gradio_space/api/studio.py +2 -2
models.yaml +14 -0
research/data/build_language_lesson_chat.py +378 -0
research/data/language-lesson-ar.jsonl +0 -0
research/data/language-lesson-eval-ar.jsonl +10 -0
research/data/language-lesson-eval-fr.jsonl +10 -0
research/data/language-lesson-fr.jsonl +0 -0
research/data/language-lesson-seeds.yaml +75 -0
research/evals/language_lesson_smoke.py +92 -0
research/modal/_common.py +142 -5
research/modal/experiments.yaml +41 -0
research/modal/finetune_app.py +74 -7
research/modal/server_app.py +132 -28
research/modal/tests/test_modal_common.py +76 -0
voice_models.yaml +3 -2

.env.example CHANGED

@@ -64,14 +64,16 @@ ACTIVE_MODEL=minicpm5-1b
 # --- EchoCoach / Language lessons (voice stack) ---
 # VOICE_PRESETS_PATH=./voice_models.yaml
-# Recommended for Cohere Labs partner demo (GPU Space):
 # ECHOCOACH_ASR_PRESET=cohere-transcribe
 # ECHOCOACH_COACH_MODEL=tiny-aya-global
-# Comma-separated preset keys from models.yaml if primary coach fails to load:
-# ECHOCOACH_COACH_FALLBACK=minicpm5-1b
 # ECHOCOACH_TTS_PRESET=piper-multilingual
 # ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
-# Dev fallback (CPU):
 # ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
 # ECHOCOACH_COACH_MODEL=minicpm5-1b
 # ECHOCOACH_MAX_SECONDS=30
@@ -79,7 +81,5 @@ ACTIVE_MODEL=minicpm5-1b
 # ECHOCOACH_VOICE_PROFILE=pipeline   # pipeline (default) or omni for MiniCPM-o attempt
 # ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
 # PIPER_VOICES_DIR=~/.local/share/piper/voices
-# For Cohere Transcribe ASR: huggingface-cli login + accept model terms, then:
-# ECHOCOACH_ASR_PRESET=cohere-transcribe
 BASE=openbmb/MiniCPM5-1B

 # --- EchoCoach / Language lessons (voice stack) ---
 # VOICE_PRESETS_PATH=./voice_models.yaml
+# Default (Cohere-free): Whisper ASR + OpenBMB language-lesson LoRA coach
+# ECHOCOACH_ASR_PRESET=whisper-cpp-base
+# ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
+# ECHOCOACH_COACH_FALLBACK=minicpm5-1b-language-lesson-lora,minicpm5-1b
+# Optional Cohere Labs partner demo (GPU Space + HF gated models):
 # ECHOCOACH_ASR_PRESET=cohere-transcribe
 # ECHOCOACH_COACH_MODEL=tiny-aya-global
 # ECHOCOACH_TTS_PRESET=piper-multilingual
 # ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
+# Dev fallback (CPU, no LoRA):
 # ECHOCOACH_ASR_PRESET=whisper-cpp-tiny
 # ECHOCOACH_COACH_MODEL=minicpm5-1b
 # ECHOCOACH_MAX_SECONDS=30
 # ECHOCOACH_VOICE_PROFILE=pipeline   # pipeline (default) or omni for MiniCPM-o attempt
 # ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
 # PIPER_VOICES_DIR=~/.local/share/piper/voices
 BASE=openbmb/MiniCPM5-1B

README.md CHANGED

@@ -31,8 +31,13 @@ See **[USAGE.md](USAGE.md)** for local run, Gradio SDK / ZeroGPU Space deploymen
 **Demo video:** [https://www.youtube.com/watch?v=bwtOiZvJ-7k](https://www.youtube.com/watch?v=bwtOiZvJ-7k)
 **X post:** [https://x.com/MSG_Encrypted/status/2066570320861921748](https://x.com/MSG_Encrypted/status/2066570320861921748)
 ## Prerequisites
 - [uv](https://docs.astral.sh/uv/)
@@ -175,6 +180,7 @@ A root `Dockerfile` is kept for a later **Docker SDK** deploy (flip README to `s
 - Space live under build-small-hackathon
 - Demo video: [YouTube](https://www.youtube.com/watch?v=bwtOiZvJ-7k) — real user enters topic → download `.pptx` → show agent trace
 - Social post published: [X](https://x.com/MSG_Encrypted/status/2066570320861921748)
 - Submission by **June 15, 2026**

 **Demo video:** [https://www.youtube.com/watch?v=bwtOiZvJ-7k](https://www.youtube.com/watch?v=bwtOiZvJ-7k)
+**Blog post:** [Small Models, Bounded Jobs](https://huggingface.co/blog/build-small-hackathon/lessonagent-opennotebook) — Hugging Face Build Small Hackathon write-up
 **X post:** [https://x.com/MSG_Encrypted/status/2066570320861921748](https://x.com/MSG_Encrypted/status/2066570320861921748)
+**GitHub:** [https://github.com/MSghais/small-model-hackathon/](https://github.com/MSghais/small-model-hackathon/)
 ## Prerequisites
 - [uv](https://docs.astral.sh/uv/)
 - Space live under build-small-hackathon
 - Demo video: [YouTube](https://www.youtube.com/watch?v=bwtOiZvJ-7k) — real user enters topic → download `.pptx` → show agent trace
+- Blog post: [Small Models, Bounded Jobs](https://huggingface.co/blog/build-small-hackathon/lessonagent-opennotebook)
 - Social post published: [X](https://x.com/MSG_Encrypted/status/2066570320861921748)
 - Submission by **June 15, 2026**

USAGE.md CHANGED

@@ -2,7 +2,7 @@
 How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
-The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (Cohere Transcribe + Tiny Aya), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
 ## Prerequisites
@@ -146,10 +146,10 @@ Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:
 | Variable | Default | Description |
 | -------- | ------- | ----------- |
-| `ECHOCOACH_ASR_PRESET` | `cohere-transcribe` | ASR preset key (Space demo); use `whisper-cpp-tiny` on CPU dev |
 | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
 | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
-| `ECHOCOACH_COACH_MODEL` | `tiny-aya-global` | Text coach preset (Tiny Aya; from `models.yaml`) |
 | `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
 | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
@@ -169,15 +169,25 @@ The **Language lessons** tab is the primary voice learning experience: one page
 | ----- | ------ |
 | Type a question | Chat bubble in target language |
 | Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
-| **Other (text only)** language code | Tiny Aya written lesson (no Piper voice for unsupported codes) |
-**Stack (Cohere Labs partner demo):** [Cohere Transcribe](https://huggingface.co/CohereLabs/c4ai-transcribe-v2) (14 voice langs) → [Tiny Aya Global / regional](https://huggingface.co/CohereLabs/tiny-aya-global) (70+ text langs) → Piper or VibeVoice Realtime for speech out.
-Set Space secrets (GPU recommended):
 ```bash
-ECHOCOACH_ASR_PRESET=cohere-transcribe
-ECHOCOACH_COACH_MODEL=tiny-aya-global
 ECHOCOACH_TTS_PRESET=piper-multilingual
 ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
 ```

 How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
+The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
 ## Prerequisites
 | Variable | Default | Description |
 | -------- | ------- | ----------- |
+| `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo |
 | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
 | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
+| `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) |
 | `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
 | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
 | ----- | ------ |
 | Type a question | Chat bubble in target language |
 | Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
+| **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |
+**Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out.
+Rebuild training JSONL from Hugging Face sources:
 ```bash
+uv run python research/data/build_language_lesson_chat.py
+modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
+```
+Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).
+Default `.env` / Space secrets:
+```bash
+ECHOCOACH_ASR_PRESET=whisper-cpp-base
+ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
+ECHOCOACH_COACH_FALLBACK=minicpm5-1b
 ECHOCOACH_TTS_PRESET=piper-multilingual
 ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
 ```
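The `ECHOCOACH_COACH_MODEL` and `ECHOCOACH_COACH_FALLBACK` settings above describe a primary preset plus a comma-separated list of fallbacks. The following is a minimal sketch, not part of this commit, of how such a pair could be turned into an ordered list of presets to try. The helper name `coach_load_order` is hypothetical; the real loader in `echocoach` is not shown in this diff and may behave differently.

```python
def coach_load_order(primary: str, fallback_csv: str) -> list[str]:
    """Return preset keys in the order they would be tried, without duplicates."""
    order: list[str] = []
    for key in [primary, *fallback_csv.split(",")]:
        key = key.strip()
        # Skip empty entries (e.g. trailing commas) and repeated keys.
        if key and key not in order:
            order.append(key)
    return order


# Values taken from the .env.example default block in this commit.
print(
    coach_load_order(
        "minicpm5-1b-language-lesson-hub",
        "minicpm5-1b-language-lesson-lora,minicpm5-1b",
    )
)
```

Keeping the primary key first and dropping duplicates means a fallback that repeats the primary preset is not attempted twice.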

apps/gradio-space/src/gradio_space/api/studio.py CHANGED

@@ -10,7 +10,7 @@ import gradio as gr
 from echocoach.config import get_echo_coach_config
 from echocoach.pipeline import run_echo_coach
-from echocoach.prompts import TeacherVoiceMode, resolve_aya_preset
 from echocoach.recording import (
     ServerRecordingError,
     recording_backend_status,
@@ -187,7 +187,7 @@ def _coach_model_key(
     elif coach_variant and coach_variant not in ("auto", ""):
         key = coach_variant.strip()
     else:
-        key = resolve_aya_preset(language, coach_variant)
     if key in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto"):
         key = "tiny-aya-global"
     return key

 from echocoach.config import get_echo_coach_config
 from echocoach.pipeline import run_echo_coach
+from echocoach.prompts import TeacherVoiceMode
 from echocoach.recording import (
     ServerRecordingError,
     recording_backend_status,
     elif coach_variant and coach_variant not in ("auto", ""):
         key = coach_variant.strip()
     else:
+        key = _echo_config.coach_model
     if key in ("tiny-aya-water", "tiny-aya-fire", "tiny-aya-earth", "auto"):
         key = "tiny-aya-global"
     return key

models.yaml CHANGED

@@ -94,6 +94,20 @@ models:
     adapter_path: MSGEncrypted/minicpm5-1b-math-lora
     trust_remote_code: true
   tiny-aya-global:
     label: Tiny Aya Global 3.3B (multilingual coach)
     backend: transformers

     adapter_path: MSGEncrypted/minicpm5-1b-math-lora
     trust_remote_code: true
+  minicpm5-1b-language-lesson-hub:
+    label: MiniCPM5 1B language lesson LoRA (FR/AR, Hub)
+    backend: transformers
+    model_id: openbmb/MiniCPM5-1B
+    adapter_path: MSGEncrypted/minicpm5-1b-language-lesson-lora
+    trust_remote_code: true
+  minicpm5-1b-language-lesson-lora:
+    label: MiniCPM5 1B language lesson LoRA (FR/AR, local)
+    backend: transformers
+    model_id: openbmb/MiniCPM5-1B
+    adapter_path: ./models/finetuned/language-lesson-lora
+    trust_remote_code: true
   tiny-aya-global:
     label: Tiny Aya Global 3.3B (multilingual coach)
     backend: transformers
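The two new presets differ only in `adapter_path`: one is a Hub repo id and the other a relative directory. The sketch below, which is not part of this commit, shows one way to tell the two apart. It assumes a simple path-prefix heuristic; `is_local_adapter` is a hypothetical helper, and the actual loader may use a different rule.

```python
def is_local_adapter(adapter_path: str) -> bool:
    """Heuristic: treat filesystem-looking prefixes as local adapter directories."""
    return adapter_path.startswith(("./", "../", "/", "~"))


# adapter_path values copied from the two presets added in this commit.
for path in (
    "MSGEncrypted/minicpm5-1b-language-lesson-lora",
    "./models/finetuned/language-lesson-lora",
):
    print(path, "->", "local" if is_local_adapter(path) else "hub")
```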

research/data/build_language_lesson_chat.py ADDED

@@ -0,0 +1,378 @@

+#!/usr/bin/env python3
+"""Build TeacherVoice-shaped FR/AR chat JSONL from Hugging Face sources + seeds.
+Exports:
+  research/data/language-lesson-fr.jsonl
+  research/data/language-lesson-ar.jsonl
+  research/data/language-lesson-eval-fr.jsonl  (5% holdout)
+  research/data/language-lesson-eval-ar.jsonl
+Usage:
+  uv run python research/data/build_language_lesson_chat.py
+  uv run python research/data/build_language_lesson_chat.py --max-per-source 500 --skip-hub
+"""
+from __future__ import annotations
+import argparse
+import json
+import random
+import re
+import sys
+from collections.abc import Iterator
+from pathlib import Path
+from typing import Any, Literal
+import yaml
+_REPO_ROOT = Path(__file__).resolve().parents[2]
+_DATA_DIR = Path(__file__).resolve().parent
+if str(_REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(_REPO_ROOT))
+from echocoach.prompts import (  # noqa: E402
+    system_prompt_for_mode,
+    topic_context_block,
+)
+from echocoach.teacher_voice import _VOICE_USER_SUFFIX  # noqa: E402
+VoiceMode = Literal["explain", "lesson"]
+MIN_ASSISTANT_CHARS = 40
+MAX_ASSISTANT_CHARS = 600
+EVAL_HOLDOUT_RATIO = 0.05
+DEFAULT_FR_SOURCES = (
+    "angeluriot/french_instruct",
+    "CohereLabs/aya_dataset",
+    "pinzhenchen/alpaca-cleaned-fr",
+)
+DEFAULT_AR_SOURCES = (
+    "arbml/CIDAR",
+    "ClusterlabAi/InstAr-500k",
+    "CohereLabs/aya_dataset",
+)
+SOURCE_CAPS: dict[str, dict[str, int]] = {
+    "angeluriot/french_instruct": {"fr": 8000},
+    "CohereLabs/aya_dataset": {"fr": 3000, "ar": 3000},
+    "pinzhenchen/alpaca-cleaned-fr": {"fr": 2000},
+    "arbml/CIDAR": {"ar": 8000},
+    "ClusterlabAi/InstAr-500k": {"ar": 5000},
+}
+_INSTAR_GOOD_TASKS = frozenset(
+    {
+        "Open QA",
+        "Extraction and Explanation",
+        "Summarization",
+        "Classification",
+    }
+)
+_CODE_MARKERS = re.compile(r"```|^\s*def |^\s*class |^\s*import ", re.MULTILINE)
+_JSON_START = re.compile(r"^\s*[\{\[]")
+def _assistant_ok(text: str) -> bool:
+    text = (text or "").strip()
+    if len(text) < MIN_ASSISTANT_CHARS or len(text) > MAX_ASSISTANT_CHARS:
+        return False
+    if _JSON_START.match(text):
+        return False
+    if _CODE_MARKERS.search(text):
+        return False
+    if text.count("\n") > 8:
+        return False
+    return True
+def _pick_mode(rng: random.Random, *, topic: str | None) -> VoiceMode:
+    if topic and rng.random() < 0.4:
+        return "lesson"
+    return "explain" if rng.random() < 0.6 else "lesson"
+def _wrap_row(
+    *,
+    language: str,
+    mode: VoiceMode,
+    user_text: str,
+    assistant_text: str,
+    topic: str | None = None,
+) -> dict[str, Any]:
+    system = system_prompt_for_mode(mode, language=language)
+    topic_line = topic_context_block(topic, mode)
+    if topic_line:
+        system = f"{system}\n\n{topic_line}"
+    user_body = f"{user_text.strip()}\n\n{_VOICE_USER_SUFFIX}"
+    return {
+        "messages": [
+            {"role": "system", "content": system},
+            {"role": "user", "content": user_body},
+            {"role": "assistant", "content": assistant_text.strip()},
+        ]
+    }
+def _load_seeds(path: Path) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+    if not path.is_file():
+        return [], []
+    raw = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
+    fr_rows: list[dict[str, Any]] = []
+    ar_rows: list[dict[str, Any]] = []
+    for lang, key in (("fr", "fr"), ("ar", "ar")):
+        for item in raw.get(key, []):
+            mode = item.get("mode", "explain")
+            topic = item.get("topic")
+            if topic in (None, "null", ""):
+                topic = None
+            row = _wrap_row(
+                language=lang,
+                mode=mode,  # type: ignore[arg-type]
+                user_text=str(item["user"]),
+                assistant_text=str(item["assistant"]),
+                topic=str(topic) if topic else None,
+            )
+            (fr_rows if key == "fr" else ar_rows).append(row)
+    return fr_rows, ar_rows
+def _iter_french_instruct(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+    from datasets import load_dataset
+    ds = load_dataset("angeluriot/french_instruct", split="train", streaming=True)
+    count = 0
+    for row in ds:
+        messages = row.get("messages") or row.get("conversation")
+        if not messages:
+            continue
+        user_text = ""
+        assistant_text = ""
+        for msg in messages:
+            role = (msg.get("role") or msg.get("from") or "").lower()
+            content = (msg.get("content") or msg.get("value") or "").strip()
+            if role in ("user", "human"):
+                user_text = content
+            elif role in ("assistant", "gpt", "bot") and content:
+                assistant_text = content
+        if user_text and _assistant_ok(assistant_text):
+            yield user_text, assistant_text, None
+            count += 1
+            if count >= max_rows:
+                break
+def _iter_aya(language_code: str, max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+    from datasets import load_dataset
+    ds = load_dataset("CohereLabs/aya_dataset", split="train")
+    count = 0
+    for row in ds:
+        if row.get("language") != language_code:
+            continue
+        user_text = (row.get("inputs") or "").strip()
+        assistant_text = (row.get("targets") or "").strip()
+        if user_text and _assistant_ok(assistant_text):
+            yield user_text, assistant_text, None
+            count += 1
+            if count >= max_rows:
+                break
+def _iter_alpaca_fr(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+    from datasets import load_dataset
+    ds = load_dataset("pinzhenchen/alpaca-cleaned-fr", split="train")
+    count = 0
+    for row in ds:
+        instruction = (row.get("instruction") or "").strip()
+        inp = (row.get("input") or "").strip()
+        output = (row.get("output") or "").strip()
+        user_text = f"{instruction}\n{inp}".strip() if inp else instruction
+        if user_text and _assistant_ok(output):
+            yield user_text, output, None
+            count += 1
+            if count >= max_rows:
+                break
+def _iter_cidar(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+    from datasets import load_dataset
+    ds = load_dataset("arbml/CIDAR", split="train")
+    count = 0
+    for row in ds:
+        instruction = (row.get("instruction") or "").strip()
+        inp = (row.get("input") or "").strip()
+        output = (row.get("output") or "").strip()
+        user_text = f"{instruction}\n{inp}".strip() if inp else instruction
+        topic = instruction[:80] if instruction else None
+        if user_text and _assistant_ok(output):
+            yield user_text, output, topic
+            count += 1
+            if count >= max_rows:
+                break
+def _iter_instar(max_rows: int) -> Iterator[tuple[str, str, str | None]]:
+    from datasets import load_dataset
+    ds = load_dataset("ClusterlabAi/InstAr-500k", split="train", streaming=True)
+    count = 0
+    for row in ds:
+        task = row.get("task") or ""
+        if task not in _INSTAR_GOOD_TASKS:
+            continue
+        instruction = (row.get("instruction") or "").strip()
+        output = (row.get("output") or "").strip()
+        topic = (row.get("topic") or "").strip() or None
+        if instruction and _assistant_ok(output):
+            yield instruction, output, topic
+            count += 1
+            if count >= max_rows:
+                break
+_SOURCE_LOADERS: dict[str, dict[str, Any]] = {
+    "angeluriot/french_instruct": {"fr": _iter_french_instruct},
+    "CohereLabs/aya_dataset": {
+        "fr": lambda n: _iter_aya("fra", n),
+        "ar": lambda n: _iter_aya("arb", n),
+    },
+    "pinzhenchen/alpaca-cleaned-fr": {"fr": _iter_alpaca_fr},
+    "arbml/CIDAR": {"ar": _iter_cidar},
+    "ClusterlabAi/InstAr-500k": {"ar": _iter_instar},
+}
+def _collect_from_source(
+    source: str,
+    language: str,
+    max_rows: int,
+    rng: random.Random,
+) -> list[dict[str, Any]]:
+    loaders = _SOURCE_LOADERS.get(source, {})
+    loader = loaders.get(language)
+    if loader is None:
+        print(f"  skip {source} (no loader for {language})")
+        return []
+    rows: list[dict[str, Any]] = []
+    try:
+        for user_text, assistant_text, topic in loader(max_rows):
+            mode = _pick_mode(rng, topic=topic)
+            rows.append(
+                _wrap_row(
+                    language=language,
+                    mode=mode,
+                    user_text=user_text,
+                    assistant_text=assistant_text,
+                    topic=topic,
+                )
+            )
+    except Exception as exc:
+        print(f"  warning: {source} failed for {language}: {exc}")
+    return rows
+def _split_eval(
+    rows: list[dict[str, Any]], rng: random.Random
+) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+    if len(rows) < 20:
+        return rows, []
+    shuffled = rows.copy()
+    rng.shuffle(shuffled)
+    n_eval = max(1, int(len(shuffled) * EVAL_HOLDOUT_RATIO))
+    return shuffled[n_eval:], shuffled[:n_eval]
+def _write_jsonl(path: Path, rows: list[dict[str, Any]]) -> None:
+    path.parent.mkdir(parents=True, exist_ok=True)
+    with path.open("w", encoding="utf-8") as fh:
+        for row in rows:
+            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
+def build_language_datasets(
+    *,
+    french_sources: tuple[str, ...],
+    arabic_sources: tuple[str, ...],
+    max_per_source: int,
+    seeds_path: Path,
+    skip_hub: bool,
+    seed: int,
+) -> None:
+    rng = random.Random(seed)
+    fr_rows, ar_rows = _load_seeds(seeds_path)
+    print(f"Loaded {len(fr_rows)} FR + {len(ar_rows)} AR seed rows from {seeds_path.name}")
+    if not skip_hub:
+        for source in french_sources:
+            cap = min(max_per_source, SOURCE_CAPS.get(source, {}).get("fr", max_per_source))
+            print(f"Fetching FR from {source} (cap={cap})...")
+            fr_rows.extend(_collect_from_source(source, "fr", cap, rng))
+        for source in arabic_sources:
+            cap = min(max_per_source, SOURCE_CAPS.get(source, {}).get("ar", max_per_source))
+            print(f"Fetching AR from {source} (cap={cap})...")
+            ar_rows.extend(_collect_from_source(source, "ar", cap, rng))
+    fr_train, fr_eval = _split_eval(fr_rows, rng)
+    ar_train, ar_eval = _split_eval(ar_rows, rng)
+    out_fr = _DATA_DIR / "language-lesson-fr.jsonl"
+    out_ar = _DATA_DIR / "language-lesson-ar.jsonl"
+    eval_fr = _DATA_DIR / "language-lesson-eval-fr.jsonl"
+    eval_ar = _DATA_DIR / "language-lesson-eval-ar.jsonl"
+    _write_jsonl(out_fr, fr_train)
+    _write_jsonl(out_ar, ar_train)
+    _write_jsonl(eval_fr, fr_eval)
+    _write_jsonl(eval_ar, ar_eval)
+    print(
+        f"Wrote FR train={len(fr_train)} eval={len(fr_eval)} -> {out_fr.name}, {eval_fr.name}\n"
+        f"Wrote AR train={len(ar_train)} eval={len(ar_eval)} -> {out_ar.name}, {eval_ar.name}"
+    )
+def main() -> None:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--french-sources",
+        default=",".join(DEFAULT_FR_SOURCES),
+        help="Comma-separated Hugging Face dataset ids for French",
+    )
+    parser.add_argument(
+        "--arabic-sources",
+        default=",".join(DEFAULT_AR_SOURCES),
+        help="Comma-separated Hugging Face dataset ids for Arabic",
+    )
+    parser.add_argument("--max-per-source", type=int, default=5000)
+    parser.add_argument(
+        "--custom-seeds",
+        type=Path,
+        default=_DATA_DIR / "language-lesson-seeds.yaml",
+    )
+    parser.add_argument(
+        "--skip-hub",
+        action="store_true",
+        help="Only write seed rows (offline / smoke)",
+    )
+    parser.add_argument("--seed", type=int, default=42)
+    args = parser.parse_args()
+    french_sources = tuple(s.strip() for s in args.french_sources.split(",") if s.strip())
+    arabic_sources = tuple(s.strip() for s in args.arabic_sources.split(",") if s.strip())
+    build_language_datasets(
+        french_sources=french_sources,
+        arabic_sources=arabic_sources,
+        max_per_source=args.max_per_source,
+        seeds_path=args.custom_seeds,
+        skip_hub=args.skip_hub,
+        seed=args.seed,
+    )
+if __name__ == "__main__":
+    main()
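A worked example of the holdout arithmetic in `_split_eval`, written as a standalone sketch that is not part of this commit. It reuses the same threshold (fewer than 20 rows means no eval split) and the same `EVAL_HOLDOUT_RATIO`; the helper name `split_sizes` is hypothetical and returns only the sizes, not the shuffled rows.

```python
EVAL_HOLDOUT_RATIO = 0.05  # same value as in the script


def split_sizes(n_rows: int, ratio: float = EVAL_HOLDOUT_RATIO) -> tuple[int, int]:
    """Return (train_size, eval_size) using the rule from _split_eval."""
    if n_rows < 20:
        # Too few rows: everything stays in train, eval is empty.
        return n_rows, 0
    n_eval = max(1, int(n_rows * ratio))
    return n_rows - n_eval, n_eval


for n in (19, 20, 200):
    print(n, split_sizes(n))
```

Because `int()` truncates, the eval share is at most 5% and can be slightly lower for row counts that are not multiples of 20.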

research/data/language-lesson-ar.jsonl ADDED

The diff for this file is too large to render. See raw diff

research/data/language-lesson-eval-ar.jsonl ADDED

@@ -0,0 +1,10 @@

+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: أنشئ قائمة بالصفات التي تصف الشخص الشجاع."}, {"role": "user", "content": "أنشئ قائمة بالصفات التي تصف الشخص الشجاع.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "شجاع، بطل، جريء، قوي، شجاع، حازم، جريء، شجاع، جريء، عنيد، شجاع، قوي القلب، شجاع، شجاع، لا يخشى، شجاع، لا يرتعب، مفعم بالحيوية، مجازف."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: حل مشكلة خدمة العملاء مع عميل غير راضٍ. العميل غير راضٍ بسبب تأخر وصول طلبه."}, {"role": "user", "content": "حل مشكلة خدمة العملاء مع عميل غير راضٍ. العميل غير راضٍ بسبب تأخر وصول طلبه.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "أعتذر عن تأخر وصول طلبكم عن الموعد المتوقع. فريقنا يولي اهتمامًا كبيرًا لرضا العملاء، لذلك أود أن أحقق في هذا الأمر بشكل أكبر. هل يمكنكم تزويدي ببعض التفاصيل الإضافية، مثل رقم الطلب وتاريخ التسليم المتوقع؟ سأعمل على حل هذه المشكلة بأسرع وقت ممكن."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nFocus topic: اذكر ثلاثة اعتراضات يمكن أن يكون لدى العميل بشأن شراء منتج"}, {"role": "user", "content": "اذكر ثلاثة اعتراضات يمكن أن يكون لدى العميل بشأن شراء منتج\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "بعض الاعتراضات المحتملة التي يمكن أن يكون لدى العميل بشأن شراء منتج تشمل التكلفة وميزات المنتج وأوقات التسليم أو الشحن. يمكن أن تكون التكلفة عاملاً إذا شعر العميل بأن المنتج لا يستحق السعر. قد تكون ميزات المنتج مصدر قلق إذا لم يفي المنتج بحاجات أو توقعات العميل. وأخيرًا، قد تكون أوقات التسليم أو الشحن مشكلة إذا كان العميل بحاجة إلى المنتج بسرعة وكان وقت الانتظار طويلاً."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nFocus topic: ما هو إعراب مهما؟"}, {"role": "user", "content": "ما هو إعراب مهما؟\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "مَهْمَا: اسم شرط مبني على السكون، يعرب مبتدأ وخبره جملة الشرط والجواب، وهو يجزم فعلين مضارعين.ومن أمثلته قول الله تعالى: \"وَقالُوا مَهْمَا تَأْتِنَا بِهِ مِنْ آيَةٍ لِتَسْحَرَنا بِهَا فَمَا نَحْنُ لَكَ بِمُؤْمِنِينَ\".وقول زهير بن أبي سلمى:وَمَهْمَا تَكُنْ عِنْدَ امْرِئٍ مِنْ خَلِيقَةٍ * وَإِنْ خالَهَا تَخْفَى عَلَى النَّاسِ تُعْلَمِ"}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: ما إعراب الجملة التي بين القوسين؟ إن (ما قلته الصواب)."}, {"role": "user", "content": "ما إعراب الجملة التي بين القوسين؟ إن (ما قلته الصواب).\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "إنَّ ما قُلْتَهُ الصَّوابُ. ما: اسم موصول مبني على السكون في محل نصب اسم \"إنّ\".قُلْتَهُ: فعل ماض مبني على السكون لاتصاله بضمير رفع متحرك، والتاء ضمير متصل في محل رفع فاعل، والهاء ضمير متصل في محل نصب مفعول به. الصَّوابُ: خبر \"إنّ\" مرفوع بالضمة.ويمكن أن تعرب الجملة الموصولية \"ما قلته\" كلها اسم \"إنّ\"، و\"الصواب\" خبرها."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: أنشئ قائمة من ثلاثة إجراءات يمكن اتخاذها لتقليل كمية النفايات المنتجة."}, {"role": "user", "content": "أنشئ قائمة من ثلاثة إجراءات يمكن اتخاذها لتقليل كمية النفايات المنتجة.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "1. استخدم المنتجات القابلة لإعادة الاستخدام بدلاً من العناصر القابلة للتصرف.\r\n2. ممارسة التسميد.\r\n3. إعادة استخدام وإعادة تدوير المنتجات في كل مرة تكون ذلك ممكنًا."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: طوّر تشبيهًا إبداعيًا يصف التطور."}, {"role": "user", "content": "طوّر تشبيهًا إبداعيًا يصف التطور.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "يمكن تشبيه التطور بلعبة عملاقة من الحجر والورق والمقص. الأنواع تحاول باستمرار تفوق بعضها البعض مع التكيف مع بيئتها المتغيرة باستمرار. يستخدم بعض المخلوقات دروعها القوية للحماية، تمامًا مثل الحجر الذي يتغلب على خصومه. يستخدم آخرون التمويه للبقاء مختبئين والتطور بصمت، مشابهًا للورقة التي تفوز بالمفاجأة. وما زالت بعض الأنواع تتفوق في الخفة والحركة، وتنجح في التغلب على خصومها بنفس البراعة التي يستخدمها زوج من المقصات."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nFocus topic: ما إعراب جملة\" ما هذا الشيء\""}, {"role": "user", "content": "ما إعراب جملة\" ما هذا الشيء\"\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "ما: اسم استفهام في محل رفع مبتدأ.هذا: اسم إشارة في محل رفع خبر.الشيءُ: اسم مرفوع بدل من هذا."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "What causes seasons on Earth?\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "الفصول تحدث لأن محور الأرض مائل أثناء دورانه حول الشمس. عندما تميل نصف الكرة الشمالي نحو الشمس يكون الصيف، وعندما يبتعد يكون الشتاء. هذا الميل لا يتغيّر طوال السنة، لكن موقعنا بالنسبة للشمس يتغيّر."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: Arabic (ar). Reply ONLY in Arabic. If the student writes or speaks in another language, match their language instead.\n\nLesson topic: اصنع قائمة من 5 مزايا للعيش في الريف"}, {"role": "user", "content": "اصنع قائمة من 5 مزايا للعيش في الريف\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "1. هواء نظيف: العيش في الريف يعني أنك لن تتعرض لتلوث الهواء الناجم عن المركبات وغيرها من المصادر.\r\n2. مناظر خضراء: استمتع بالهواء النقي والمساحات الواسعة والمناظر الجميلة.\r\n3. الإسكان بأسعار معقولة: غالبًا ما يكون الإسكان في الريف أكثر تكلفة من العيش في المدينة.\r\n4. أقل ضغط نفسي: استمتع بوتيرة الحياة الأكثر استرخاءً في الريف.\r\n5. مجتمع أقرب: غالبًا ما تكون المجتمعات الريفية متماسكة، مما يسمح بإقامة علاقات أكثر مع الجيران ذوي المعاني."}]}

research/data/language-lesson-eval-fr.jsonl ADDED

	@@ -0,0 +1,10 @@

+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Analyser le texte donné pour son ton.\nLe monde a été fortement touché par la pandémie de COVID-19 et il a radicalement changé nos vies.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Le ton du texte est grave et sombre. L'utilisation de termes tels que \"grand impacté\", \"modifié\" et \"pandémie\" suggère la gravité et la gravité de la situation, et véhicule une lourdeur et un souci."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Corriger toute faute d'orthographe ou de grammaire dans la phrase donnée.\nElle a fait du croquis des excersies autour du lac.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Elle a fait du saut d'exercice autour du lac."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Reformuler la phrase suivante au passé.\nElle écrit un roman inspiré de son grand-père.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Elle a écrit un roman inspiré par son grand-père."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a lesson-planning coach for teachers and students.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nor meta commentary about how you will answer.\nHelp outline and explain lesson content verbally: learning goals, key points, and a simple flow.\nIf a lesson topic is set, stay focused on it.\nWhen source excerpts are provided, use them and cite [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Nommez trois fruits riches en vitamine C.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Trois fruits riches en vitamine C sont les oranges, les kiwis et les fraises."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Trouver la racine réelle de l'équation \n sqrt(x) est égal à 10.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "En soustrayant sqrt(x) des deux côtés et en racontant, nous obtenons \n x 2 x x x x x x x x x x x x x x x x.\nPar conséquent, 20 ptr(x) x, donc écailles, 98/20, 49/10. Par conséquent, x (49/10)2 x 2401/100, ou x 24,01."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Quelle est la première règle de la narration ?\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "La première règle de la narration est « Montre, ne le dites pas ». Cela signifie qu'une bonne histoire est une histoire où le public est immergé dans l'expérience à travers des détails, des actions et un dialogue vifs, plutôt que d'être simplement raconté sur les événements. En montrant ce qui se passe, le public peut mieux comprendre les personnages et suivre l'intrigue, ce qui rend l'histoire plus attrayante et mémorable."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Écrivez quelques phrases sur une figure historique donnée.\nAmelia Earhart\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Amelia Earhart a été une aviatrice pionnière et la première femme à voler en solo à travers l'océan Atlantique. Elle a battu de nombreux records et inspiré une génération de pilotes féminins. Sa disparition en 1937 lors d'une tentative de voler autour du monde reste l'un des plus grands mystères non résolus de l'histoire."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Générer une question qui pourrait suivre la réponse ci-dessous.\nJ'ai vu un film la semaine dernière.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Quel film as-tu vu la semaine dernière ?"}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "Regardez le point d'ébullition de l'eau.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Le point d'ébullition de l'eau est de 100 degrés Celsius ou de 212 degrés Fahrenheit à la pression atmosphérique standard."}]}
+{"messages": [{"role": "system", "content": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.\nReply with ONLY the spoken answer (2-5 short sentences). Do not include planning, drafting,\nnumbered outlines, or phrases like \"let me think\" or \"first I need to\".\nUse simple examples when helpful.\nWhen source excerpts are provided, ground your answer in them and cite with [1], [2], etc.\n\nTarget language: French (fr). Reply ONLY in French. If the student writes or speaks in another language, match their language instead."}, {"role": "user", "content": "La variable x est définie comme étant « 4x x 2 ans x 10 ». Trouver la valeur de x.\n\nReply now in 2-4 complete spoken sentences only. No planning, outlines, sentence labels, or meta commentary."}, {"role": "assistant", "content": "Étant donné l'équation 4x x 2y x 10, il y a une infinité de solutions pour la valeur de x, en fonction de la valeur de y. En d'autres termes, la valeur de x dépend de la valeur de y.\n\nCependant, il est encore possible d'exprimer x en termes de y. Pour ce faire, nous pouvons réarranger l'équation pour résoudre x:\n\n4x 2 ans et 10\n4x 10 - 2 ans\nx x (10 - 2 ans)/4\n\nDonc, selon l'équation, la valeur de x est (10 - 2y)/4. Sans connaître la valeur spécifique de y, il n'est pas possible de déterminer la valeur exacte de x."}]}

research/data/language-lesson-fr.jsonl ADDED

The diff for this file is too large to render. See raw diff

research/data/language-lesson-seeds.yaml ADDED

	@@ -0,0 +1,75 @@

+# Hand-crafted TeacherVoice examples — exact inference prompt shape.
+# Loaded by build_language_lesson_chat.py (always included; not held out for eval).
+fr:
+  - mode: explain
+    topic: Photosynthèse
+    user: Qu'est-ce que la photosynthèse ?
+    assistant: >
+      La photosynthèse est le processus par lequel les plantes fabriquent leur
+      nourriture grâce à la lumière du soleil. Elles utilisent l'eau, le dioxyde
+      de carbone et l'énergie lumineuse pour produire du sucre et libérer de
+      l'oxygène. C'est ainsi que les forêts nous fournissent l'air que nous respirons.
+  - mode: lesson
+    topic: Les fractions
+    user: Comment introduire les fractions en classe de CM1 ?
+    assistant: >
+      Commence par montrer une pizza coupée en parts égales pour illustrer une moitié
+      et un quart. Demande aux élèves de nommer des fractions autour d'eux, puis
+      propose un exercice où ils comparent deux fractions simples avec un dessin.
+  - mode: explain
+    topic: null
+    user: Explain gravity in simple terms.
+    assistant: >
+      La gravité est la force qui attire les objets les uns vers les autres.
+      Sur Terre, elle nous maintient au sol et fait tomber les pommes des arbres.
+      Plus un objet est massif, plus son attraction gravitationnelle est forte.
+  - mode: lesson
+    topic: Le cycle de l'eau
+    user: Planifie une leçon courte sur le cycle de l'eau pour des élèves de 10 ans.
+    assistant: >
+      Objectif : comprendre évaporation, condensation et précipitation. Montre un
+      schéma simple, puis fais l'expérience de l'eau qui condense sur un verre froid.
+      Termine par un dessin étiqueté que chaque élève ramène à la maison.
+  - mode: explain
+    topic: L'électricité
+    user: Comment fonctionne un circuit électrique simple ?
+    assistant: >
+      Un circuit simple relie une pile, des fils et une ampoule en boucle fermée.
+      Le courant circule de la pile vers l'ampoule, qui transforme l'énergie en lumière.
+      Si tu coupes un fil, la boucle s'ouvre et l'ampoule s'éteint.
+ar:
+  - mode: explain
+    topic: البناء الضوئي
+    user: ما هي عملية البناء الضوئي؟
+    assistant: >
+      البناء الضوئي هو الطريقة التي تصنع بها النباتات غذاءها باستخدام ضوء الشمس.
+      تمتص الأوراق الماء وثاني أكسيد الكربون وتُنتج سكراً وتُطلق الأكسجين في الهواء.
+      بدون هذه العملية لما كانت الحياة على الأرض ممكنة كما نعرفها اليوم.
+  - mode: lesson
+    topic: الكسور
+    user: كيف أقدّم درساً عن الكسور للصف الخامس؟
+    assistant: >
+      ابدأ برسم دائرة مقسّمة إلى أجزاء متساوية وسمِّ نصفاً وربعاً. اطلب من التلاميذ
+      إيجاد أمثلة من حياتهم اليومية، ثم قارنوا بين كسرين بسيطين باستخدام رسومات.
+  - mode: explain
+    topic: null
+    user: What causes seasons on Earth?
+    assistant: >
+      الفصول تحدث لأن محور الأرض مائل أثناء دورانه حول الشمس. عندما تميل نصف الكرة
+      الشمالي نحو الشمس يكون الصيف، وعندما يبتعد يكون الشتاء. هذا الميل لا يتغيّر
+      طوال السنة، لكن موقعنا بالنسبة للشمس يتغيّر.
+  - mode: lesson
+    topic: دورة الماء
+    user: خطّط درساً قصيراً عن دورة الماء لطلاب في سن العاشرة.
+    assistant: >
+      الهدف أن يفهم التلاميذ التبخّر والتكاثف والهطول. اعرض رسمًا مبسّطاً، ثم
+      جرّب تجربة تكاثف الماء على كوب بارد. اختم برسم مُوسوم يحمله كل تلميذ.
+  - mode: explain
+    topic: الكهرباء
+    user: كيف تعمل دائرة كهربائية بسيطة؟
+    assistant: >
+      الدائرة البسيطة تربط بطارية وأسلاكاً ومصباحاً في حلقة مغلقة. ينتقل التيار
+      من البطارية إلى المصباح فيتحوّل إلى ضوء. إذا قُطع أحد الأسلاك تنكسر الحلقة
+      وينطفئ المصباح فوراً.
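Review note: the seeds file says its entries follow the "exact inference prompt shape", and that shape is visible in the eval JSONL rows above (persona line, `Target language:` line, optional `Focus topic:` / `Lesson topic:` line, and a fixed suffix on the user turn). The sketch below shows how one seed entry could be expanded into such a row. It is an illustration only: `seed_to_row` and the lookup tables are hypothetical names, the persona text is shortened to its first sentence, and the real logic lives in `build_language_lesson_chat.py`, which this diff view does not render.

```python
from __future__ import annotations

import json

# Persona headers shortened for illustration; the full text appears in the eval rows.
PERSONAS = {
    "explain": "You are TeacherVoice, a friendly tutor who explains ideas in plain language.",
    "lesson": "You are TeacherVoice, a lesson-planning coach for teachers and students.",
}
TOPIC_LABELS = {"explain": "Focus topic", "lesson": "Lesson topic"}
LANGUAGE_NAMES = {"fr": "French", "ar": "Arabic"}
VOICE_SUFFIX = (
    "Reply now in 2-4 complete spoken sentences only. "
    "No planning, outlines, sentence labels, or meta commentary."
)


def seed_to_row(seed: dict, language: str) -> dict:
    """Hypothetical helper: expand one seed entry into a chat-format row."""
    name = LANGUAGE_NAMES[language]
    system = (
        f"{PERSONAS[seed['mode']]}\n\n"
        f"Target language: {name} ({language}). Reply ONLY in {name}. "
        "If the student writes or speaks in another language, "
        "match their language instead."
    )
    # Seeds with `topic: null` get no topic line, as in the eval rows without one.
    if seed.get("topic"):
        system += f"\n\n{TOPIC_LABELS[seed['mode']]}: {seed['topic']}"
    return {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": f"{seed['user']}\n\n{VOICE_SUFFIX}"},
            {"role": "assistant", "content": seed["assistant"].strip()},
        ]
    }


seed = {
    "mode": "explain",
    "topic": None,
    "user": "Explain gravity in simple terms.",
    "assistant": "La gravité est la force qui attire les objets les uns vers les autres.\n",
}
row = seed_to_row(seed, "fr")
print(json.dumps(row, ensure_ascii=False, indent=2))
```

Keeping the suffix and the target-language line in shared constants is one way to make sure training rows and the smoke check agree on the exact strings.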

research/evals/language_lesson_smoke.py ADDED

	@@ -0,0 +1,92 @@

+#!/usr/bin/env python3
+"""Smoke-check language-lesson eval JSONL for TeacherVoice format."""
+from __future__ import annotations
+import argparse
+import json
+import re
+import sys
+from pathlib import Path
+_REPO = Path(__file__).resolve().parents[2]
+_DATA = _REPO / "research" / "data"
+_JSON_LEAK = re.compile(r"^\s*[\{\[]|```")
+_ARABIC = re.compile(r"[\u0600-\u06FF]")
+_FRENCH_MARKERS = re.compile(
+    r"[\u00C0-\u024F]|"
+    r"\b(le|la|les|un|une|des|est|sont|pour|dans|avec|que|qui|comment|pourquoi|c'est|ce)\b",
+    re.IGNORECASE,
+)
+_VOICE_SUFFIX = "Reply now in 2-4 complete spoken sentences only"
+def _load_rows(path: Path) -> list[dict]:
+    rows: list[dict] = []
+    with path.open(encoding="utf-8") as fh:
+        for line in fh:
+            line = line.strip()
+            if line:
+                rows.append(json.loads(line))
+    return rows
+def _score_row(row: dict, *, language: str) -> list[str]:
+    issues: list[str] = []
+    messages = row.get("messages") or []
+    if len(messages) < 3:
+        issues.append("missing messages")
+        return issues
+    system = messages[0].get("content", "")
+    user = messages[-2].get("content", "")
+    assistant = messages[-1].get("content", "")
+    if "TeacherVoice" not in system:
+        issues.append("system missing TeacherVoice")
+    label = "French" if language == "fr" else "Arabic"
+    if f"Target language: {label}" not in system:
+        issues.append(f"system missing target language {label}")
+    if _VOICE_SUFFIX not in user:
+        issues.append("user missing voice suffix")
+    if not (40 <= len(assistant) <= 600):
+        issues.append(f"assistant length {len(assistant)} out of range")
+    if _JSON_LEAK.search(assistant):
+        issues.append("assistant looks like JSON/code")
+    if language == "ar" and not _ARABIC.search(assistant):
+        issues.append("assistant missing Arabic script")
+    if language == "fr" and not _FRENCH_MARKERS.search(assistant):
+        issues.append("assistant missing French markers")
+    return issues
+def run_eval(*, language: str) -> int:
+    path = _DATA / f"language-lesson-eval-{language}.jsonl"
+    if not path.is_file():
+        print(f"skip {path.name} (not found)")
+        return 0
+    rows = _load_rows(path)
+    if not rows:
+        print(f"skip {path.name} (empty)")
+        return 0
+    bad = 0
+    for index, row in enumerate(rows):
+        issues = _score_row(row, language=language)
+        if issues:
+            bad += 1
+            print(f"  row {index}: {', '.join(issues)}")
+    ok = len(rows) - bad
+    print(f"{language.upper()} eval: {ok}/{len(rows)} passed")
+    return 0 if bad == 0 else 1
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--language", choices=("fr", "ar", "both"), default="both")
+    args = parser.parse_args()
+    codes = ("fr", "ar") if args.language == "both" else (args.language,)
+    return max(run_eval(language=code) for code in codes)
+if __name__ == "__main__":
+    sys.exit(main())
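Review note: the checks in `_score_row` are format heuristics, not quality checks. A row passes as long as the assistant text is 40 to 600 characters long, does not open with a brace or bracket, and contains the expected script or marker words. The standalone sketch below restates three of those checks so they can be tried on sample strings. `quick_issues` is a hypothetical name, the sample strings are invented for illustration, and only the brace/bracket half of the script's `_JSON_LEAK` pattern is reproduced.

```python
from __future__ import annotations

import re

# Same character range as _ARABIC in the script.
ARABIC = re.compile(r"[\u0600-\u06FF]")
# Brace/bracket half of the script's _JSON_LEAK pattern.
JSON_LEAK = re.compile(r"^\s*[\{\[]")


def quick_issues(assistant: str, language: str) -> list[str]:
    """Hypothetical, reduced version of the assistant-side checks."""
    issues: list[str] = []
    if not (40 <= len(assistant) <= 600):
        issues.append(f"assistant length {len(assistant)} out of range")
    if JSON_LEAK.search(assistant):
        issues.append("assistant looks like JSON/code")
    if language == "ar" and not ARABIC.search(assistant):
        issues.append("assistant missing Arabic script")
    return issues


samples = [
    ("Oui.", "fr"),
    ('{"answer": "La Terre tourne autour du Soleil en un an."}', "fr"),
    ("The Earth orbits the Sun once a year, which takes about 365 days.", "ar"),
]
for text, lang in samples:
    print(lang, quick_issues(text, lang))
```

Because the checks look only at length and surface markers, a fluent but factually wrong answer still passes; the script is best read as a guard against malformed rows.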

research/modal/_common.py CHANGED

@@ -145,9 +145,49 @@ _FINETUNE_FLAGS: dict[str, str] = {
     "lora_dropout": "--lora_dropout",
     "lora_targets": "--lora_targets",
     "val_split": "--val_split",
 }
 def build_finetune_cmd(job: dict[str, Any], out_dir: str) -> list[str]:
     cmd = [
         "uv",
@@ -206,6 +246,13 @@ def build_lm_eval_cmd(
     model_path: str | None = None,
     adapter_path: str | None = None,
     compare_to: str | None = None,
 ) -> list[str]:
     cmd = [
         "uv",
@@ -228,14 +275,58 @@ def build_lm_eval_cmd(
         cmd.extend(["--adapter", adapter_path])
     if compare_to:
         cmd.extend(["--compare-to", compare_to])
     return cmd
 def prepare_jobs(
     *,
     job: str | None = None,
     category: str | None = None,
     max_steps: int | None = None,
 ) -> tuple[dict[str, Any], list[dict[str, Any]]]:
     spec = load_experiments()
     defaults = spec.get("defaults", {})
@@ -251,12 +342,41 @@ def prepare_jobs(
         jobs = [j for j in jobs if j.get("category") == category]
         if not jobs:
             raise SystemExit(f"No jobs with category {category!r}")
     prepared: list[dict[str, Any]] = []
     for raw in jobs:
         merged = apply_defaults(raw, defaults)
         if max_steps is not None:
             merged["max_steps"] = max_steps
         prepared.append(merged)
     return defaults, prepared
@@ -291,7 +411,15 @@ def primary_metric(task_metrics: dict[str, Any]) -> tuple[str, float] | None:
     return None
-def baseline_is_cached(experiment_name: str, config_path: str) -> bool:
     """True if a baseline results.json exists AND its run_meta still matches the
     profile config's tasks/limit/num_fewshot. Config changes (e.g. new guard
     tasks or a higher limit) therefore correctly force a fresh baseline."""
@@ -309,11 +437,20 @@ def baseline_is_cached(experiment_name: str, config_path: str) -> bool:
         cfg = yaml.safe_load(cfg_file.read_text()) or {}
     except Exception:
         return False
-    return (
-        sorted(meta.get("tasks") or []) == sorted(cfg.get("tasks") or [])
-        and meta.get("limit") == cfg.get("limit")
-        and meta.get("num_fewshot") == cfg.get("num_fewshot", 0)
     )
 def evaluate_gate(

     "lora_dropout": "--lora_dropout",
     "lora_targets": "--lora_targets",
     "val_split": "--val_split",
+    "device": "--device",
 }
+def split_csv(value: str | None) -> list[str] | None:
+    if not value:
+        return None
+    items = [item.strip() for item in value.split(",") if item.strip()]
+    return items or None
+def parse_json_object(value: str | None, *, flag: str) -> dict[str, Any]:
+    if not value:
+        return {}
+    try:
+        parsed = json.loads(value)
+    except json.JSONDecodeError as exc:
+        raise SystemExit(f"{flag} must be a JSON object: {exc}") from exc
+    if not isinstance(parsed, dict):
+        raise SystemExit(f"{flag} must be a JSON object")
+    return parsed
+def job_plan_rows(jobs: list[dict[str, Any]]) -> list[dict[str, Any]]:
+    """Compact, printable description of selected jobs and their eval profile."""
+    rows = []
+    for job in jobs:
+        rows.append(
+            {
+                "name": job.get("name"),
+                "category": job.get("category"),
+                "usecase": job.get("usecase") or job.get("use_case"),
+                "profile": job.get("eval_profile", "compare_study"),
+                "dataset": "mix" if job.get("mix") else job.get("dataset"),
+                "mode": job.get("mode", "lora"),
+                "max_steps": job.get("max_steps"),
+                "max_samples": job.get("max_samples"),
+                "publish": bool(job.get("publish")),
+            }
+        )
+    return rows
 def build_finetune_cmd(job: dict[str, Any], out_dir: str) -> list[str]:
     cmd = [
         "uv",
     model_path: str | None = None,
     adapter_path: str | None = None,
     compare_to: str | None = None,
+    tasks: list[str] | None = None,
+    limit: int | None = None,
+    num_fewshot: int | None = None,
+    batch_size: str | None = None,
+    device: str | None = None,
+    dtype: str | None = None,
+    seed: int | None = None,
 ) -> list[str]:
     cmd = [
         "uv",
         cmd.extend(["--adapter", adapter_path])
     if compare_to:
         cmd.extend(["--compare-to", compare_to])
+    if tasks:
+        cmd.append("--tasks")
+        cmd.extend(tasks)
+    if limit is not None:
+        cmd.extend(["--limit", str(int(limit))])
+    if num_fewshot is not None:
+        cmd.extend(["--num-fewshot", str(int(num_fewshot))])
+    if batch_size:
+        cmd.extend(["--batch-size", str(batch_size)])
+    if device:
+        cmd.extend(["--device", str(device)])
+    if dtype:
+        cmd.extend(["--dtype", str(dtype)])
+    if seed is not None:
+        cmd.extend(["--seed", str(int(seed))])
     return cmd
+def _matches_job_filters(
+    job: dict[str, Any],
+    *,
+    sector: str | None = None,
+    usecase: str | None = None,
+    profiles: list[str] | None = None,
+) -> bool:
+    if sector and job.get("sector", job.get("category")) != sector:
+        return False
+    if usecase:
+        values = {
+            job.get("usecase"),
+            job.get("use_case"),
+            job.get("category"),
+            job.get("name"),
+        }
+        values.update(job.get("tags") or [])
+        if usecase not in values:
+            return False
+    if profiles and job.get("eval_profile", "compare_study") not in profiles:
+        return False
+    return True
 def prepare_jobs(
     *,
     job: str | None = None,
     category: str | None = None,
+    sector: str | None = None,
+    usecase: str | None = None,
+    profiles: list[str] | None = None,
     max_steps: int | None = None,
+    max_samples: int | None = None,
+    finetune_overrides: dict[str, Any] | None = None,
 ) -> tuple[dict[str, Any], list[dict[str, Any]]]:
     spec = load_experiments()
     defaults = spec.get("defaults", {})
         jobs = [j for j in jobs if j.get("category") == category]
         if not jobs:
             raise SystemExit(f"No jobs with category {category!r}")
+    if sector or usecase or profiles:
+        jobs = [
+            j
+            for j in jobs
+            if _matches_job_filters(
+                j,
+                sector=sector,
+                usecase=usecase,
+                profiles=profiles,
+            )
+        ]
+        if not jobs:
+            filters = {
+                "sector": sector,
+                "usecase": usecase,
+                "profiles": profiles,
+            }
+            raise SystemExit(f"No jobs matched filters: {filters}")
     prepared: list[dict[str, Any]] = []
     for raw in jobs:
         merged = apply_defaults(raw, defaults)
         if max_steps is not None:
             merged["max_steps"] = max_steps
+        if max_samples is not None:
+            merged["max_samples"] = max_samples
+        if finetune_overrides:
+            args = {**(merged.get("args") or {})}
+            for key, value in finetune_overrides.items():
+                if key in _FINETUNE_FLAGS:
+                    args[key] = value
+                else:
+                    merged[key] = value
+            if args:
+                merged["args"] = args
         prepared.append(merged)
     return defaults, prepared
     return None
+def baseline_is_cached(
+    experiment_name: str,
+    config_path: str,
+    *,
+    tasks: list[str] | None = None,
+    limit: int | None = None,
+    num_fewshot: int | None = None,
+    seed: int | None = None,
+) -> bool:
     """True if a baseline results.json exists AND its run_meta still matches the
     profile config's tasks/limit/num_fewshot. Config changes (e.g. new guard
     tasks or a higher limit) therefore correctly force a fresh baseline."""
         cfg = yaml.safe_load(cfg_file.read_text()) or {}
     except Exception:
         return False
+    expected_tasks = tasks or cfg.get("tasks") or []
+    expected_limit = limit if limit is not None else cfg.get("limit")
+    expected_fewshot = (
+        num_fewshot if num_fewshot is not None else cfg.get("num_fewshot", 0)
+    )
+    expected_seed = seed if seed is not None else cfg.get("seed")
+    same = (
+        sorted(meta.get("tasks") or []) == sorted(expected_tasks)
+        and meta.get("limit") == expected_limit
+        and meta.get("num_fewshot") == expected_fewshot
     )
+    if expected_seed is not None:
+        same = same and meta.get("seed") == expected_seed
+    return same
 def evaluate_gate(
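Review note: the new `baseline_is_cached` signature lets explicit `tasks`, `limit`, `num_fewshot`, and `seed` arguments take precedence over the profile config, and the seed is compared only when an expected seed exists. The sketch below restates that comparison with plain dicts in place of the `results.json` and YAML reads so the precedence can be exercised directly. `baseline_matches` is a hypothetical name and the sample values are invented.

```python
from __future__ import annotations

from typing import Any


def baseline_matches(
    meta: dict[str, Any],
    cfg: dict[str, Any],
    *,
    tasks: list[str] | None = None,
    limit: int | None = None,
    num_fewshot: int | None = None,
    seed: int | None = None,
) -> bool:
    """Compare cached run metadata against explicit overrides, then config."""
    expected_tasks = tasks or cfg.get("tasks") or []
    expected_limit = limit if limit is not None else cfg.get("limit")
    expected_fewshot = (
        num_fewshot if num_fewshot is not None else cfg.get("num_fewshot", 0)
    )
    expected_seed = seed if seed is not None else cfg.get("seed")
    same = (
        sorted(meta.get("tasks") or []) == sorted(expected_tasks)
        and meta.get("limit") == expected_limit
        and meta.get("num_fewshot") == expected_fewshot
    )
    # The seed is ignored unless the caller or the config specifies one.
    if expected_seed is not None:
        same = same and meta.get("seed") == expected_seed
    return same


meta = {"tasks": ["boolq", "hellaswag"], "limit": 100, "num_fewshot": 0, "seed": 1}
cfg = {"tasks": ["hellaswag", "boolq"], "limit": 100}

print(baseline_matches(meta, cfg))             # True: task order does not matter
print(baseline_matches(meta, cfg, limit=200))  # False: override differs from cached run
print(baseline_matches(meta, cfg, seed=2))     # False: seed compared once one is expected
```

One consequence worth noting: a cached baseline recorded with any seed is still reused when neither the caller nor the config names a seed.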

research/modal/experiments.yaml CHANGED

@@ -190,3 +190,44 @@ finetune:
     max_samples: 200
     description: General instruction tuning baseline (Hub, local-only)
     eval_profile: instructions

+  # --- language lessons: FR/AR TeacherVoice coach (Cohere-free stack) ---
+  - name: language-lesson-lora
+    category: language
+    max_steps: 200
+    mix:
+      - dataset: research/data/language-lesson-fr.jsonl
+        format: chat
+        weight: 12
+      - dataset: research/data/language-lesson-ar.jsonl
+        format: chat
+        weight: 12
+      - dataset: research/data/science-tutor-chat.jsonl
+        format: chat
+        weight: 4
+      - dataset: tatsu-lab/alpaca
+        format: alpaca
+        dataset_split: "train[:400]"
+        max_samples: 400
+        weight: 1
+    args:
+      lora_r: 32
+      lora_alpha: 64
+      neftune_noise_alpha: 5
+      early_stopping_patience: 2
+      val_split: 0.05
+    description: >
+      FR/AR TeacherVoice LoRA from language-lesson-fr/ar.jsonl (Hub-built via
+      build_language_lesson_chat.py) + English replay
+    eval_profile: understanding
+    goals:
+      task: boolq
+      min_improve: 0.0
+      guard_tasks:
+        - task: hellaswag
+          max_regress: 0.03
+    publish:
+      hub_repo: MSGEncrypted/minicpm5-1b-language-lesson-lora
+      mirror_repos:
+        - build-small-hackathon/minicpm5-1b-language-lesson-lora
+      private: false
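Review note: the `mix` block gives the FR and AR lesson files a weight of 12 each, the science tutor set 4, and the Alpaca replay 1. How the trainer consumes `weight` is not shown in this diff. Assuming the weights are relative sampling proportions (they could equally be repetition counts), the share of each source works out as in the sketch below.

```python
# Weights copied from the language-lesson-lora job in experiments.yaml.
mix = [
    ("research/data/language-lesson-fr.jsonl", 12),
    ("research/data/language-lesson-ar.jsonl", 12),
    ("research/data/science-tutor-chat.jsonl", 4),
    ("tatsu-lab/alpaca", 1),
]

# Assumption: weights act as relative sampling proportions.
total = sum(weight for _, weight in mix)
shares = {name: weight / total for name, weight in mix}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption the two language files together account for 24 of 29 parts of the mix, with the English replay kept to a small fraction.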

research/modal/finetune_app.py CHANGED

@@ -35,7 +35,7 @@ for _candidate in (Path(__file__).resolve().parent, Path("/repo/research/modal")
     if _candidate.is_dir() and str(_candidate) not in sys.path:
         sys.path.insert(0, str(_candidate))
-from _common import (
     BASE_MODEL_ID,
     FINETUNE_VOL_PATH,
     HF_CACHE_PATH,
@@ -50,8 +50,10 @@ from _common import (
     hf_secret,
     image,
     job_gpu,
-    load_experiments,
     prepare_jobs,
     publish_adapter_files,
     pull_artifacts,
     reload_volumes,
@@ -107,6 +109,13 @@ def run_lm_eval(
     model_path: str | None = None,
     adapter_path: str | None = None,
     compare_to: str | None = None,
 ) -> dict[str, Any]:
     """Run slm-lm-eval on base model or finetuned checkpoint."""
     reload_volumes()
@@ -128,6 +137,13 @@ def run_lm_eval(
         model_path=model_path,
         adapter_path=adapter_path,
         compare_to=compare_to,
     )
     print("Running:", " ".join(cmd))
     proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
@@ -146,6 +162,13 @@ def run_lm_eval(
         "model_path": model_path,
         "adapter_path": adapter_path,
         "compare_to": compare_to,
         "results_json": str(results_json),
         "summary_md": str(summary_md),
         "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
@@ -213,9 +236,23 @@ def main(
     parallel: bool = False,
     job: str | None = None,
     category: str | None = None,
     max_steps: int | None = None,
     publish: bool = True,
     pull: bool = True,
 ):
     """
     Skill-matrix pipeline: per-profile baselines -> train -> eval -> gate -> publish -> pull.
@@ -227,21 +264,43 @@ def main(
         modal run research/modal/finetune_app.py --eval-only --job math-lora
         modal run research/modal/finetune_app.py --no-publish --no-pull
     """
-    defaults, prepared = prepare_jobs(job=job, category=category, max_steps=max_steps)
     if not prepared:
         raise SystemExit("No matching jobs; check --job/--category and experiments.yaml")
     preset = defaults.get("preset", "minicpm5-1b")
-    profiles = sorted({j.get("eval_profile", "compare_study") for j in prepared})
     baselines_ok: dict[str, bool] = {}
-    if not eval_only:
-        print(f"--- baselines ({', '.join(profiles)}) ---")
-        for profile in profiles:
             result = run_lm_eval.remote(
                 experiment_name=f"{preset}__baseline__{profile}",
                 config=config_for_profile(profile),
                 preset=preset,
             )
             print(json.dumps(result, indent=2))
             baselines_ok[profile] = bool(result.get("ok"))
@@ -284,6 +343,13 @@ def main(
             model_path=BASE_MODEL_ID,
             adapter_path=adapter_path,
             compare_to=compare_to,
         )
         print(json.dumps(eval_result, indent=2))
@@ -291,6 +357,7 @@ def main(
             "name": job_name,
             "category": j.get("category"),
             "profile": profile,
         }
         gate_result: dict[str, Any] | None = None

     if _candidate.is_dir() and str(_candidate) not in sys.path:
         sys.path.insert(0, str(_candidate))
+from _common import (  # noqa: E402
     BASE_MODEL_ID,
     FINETUNE_VOL_PATH,
     HF_CACHE_PATH,
     hf_secret,
     image,
     job_gpu,
+    job_plan_rows,
+    parse_json_object,
     prepare_jobs,
+    split_csv,
     publish_adapter_files,
     pull_artifacts,
     reload_volumes,
     model_path: str | None = None,
     adapter_path: str | None = None,
     compare_to: str | None = None,
+    tasks: list[str] | None = None,
+    limit: int | None = None,
+    num_fewshot: int | None = None,
+    batch_size: str | None = None,
+    device: str | None = None,
+    dtype: str | None = None,
+    seed: int | None = None,
 ) -> dict[str, Any]:
     """Run slm-lm-eval on base model or finetuned checkpoint."""
     reload_volumes()
         model_path=model_path,
         adapter_path=adapter_path,
         compare_to=compare_to,
+        tasks=tasks,
+        limit=limit,
+        num_fewshot=num_fewshot,
+        batch_size=batch_size,
+        device=device,
+        dtype=dtype,
+        seed=seed,
     )
     print("Running:", " ".join(cmd))
     proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
         "model_path": model_path,
         "adapter_path": adapter_path,
         "compare_to": compare_to,
+        "tasks": tasks,
+        "limit": limit,
+        "num_fewshot": num_fewshot,
+        "batch_size": batch_size,
+        "device": device,
+        "dtype": dtype,
+        "seed": seed,
         "results_json": str(results_json),
         "summary_md": str(summary_md),
         "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
     parallel: bool = False,
     job: str | None = None,
     category: str | None = None,
+    sector: str | None = None,
+    usecase: str | None = None,
+    profiles: str | None = None,
     max_steps: int | None = None,
+    max_samples: int | None = None,
+    finetune_args_json: str | None = None,
     publish: bool = True,
     pull: bool = True,
+    plan: bool = False,
+    skip_baseline: bool = False,
+    eval_tasks: str | None = None,
+    eval_limit: int | None = None,
+    eval_num_fewshot: int | None = None,
+    eval_batch_size: str | None = None,
+    eval_device: str | None = None,
+    eval_dtype: str | None = None,
+    eval_seed: int | None = None,
 ):
     """
     Skill-matrix pipeline: per-profile baselines -> train -> eval -> gate -> publish -> pull.
         modal run research/modal/finetune_app.py --eval-only --job math-lora
         modal run research/modal/finetune_app.py --no-publish --no-pull
     """
+    defaults, prepared = prepare_jobs(
+        job=job,
+        category=category,
+        sector=sector,
+        usecase=usecase,
+        profiles=split_csv(profiles),
+        max_steps=max_steps,
+        max_samples=max_samples,
+        finetune_overrides=parse_json_object(
+            finetune_args_json, flag="--finetune-args-json"
+        ),
+    )
     if not prepared:
         raise SystemExit("No matching jobs; check --job/--category and experiments.yaml")
     preset = defaults.get("preset", "minicpm5-1b")
+    plan_rows = job_plan_rows(prepared)
+    if plan:
+        print(json.dumps({"preset": preset, "jobs": plan_rows}, indent=2))
+        return
+    profile_names = sorted({j.get("eval_profile", "compare_study") for j in prepared})
     baselines_ok: dict[str, bool] = {}
+    if not eval_only and not skip_baseline:
+        print(f"--- baselines ({', '.join(profile_names)}) ---")
+        for profile in profile_names:
             result = run_lm_eval.remote(
                 experiment_name=f"{preset}__baseline__{profile}",
                 config=config_for_profile(profile),
                 preset=preset,
+                tasks=split_csv(eval_tasks),
+                limit=eval_limit,
+                num_fewshot=eval_num_fewshot,
+                batch_size=eval_batch_size,
+                device=eval_device,
+                dtype=eval_dtype,
+                seed=eval_seed,
             )
             print(json.dumps(result, indent=2))
             baselines_ok[profile] = bool(result.get("ok"))
             model_path=BASE_MODEL_ID,
             adapter_path=adapter_path,
             compare_to=compare_to,
+            tasks=split_csv(eval_tasks),
+            limit=eval_limit,
+            num_fewshot=eval_num_fewshot,
+            batch_size=eval_batch_size,
+            device=eval_device,
+            dtype=eval_dtype,
+            seed=eval_seed,
         )
         print(json.dumps(eval_result, indent=2))
             "name": job_name,
             "category": j.get("category"),
             "profile": profile,
+            "plan": next((p for p in plan_rows if p["name"] == job_name), None),
         }
         gate_result: dict[str, Any] | None = None
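
`main()` now passes `--finetune-args-json` through `parse_json_object(..., flag="--finetune-args-json")` before handing the result to `prepare_jobs` as `finetune_overrides`. The real helper lives in `_common.py`; the sketch below only illustrates the behavior the call site implies (optional input, must decode to a JSON object, error message names the flag). The name `parse_json_object_sketch` and the choice of `SystemExit` for errors are assumptions.

```python
import json
from typing import Any, Dict, Optional


def parse_json_object_sketch(raw: Optional[str], *, flag: str) -> Optional[Dict[str, Any]]:
    """Hypothetical stand-in: decode a CLI string into a dict, or None when unset."""
    if raw is None or not raw.strip():
        return None
    try:
        value = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Name the offending flag so the CLI error is actionable.
        raise SystemExit(f"{flag}: invalid JSON ({exc.msg})") from exc
    if not isinstance(value, dict):
        raise SystemExit(f"{flag}: expected a JSON object, got {type(value).__name__}")
    return value


overrides = parse_json_object_sketch(
    '{"lr": 1e-4, "lora_r": 8}', flag="--finetune-args-json"
)
print(overrides)  # {'lr': 0.0001, 'lora_r': 8}
```

Rejecting non-object JSON (a bare list or number) early keeps the override merge in `prepare_jobs` from failing later with a less obvious error.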

research/modal/server_app.py CHANGED

@@ -43,7 +43,7 @@ for _candidate in (Path(__file__).resolve().parent, Path("/repo/research/modal")
     if _candidate.is_dir() and str(_candidate) not in sys.path:
         sys.path.insert(0, str(_candidate))
-from _common import (
     BASE_MODEL_ID,
     DEFAULT_GPU,
     DEFAULT_KEEPALIVE_HOURS,
@@ -52,7 +52,6 @@ from _common import (
     FINETUNE_VOL_PATH,
     HF_CACHE_PATH,
     LM_EVAL_OUTPUT,
-    apply_defaults,
     baseline_is_cached,
     build_finetune_cmd,
     build_lm_eval_cmd,
@@ -63,8 +62,10 @@ from _common import (
     hf_cache_vol,
     hf_secret,
     image,
-    load_experiments,
     prepare_jobs,
     publish_adapter_files,
     pull_artifacts,
     reload_volumes,
@@ -165,6 +166,13 @@ class GpuWorker:
         model_path: str | None = None,
         adapter_path: str | None = None,
         compare_to: str | None = None,
     ) -> dict[str, Any]:
         """Run slm-lm-eval on base model or finetuned checkpoint."""
         # Pick up adapters committed by another container (e.g. a separate
@@ -187,6 +195,13 @@ class GpuWorker:
             model_path=model_path,
             adapter_path=adapter_path,
             compare_to=compare_to,
         )
         print("Running:", " ".join(cmd))
         proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
@@ -205,6 +220,13 @@ class GpuWorker:
             "model_path": model_path,
             "adapter_path": adapter_path,
             "compare_to": compare_to,
             "results_json": str(results_json),
             "summary_md": str(summary_md),
             "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
@@ -252,43 +274,63 @@ class GpuWorker:
         *,
         job_names: list[str] | None = None,
         category: str | None = None,
         max_steps: int | None = None,
         train: bool = True,
         eval_only: bool = False,
         publish: bool = True,
     ) -> dict[str, Any]:
         """Per-profile baselines -> finetune -> eval -> gate -> publish (same container)."""
-        spec = load_experiments()
-        defaults = spec.get("defaults", {})
-        jobs = spec.get("finetune", [])
         if job_names:
-            jobs = [j for j in jobs if j.get("name") in job_names]
-            if not jobs:
                 raise ValueError(f"No matching jobs in experiments.yaml: {job_names}")
-        if category:
-            jobs = [j for j in jobs if j.get("category") == category]
-            if not jobs:
-                raise ValueError(f"No jobs with category {category!r}")
-        if not jobs:
-            raise ValueError("No jobs matched job_names/category")
         preset = defaults.get("preset", "minicpm5-1b")
-        prepared: list[dict[str, Any]] = []
-        for raw in jobs:
-            merged = apply_defaults(raw, defaults)
-            if max_steps is not None:
-                merged["max_steps"] = max_steps
-            prepared.append(merged)
-        profiles = sorted({j.get("eval_profile", "compare_study") for j in prepared})
         baselines_ok: dict[str, bool] = {}
-        if not eval_only:
-            for profile in profiles:
                 exp = f"{preset}__baseline__{profile}"
                 cfg_path = config_for_profile(profile)
-                if baseline_is_cached(exp, cfg_path):
                     print(f"baseline {exp}: reusing cached results (config unchanged)")
                     baselines_ok[profile] = True
                     continue
@@ -296,6 +338,13 @@ class GpuWorker:
                     experiment_name=exp,
                     config=cfg_path,
                     preset=preset,
                 )
                 baselines_ok[profile] = bool(result.get("ok"))
@@ -325,12 +374,20 @@ class GpuWorker:
                 model_path=BASE_MODEL_ID,
                 adapter_path=adapter_path,
                 compare_to=compare_to,
             )
             row: dict[str, Any] = {
                 "name": job_name,
                 "category": j.get("category"),
                 "profile": profile,
                 "eval": eval_result,
             }
@@ -374,13 +431,27 @@ def main(
     cmd: str | None = None,
     job: str | None = None,
     category: str | None = None,
     max_steps: int | None = None,
     eval_only: bool = False,
     pipeline: bool = False,
     publish: bool = True,
     publish_only: bool = False,
     pull: bool = True,
     ping: bool = False,
 ):
     """
     GPU worker CLI.
@@ -395,11 +466,25 @@ def main(
         modal run research/modal/server_app.py
         modal run research/modal/server_app.py --pipeline --job math-lora --max-steps 20
         modal run research/modal/server_app.py --pipeline --category science --no-publish
         modal run research/modal/server_app.py --eval-only --job math-lora
         modal run research/modal/server_app.py --publish-only --job math-lora
         modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
     """
-    has_task = bool(cmd or job or category or eval_only or pipeline or publish_only or ping)
     if has_task:
         serve = False
@@ -451,18 +536,37 @@ def main(
         print(json.dumps(result, indent=2))
         return
-    if pipeline or job or category or eval_only:
         job_names = [job] if job else None
         result = worker.run_pipeline.remote(
             job_names=job_names,
             category=category,
             max_steps=max_steps,
             train=not eval_only,
             eval_only=eval_only,
             publish=publish,
         )
         print(json.dumps(result, indent=2))
         if pull:
             for row in result.get("jobs", []):
                 pull_artifacts(row["name"], f"{row['name']}__{row['profile']}")

     if _candidate.is_dir() and str(_candidate) not in sys.path:
         sys.path.insert(0, str(_candidate))
+from _common import (  # noqa: E402
     BASE_MODEL_ID,
     DEFAULT_GPU,
     DEFAULT_KEEPALIVE_HOURS,
     FINETUNE_VOL_PATH,
     HF_CACHE_PATH,
     LM_EVAL_OUTPUT,
     baseline_is_cached,
     build_finetune_cmd,
     build_lm_eval_cmd,
     hf_cache_vol,
     hf_secret,
     image,
+    job_plan_rows,
+    parse_json_object,
     prepare_jobs,
+    split_csv,
     publish_adapter_files,
     pull_artifacts,
     reload_volumes,
         model_path: str | None = None,
         adapter_path: str | None = None,
         compare_to: str | None = None,
+        tasks: list[str] | None = None,
+        limit: int | None = None,
+        num_fewshot: int | None = None,
+        batch_size: str | None = None,
+        device: str | None = None,
+        dtype: str | None = None,
+        seed: int | None = None,
     ) -> dict[str, Any]:
         """Run slm-lm-eval on base model or finetuned checkpoint."""
         # Pick up adapters committed by another container (e.g. a separate
             model_path=model_path,
             adapter_path=adapter_path,
             compare_to=compare_to,
+            tasks=tasks,
+            limit=limit,
+            num_fewshot=num_fewshot,
+            batch_size=batch_size,
+            device=device,
+            dtype=dtype,
+            seed=seed,
         )
         print("Running:", " ".join(cmd))
         proc = subprocess.run(cmd, cwd="/repo", check=False, env=repo_env())
             "model_path": model_path,
             "adapter_path": adapter_path,
             "compare_to": compare_to,
+            "tasks": tasks,
+            "limit": limit,
+            "num_fewshot": num_fewshot,
+            "batch_size": batch_size,
+            "device": device,
+            "dtype": dtype,
+            "seed": seed,
             "results_json": str(results_json),
             "summary_md": str(summary_md),
             "comparison_md": str(comparison_md) if comparison_md.is_file() else None,
         *,
         job_names: list[str] | None = None,
         category: str | None = None,
+        sector: str | None = None,
+        usecase: str | None = None,
+        profiles: list[str] | None = None,
         max_steps: int | None = None,
+        max_samples: int | None = None,
+        finetune_overrides: dict[str, Any] | None = None,
         train: bool = True,
         eval_only: bool = False,
+        eval_tasks: list[str] | None = None,
+        eval_limit: int | None = None,
+        eval_num_fewshot: int | None = None,
+        eval_batch_size: str | None = None,
+        eval_device: str | None = None,
+        eval_dtype: str | None = None,
+        eval_seed: int | None = None,
+        skip_baseline: bool = False,
         publish: bool = True,
+        plan_only: bool = False,
     ) -> dict[str, Any]:
         """Per-profile baselines -> finetune -> eval -> gate -> publish (same container)."""
+        defaults, prepared = prepare_jobs(
+            job=None,
+            category=category,
+            sector=sector,
+            usecase=usecase,
+            profiles=profiles,
+            max_steps=max_steps,
+            max_samples=max_samples,
+            finetune_overrides=finetune_overrides,
+        )
         if job_names:
+            wanted = set(job_names)
+            prepared = [j for j in prepared if j.get("name") in wanted]
+            if not prepared:
                 raise ValueError(f"No matching jobs in experiments.yaml: {job_names}")
+        if not prepared:
+            raise ValueError("No jobs matched the requested filters")
         preset = defaults.get("preset", "minicpm5-1b")
+        profile_names = sorted({j.get("eval_profile", "compare_study") for j in prepared})
+        plan = job_plan_rows(prepared)
+        if plan_only:
+            return {"preset": preset, "jobs": plan}
         baselines_ok: dict[str, bool] = {}
+        if not eval_only and not skip_baseline:
+            for profile in profile_names:
                 exp = f"{preset}__baseline__{profile}"
                 cfg_path = config_for_profile(profile)
+                if baseline_is_cached(
+                    exp,
+                    cfg_path,
+                    tasks=eval_tasks,
+                    limit=eval_limit,
+                    num_fewshot=eval_num_fewshot,
+                    seed=eval_seed,
+                ):
                     print(f"baseline {exp}: reusing cached results (config unchanged)")
                     baselines_ok[profile] = True
                     continue
                     experiment_name=exp,
                     config=cfg_path,
                     preset=preset,
+                    tasks=eval_tasks,
+                    limit=eval_limit,
+                    num_fewshot=eval_num_fewshot,
+                    batch_size=eval_batch_size,
+                    device=eval_device,
+                    dtype=eval_dtype,
+                    seed=eval_seed,
                 )
                 baselines_ok[profile] = bool(result.get("ok"))
                 model_path=BASE_MODEL_ID,
                 adapter_path=adapter_path,
                 compare_to=compare_to,
+                tasks=eval_tasks,
+                limit=eval_limit,
+                num_fewshot=eval_num_fewshot,
+                batch_size=eval_batch_size,
+                device=eval_device,
+                dtype=eval_dtype,
+                seed=eval_seed,
             )
             row: dict[str, Any] = {
                 "name": job_name,
                 "category": j.get("category"),
                 "profile": profile,
+                "plan": next((p for p in plan if p["name"] == job_name), None),
                 "eval": eval_result,
             }
     cmd: str | None = None,
     job: str | None = None,
     category: str | None = None,
+    sector: str | None = None,
+    usecase: str | None = None,
+    profiles: str | None = None,
     max_steps: int | None = None,
+    max_samples: int | None = None,
+    finetune_args_json: str | None = None,
     eval_only: bool = False,
     pipeline: bool = False,
     publish: bool = True,
     publish_only: bool = False,
     pull: bool = True,
     ping: bool = False,
+    plan: bool = False,
+    skip_baseline: bool = False,
+    eval_tasks: str | None = None,
+    eval_limit: int | None = None,
+    eval_num_fewshot: int | None = None,
+    eval_batch_size: str | None = None,
+    eval_device: str | None = None,
+    eval_dtype: str | None = None,
+    eval_seed: int | None = None,
 ):
     """
     GPU worker CLI.
         modal run research/modal/server_app.py
         modal run research/modal/server_app.py --pipeline --job math-lora --max-steps 20
         modal run research/modal/server_app.py --pipeline --category science --no-publish
+        modal run research/modal/server_app.py --pipeline --sector science --eval-limit 25
+        modal run research/modal/server_app.py --plan --profiles math,science
         modal run research/modal/server_app.py --eval-only --job math-lora
         modal run research/modal/server_app.py --publish-only --job math-lora
         modal run research/modal/server_app.py --cmd "uv run python research/finetune.py --help"
     """
+    has_task = bool(
+        cmd
+        or job
+        or category
+        or sector
+        or usecase
+        or profiles
+        or eval_only
+        or pipeline
+        or publish_only
+        or ping
+        or plan
+    )
     if has_task:
         serve = False
         print(json.dumps(result, indent=2))
         return
+    if pipeline or job or category or sector or usecase or profiles or eval_only or plan:
         job_names = [job] if job else None
         result = worker.run_pipeline.remote(
             job_names=job_names,
             category=category,
+            sector=sector,
+            usecase=usecase,
+            profiles=split_csv(profiles),
             max_steps=max_steps,
+            max_samples=max_samples,
+            finetune_overrides=parse_json_object(
+                finetune_args_json, flag="--finetune-args-json"
+            ),
             train=not eval_only,
             eval_only=eval_only,
+            eval_tasks=split_csv(eval_tasks),
+            eval_limit=eval_limit,
+            eval_num_fewshot=eval_num_fewshot,
+            eval_batch_size=eval_batch_size,
+            eval_device=eval_device,
+            eval_dtype=eval_dtype,
+            eval_seed=eval_seed,
+            skip_baseline=skip_baseline,
             publish=publish,
+            plan_only=plan,
         )
         print(json.dumps(result, indent=2))
+        if plan:
+            return
         if pull:
             for row in result.get("jobs", []):
                 pull_artifacts(row["name"], f"{row['name']}__{row['profile']}")
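
Both apps thread the same seven eval overrides (`tasks`, `limit`, `num_fewshot`, `batch_size`, `device`, `dtype`, `seed`) down to `build_lm_eval_cmd`. The flag names and their order are pinned by `test_build_lm_eval_cmd_accepts_runtime_overrides` in `test_modal_common.py`. The sketch below shows one way to produce that tail of the command. `eval_override_flags` is a hypothetical helper, and whether the real builder skips unset values exactly this way is an assumption.

```python
from typing import List, Optional, Sequence


def eval_override_flags(
    tasks: Optional[Sequence[str]] = None,
    limit: Optional[int] = None,
    num_fewshot: Optional[int] = None,
    batch_size: Optional[str] = None,
    device: Optional[str] = None,
    dtype: Optional[str] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: emit CLI flags only for overrides that were set."""
    flags: List[str] = []
    if tasks:
        # --tasks is followed by several space-separated values, as in the test.
        flags += ["--tasks", *tasks]
    scalars = (
        ("--limit", limit),
        ("--num-fewshot", num_fewshot),
        ("--batch-size", batch_size),
        ("--device", device),
        ("--dtype", dtype),
        ("--seed", seed),
    )
    for name, value in scalars:
        if value is not None:  # 0 is a legitimate value, so compare against None
            flags += [name, str(value)]
    return flags


tail = eval_override_flags(
    tasks=["arc_easy", "hellaswag"],
    limit=5,
    num_fewshot=1,
    batch_size="2",
    device="cuda",
    dtype="float16",
    seed=7,
)
print(len(tail))  # 15
```

Comparing against `None` rather than truthiness matters for `--num-fewshot 0` and `--seed 0`, which would otherwise be dropped silently.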

research/modal/tests/test_modal_common.py ADDED

@@ -0,0 +1,76 @@

+from pathlib import Path
+import sys
+sys.path.insert(0, str(Path(__file__).resolve().parents[3]))
+from research.modal._common import (  # noqa: E402
+    build_finetune_cmd,
+    build_lm_eval_cmd,
+    prepare_jobs,
+    split_csv,
+)
+def test_build_lm_eval_cmd_accepts_runtime_overrides():
+    cmd = build_lm_eval_cmd(
+        experiment_name="exp",
+        config="cfg.yaml",
+        preset="minicpm5-1b",
+        tasks=["arc_easy", "hellaswag"],
+        limit=5,
+        num_fewshot=1,
+        batch_size="2",
+        device="cuda",
+        dtype="float16",
+        seed=7,
+    )
+    assert cmd[-15:] == [
+        "--tasks",
+        "arc_easy",
+        "hellaswag",
+        "--limit",
+        "5",
+        "--num-fewshot",
+        "1",
+        "--batch-size",
+        "2",
+        "--device",
+        "cuda",
+        "--dtype",
+        "float16",
+        "--seed",
+        "7",
+    ]
+def test_prepare_jobs_filters_and_applies_finetune_overrides():
+    _, jobs = prepare_jobs(
+        sector="math",
+        profiles=["math"],
+        max_steps=3,
+        max_samples=11,
+        finetune_overrides={"lr": 1e-4, "lora_r": 8, "dataset_split": "train[:11]"},
+    )
+    assert [job["name"] for job in jobs] == ["math-lora"]
+    job = jobs[0]
+    assert job["max_steps"] == 3
+    assert job["max_samples"] == 11
+    assert job["dataset_split"] == "train[:11]"
+    assert job["args"]["lr"] == 1e-4
+    assert job["args"]["lora_r"] == 8
+    cmd = build_finetune_cmd(job, "/tmp/out")
+    assert "--max_steps" in cmd
+    assert cmd[cmd.index("--max_steps") + 1] == "3"
+    assert "--lr" in cmd
+    assert cmd[cmd.index("--lr") + 1] == "0.0001"
+    assert "--lora_r" in cmd
+    assert cmd[cmd.index("--lora_r") + 1] == "8"
+def test_split_csv_trims_empty_values():
+    assert split_csv(" math, science ,,code ") == ["math", "science", "code"]
+    assert split_csv(None) is None
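
`test_split_csv_trims_empty_values` fixes two properties of `split_csv`: whitespace around items is trimmed and empty items are dropped, and `None` passes through as `None`. A minimal sketch that satisfies both assertions is shown below. It is not the implementation from `_common.py`; the name `split_csv_sketch` is hypothetical, and returning `None` for an all-empty string is an assumption the test does not cover.

```python
from typing import List, Optional


def split_csv_sketch(value: Optional[str]) -> Optional[List[str]]:
    """Hypothetical stand-in for split_csv, matching the two asserted cases."""
    if value is None:
        return None
    # Trim each item, then drop the empty ones left by ",," or stray spaces.
    parts = [item.strip() for item in value.split(",")]
    parts = [item for item in parts if item]
    # Assumption: an input with no real items behaves like "flag not given".
    return parts or None


print(split_csv_sketch(" math, science ,,code "))  # ['math', 'science', 'code']
print(split_csv_sketch(None))  # None
```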

voice_models.yaml CHANGED

@@ -2,12 +2,13 @@
 # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
 defaults:
-  asr_preset: cohere-transcribe
   tts_preset: piper-multilingual
   # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
   realtime_tts_preset: vibevoice-realtime-0.5b
-  coach_model: tiny-aya-global
   coach_fallbacks:
     - minicpm5-1b
   max_seconds: 30

 # Override defaults via ECHOCOACH_ASR_PRESET / ECHOCOACH_TTS_PRESET in .env
 defaults:
+  asr_preset: whisper-cpp-base
   tts_preset: piper-multilingual
   # Realtime streaming TTS for TeacherVoice VoiceOut (set ECHOCOACH_TTS_PRESET to match)
   realtime_tts_preset: vibevoice-realtime-0.5b
+  coach_model: minicpm5-1b-language-lesson-hub
   coach_fallbacks:
+    - minicpm5-1b-language-lesson-lora
     - minicpm5-1b
   max_seconds: 30
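
The new defaults put `minicpm5-1b-language-lesson-hub` first, with `minicpm5-1b-language-lesson-lora` and then `minicpm5-1b` as fallbacks if the primary coach fails to load. The loader that consumes these keys is not shown in this diff, so the sketch below is only an illustration of an ordered fallback chain. `coach_candidates`, `load_first_available`, and `fake_loader` are hypothetical names, and the simulated load failure is made up.

```python
from typing import Any, Callable, Dict, List, Tuple


def coach_candidates(defaults: Dict[str, Any]) -> List[str]:
    """Primary coach first, then fallbacks in file order, without duplicates."""
    ordered = [defaults.get("coach_model"), *(defaults.get("coach_fallbacks") or [])]
    seen = set()
    result: List[str] = []
    for key in ordered:
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


def load_first_available(
    candidates: List[str], loader: Callable[[str], Any]
) -> Tuple[str, Any]:
    """Try each preset key in order and return the first one that loads."""
    errors: Dict[str, str] = {}
    for key in candidates:
        try:
            return key, loader(key)
        except Exception as exc:  # broad on purpose: any load failure triggers fallback
            errors[key] = str(exc)
    raise RuntimeError(f"no coach model could be loaded: {errors}")


# Values copied from the new side of the voice_models.yaml hunk.
defaults = {
    "coach_model": "minicpm5-1b-language-lesson-hub",
    "coach_fallbacks": ["minicpm5-1b-language-lesson-lora", "minicpm5-1b"],
}


def fake_loader(key: str) -> str:
    # Simulated failure of the primary preset, for illustration only.
    if key == "minicpm5-1b-language-lesson-hub":
        raise OSError("adapter not downloaded")
    return f"<model {key}>"


chosen, _model = load_first_available(coach_candidates(defaults), fake_loader)
print(chosen)  # minicpm5-1b-language-lesson-lora
```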