# Usage
How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).
The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.
## Prerequisites
- [uv](https://docs.astral.sh/uv/) installed
- Python 3.12 (see `.python-version`)
- For Docker testing: Docker installed locally
- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org
## Local development
### 1. Install dependencies
```bash
uv sync --all-packages
```
### 2. Configure environment (optional)
```bash
cp .env.example .env
```
Edit `.env` if you want a different model preset. Default is `minicpm5-1b` (transformers).
### 3. Pre-download the model (optional for GGUF presets)
If you use a GGUF preset (`qwen3b-gguf`), pre-downloading the weights avoids a long wait on first use:
```bash
uv run python scripts/download_model.py
```
Then add the printed path to `.env`:
```bash
MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
```
### 4. Run the Gradio app
```bash
uv run --package gradio-space python -m gradio_space.app
```
Open [http://localhost:7860](http://localhost:7860).
| URL | UI |
|-----|-----|
| `/` | **Studio**: custom HTML/CSS/JS workspace (Off Brand entry) |
| `/classic` | **Classic**: full Gradio tabs, settings, Chat (debug) |
The header in Classic includes a link back to Studio UI.
The model loads on the **first Generate** (Lesson slides) or chat message. Agent traces are written to `outputs/traces/`. After code changes, restart the process to pick up updates.
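To find the traces from your most recent run, a small helper like the one below can list the newest files in `outputs/traces/`. This is a minimal sketch: the function name `latest_traces` is hypothetical, and it makes no assumption about the trace file format beyond "one file per trace".

```python
from pathlib import Path


def latest_traces(trace_dir="outputs/traces", limit=5):
    """Return up to `limit` trace files, newest first (by modification time)."""
    root = Path(trace_dir)
    if not root.is_dir():
        # Nothing has been generated yet, or the app ran from another directory.
        return []
    files = [p for p in root.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    for path in latest_traces():
        print(path)
```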
### Switching models locally (transformers ↔ llama.cpp)
For local dev you can switch presets at runtime without restarting:
```bash
# .env
ALLOW_MODEL_SWITCH=true
ACTIVE_MODEL=minicpm-v-4.6 # startup default (transformers)
```
| UI | Where to switch |
|----|-----------------|
| **Classic** (`/classic`) | **Settings** accordion → Model preset dropdown (reloads on change) |
| **Classic** Chat tab | Model preset dropdown (syncs app-wide) |
| **Studio** (`/`) | Settings drawer → Model preset; Debug tab has the same list |
| Goal | Preset key |
|------|------------|
| MiniCPM-V 4.6 transformers (full VLM) | `minicpm-v-4.6` |
| MiniCPM-V 4.6 llama.cpp / Llama Champion | `minicpm-v-4.6-gguf` |
| MiniCPM5 1B text | `minicpm5-1b` |
| Lesson LoRA (transformers only) | `minicpm5-1b-lesson-lora` |
Prefetch the GGUF weights (optional):
```bash
uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf
```
On Hugging Face Space, keep `ALLOW_MODEL_SWITCH=false` and pin one preset via `ACTIVE_MODEL`.
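The interaction between `ALLOW_MODEL_SWITCH` and `ACTIVE_MODEL` can be pictured as follows. This is an illustrative sketch only, not the app's actual code: the helper names are hypothetical, and the set of accepted truthy strings and the `false` default for `ALLOW_MODEL_SWITCH` are assumptions.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # assumed spelling of "enabled"


def read_model_settings(env=None):
    """Return (active_model, allow_switch) from an environment-style mapping."""
    env = os.environ if env is None else env
    active_model = env.get("ACTIVE_MODEL", "minicpm5-1b")
    allow_switch = env.get("ALLOW_MODEL_SWITCH", "false").strip().lower() in TRUTHY
    return active_model, allow_switch


def resolve_preset(requested, env=None):
    """Honour a preset requested from the UI only when switching is enabled."""
    active_model, allow_switch = read_model_settings(env)
    if allow_switch and requested:
        return requested
    return active_model  # pinned preset, as recommended on a Space
```

With `ALLOW_MODEL_SWITCH=false`, any request from a dropdown resolves back to the pinned `ACTIVE_MODEL`.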
### Lesson slides: research sources
The **Lesson slides** tab can ground outlines on external sources before building the deck:
| Source mode | What it does |
| ----------- | ------------ |
| **None (model only)** | Default: outline from the local model only |
| **Web search** | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides |
| **RAG (indexed sources)** | Use a **ResearchMind session** and/or URLs/files you provide on this tab |
When **Web search** is selected, choose a **search workflow**:
| Workflow | Steps |
| -------- | ----- |
| **Two-step search (suggest & confirm)** | Click **Discover sources** → select URLs → **Generate lesson slides** |
| **Auto search & ingest** | Click **Generate lesson slides** only; search, ingest, and outline run in one step |
**RAG** mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.
Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).
### EchoCoach: voice practice
The **EchoCoach** tab records up to 30 seconds of audio, then runs it through a local pipeline.
**Getting audio in**
- **Record from this computer**: click **Start recording**, speak, then **Stop recording** (uses PipeWire `pw-record` when available). The slider is a max-length safety cap.
- **Browser Record**: needs mic permission and a secure context; open **http://localhost:7860** (not `0.0.0.0` or a LAN IP).
- **Upload**: drop a `.wav` or `.mp3` file (works everywhere, including HF Space).
If recordings sound silent, check system mic input/mute or set `ECHOCOACH_CAPTURE_DEVICE` in `.env` (see `arecord -L` or `pw-cli ls Node`).
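Before blaming the pipeline, it can help to confirm whether a saved recording actually contains signal. The sketch below uses only the standard library and handles 16-bit PCM `.wav` files (not `.mp3`). The function name `inspect_wav` is hypothetical and not part of EchoCoach.

```python
import sys
import wave
from array import array


def inspect_wav(path, max_seconds=30):
    """Report duration, whether it exceeds the cap, and peak amplitude."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        width = wav.getsampwidth()
        raw = wav.readframes(frames)
    duration = frames / rate if rate else 0.0
    peak = None  # only computed for 16-bit samples
    if width == 2:
        samples = array("h")
        samples.frombytes(raw)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        peak = max((abs(s) for s in samples), default=0)
    return {
        "seconds": round(duration, 2),
        "too_long": duration > max_seconds,
        "peak": peak,
    }
```

A `peak` of `0` (or a very small value out of a possible 32768) suggests the microphone was muted or the wrong capture device was selected.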
Pipeline steps:
1. **ASR**: Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
2. **Analysis**: filler highlights, pace score, matplotlib charts
3. **Coach**: rewrite + tips from the text LLM (`ACTIVE_MODEL`, default `minicpm5-1b`)
4. **VoiceOut**: Piper TTS speaks the summary (or full rewrite if checked)
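To give a feel for what the Analysis step computes, here is a toy version of filler counting and pace measurement. It is a sketch under stated assumptions: the filler list, the words-per-minute formula, and the name `analyze_transcript` are illustrative and are not taken from the EchoCoach implementation.

```python
import re

FILLERS = ("um", "uh", "like", "you know")  # assumed filler list


def analyze_transcript(text, duration_seconds):
    """Count filler phrases and estimate speaking pace in words per minute."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    joined = " ".join(words)  # punctuation removed, so phrases match cleanly
    fillers = {}
    for phrase in FILLERS:
        count = len(re.findall(rf"\b{re.escape(phrase)}\b", joined))
        if count:
            fillers[phrase] = count
    wpm = len(words) / duration_seconds * 60 if duration_seconds > 0 else 0.0
    return {"words": len(words), "fillers": fillers, "wpm": round(wpm, 1)}


report = analyze_transcript("Um, so, like, we build, you know, small models.", 6.0)
print(report)
```

A naive list like this flags every "like", including legitimate uses, so treat the counts as a rough signal.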
Install optional extras:
```bash
# Whisper.cpp fallback ASR (CPU)
uv sync --package echocoach --extra whisper
# Piper VoiceOut TTS
uv sync --package echocoach --extra piper
python -m piper.download_voices en_US-lessac-medium
```
Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:
| Variable | Default | Description |
| -------- | ------- | ----------- |
| `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo |
| `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
| `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
| `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) |
| `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load |
| `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |
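The comma-separated `ECHOCOACH_COACH_FALLBACK` value implies an ordered "try the primary, then each fallback" chain. A minimal sketch of that idea follows; the function names and the `loader` callable are hypothetical stand-ins, and the real loading logic may differ.

```python
def coach_candidates(env):
    """Return the primary coach preset followed by de-duplicated fallbacks."""
    primary = env.get("ECHOCOACH_COACH_MODEL", "minicpm5-1b-language-lesson-hub")
    raw = env.get("ECHOCOACH_COACH_FALLBACK", "minicpm5-1b")
    fallbacks = [item.strip() for item in raw.split(",") if item.strip()]
    ordered = []
    for key in [primary, *fallbacks]:
        if key not in ordered:
            ordered.append(key)
    return ordered


def load_first_available(candidates, loader):
    """Call `loader(key)` for each preset until one succeeds."""
    errors = {}
    for key in candidates:
        try:
            return key, loader(key)
        except Exception as exc:  # a failed load should not stop the chain
            errors[key] = exc
    raise RuntimeError(f"no coach preset could be loaded: {sorted(errors)}")
```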
**Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face: run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. A GPU is recommended when running ASR and the coach together.
Smoke tests (analysis only, no GPU):
```bash
bash scripts/echo_coach_smoke.sh
```
### Language lessons: multilingual coach (Studio tab)
The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply.
| Input | Output |
| ----- | ------ |
| Type a question | Chat bubble in target language |
| Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
| **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |
**Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out.
Rebuild training JSONL from Hugging Face sources:
```bash
uv run python research/data/build_language_lesson_chat.py
modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
```
Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).
Default `.env` / Space secrets:
```bash
ECHOCOACH_ASR_PRESET=whisper-cpp-base
ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
ECHOCOACH_COACH_FALLBACK=minicpm5-1b
ECHOCOACH_TTS_PRESET=piper-multilingual
ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
```
| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content |
Turn-based (not full duplex): speak → wait → hear reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice.
Pitch metrics and monologue analysis live in **Classic UI → EchoCoach** (`/classic`).
### TeacherVoice: Classic UI (turn-based)
The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher. It uses the same pipeline as Language lessons, plus a **Pitch practice** mode.
| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content verbally |
| **Pitch practice** | Short live speaking tips each turn |
**EchoCoach vs TeacherVoice**
| | EchoCoach | TeacherVoice |
| --- | --- | --- |
| Interaction | One-shot after **Analyze pitch** | Multi-turn **Send turn** |
| Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips |
| TTS | One VoiceOut clip per analysis | Voice reply every turn (first sentence plays quickly when Piper is installed) |
| RAG | No | Optional ResearchMind grounding (Explain / Lesson) |
**Flow per turn:** record up to **15s** → ASR → text LLM with chat history → Piper TTS (auto-plays when installed).
After each reply, use **Speak last reply** or **Speak first sentence** to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped).
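Speaking the first sentence separately is what lets a reply start playing before the whole text is synthesized. One simple way to split a reply is sketched below. The name `split_first_sentence` and the punctuation rule are assumptions for illustration; the app may segment text differently.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_first_sentence(reply):
    """Return (first_sentence, remainder) for a reply string."""
    text = reply.strip()
    parts = _SENTENCE_END.split(text, maxsplit=1)
    first = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    return first, rest


first, rest = split_first_sentence("Bonjour! Today we learn greetings. Ready?")
print(first)  # sent to TTS immediately
print(rest)   # synthesized afterwards
```

This rule also splits after abbreviations such as "e.g.", so a production splitter would need extra handling.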
Install Piper for voice output (included in `gradio-space` deps after `uv sync`):
```bash
uv sync
python -m piper.download_voices en_US-lessac-medium
```
Voices are stored under `models/piper/` (gitignored) or `~/.local/share/piper/voices/`. **Restart the Gradio app** after installing Piper so the Speak buttons can synthesize audio.
**Realtime TTS (VibeVoice)**: [microsoft/VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) is registered in `voice_models.yaml` as `vibevoice-realtime-0.5b` (~300 ms to first audio, streaming text-in). TeacherVoice uses `realtime_tts_preset` from YAML by default; override with `ECHOCOACH_REALTIME_TTS_PRESET` or set `ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b` globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.
Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).
Reuse VoiceOut in other tabs via `gradio_space.voice_helpers.speak_last_assistant_reply`.
Optional omni profile (GPU, experimental; falls back to ASR+LLM+Piper):
```bash
ECHOCOACH_VOICE_PROFILE=omni
ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
```
Unit tests (no GPU):
```bash
uv run pytest libs/echocoach/tests/test_teacher_voice.py -q
```
### 5. Upload agent trace (Sharing is Caring badge)
```bash
uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces
```
### 6. Quick sanity checks
```bash
# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"
# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
```
### Local env reference
| Variable | Default | Description |
| ------------------- | --------------------------------- | ------------------------------------------ |
| `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
| `MODEL_PATH` | (none) | Local GGUF path (skips Hub download) |
| `N_CTX` | `4096` | Context window |
| `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
| `PORT` | `7860` | Gradio listen port |
| `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |
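The table above maps naturally onto a typed settings object. The sketch below shows one way to read these variables with the documented defaults. `LocalSettings` and `load_settings` are hypothetical names, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalSettings:
    inference_backend: str = "llama_cpp"
    model_repo: str = "Qwen/Qwen2.5-3B-Instruct-GGUF"
    model_file: str = "qwen2.5-3b-instruct-q4_k_m.gguf"
    model_path: Optional[str] = None  # unset means "download from the Hub"
    n_ctx: int = 4096
    n_gpu_layers: int = 0  # 0 = CPU only
    port: int = 7860
    model_id: str = "Qwen/Qwen2.5-3B-Instruct"


def load_settings(env=None):
    """Build LocalSettings from an environment-style mapping."""
    env = os.environ if env is None else env
    defaults = LocalSettings()
    backend = env.get("INFERENCE_BACKEND", defaults.inference_backend)
    if backend not in {"llama_cpp", "transformers"}:
        raise ValueError(f"unsupported INFERENCE_BACKEND: {backend!r}")
    return LocalSettings(
        inference_backend=backend,
        model_repo=env.get("MODEL_REPO", defaults.model_repo),
        model_file=env.get("MODEL_FILE", defaults.model_file),
        model_path=env.get("MODEL_PATH") or None,
        n_ctx=int(env.get("N_CTX", defaults.n_ctx)),
        n_gpu_layers=int(env.get("N_GPU_LAYERS", defaults.n_gpu_layers)),
        port=int(env.get("PORT", defaults.port)),
        model_id=env.get("MODEL_ID", defaults.model_id),
    )
```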
### Optional: transformers backend
Heavier install; only needed if you switch away from llama.cpp:
```bash
uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
uv run --package gradio-space python -m gradio_space.app
```
---
## Gradio SDK local smoke test (matches HF Space build)
Before pushing to Hugging Face, verify the Gradio SDK entry point:
```bash
python -m venv .venv-gradio && source .venv-gradio/bin/activate
pip install -r requirements.txt
ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py
```
Open [http://localhost:7860](http://localhost:7860). Studio is at `/`, Classic at `/classic`.
Day-to-day development can still use `uv run` (see above); this path mirrors what HF installs from `requirements.txt`.
---
## Hugging Face Space deployment (Gradio SDK + ZeroGPU)
The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md) (`sdk: gradio`, `app_file: app.py`).
### 1. Push code to GitHub
Make sure `main` contains at minimum:
- `app.py`, `requirements.txt`, `packages.txt`
- `README.md` (with `sdk: gradio`, `sdk_version`, `app_file: app.py`)
- `models.yaml`, `skills/`
- `apps/gradio-space/` and all `libs/*` packages
The root `Dockerfile` stays in the repo for a later Docker SDK deploy (see below).
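A quick preflight check can catch a missing file or a README without the expected frontmatter before you push. The script below is a sketch: the helper names are hypothetical, and the frontmatter reader is a deliberately simple `key: value` scan, not a full YAML parser.

```python
from pathlib import Path

# Paths listed above as the minimum contents of `main`.
REQUIRED = (
    "app.py", "requirements.txt", "packages.txt", "README.md",
    "models.yaml", "skills", "apps/gradio-space", "libs",
)


def missing_paths(repo_root="."):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


def frontmatter_fields(readme_text):
    """Read top-level `key: value` pairs from a leading `---` block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and not key.startswith((" ", "\t", "#")):
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    print("missing:", missing_paths())
    readme = Path("README.md")
    fields = frontmatter_fields(readme.read_text(encoding="utf-8")) if readme.exists() else {}
    print("sdk:", fields.get("sdk"), "| app_file:", fields.get("app_file"))
```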
### 2. Create the Space
1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
2. **New Space**
3. Name: e.g. `lesson-agent` or `small-model-hackathon`
4. SDK: **Gradio** (Blank template)
5. Hardware: **ZeroGPU** (creator needs PRO/Team) or **GPU basic**
6. Link your GitHub repo, or push directly to the Space git remote
CLI alternative (if you have `hf` installed and org access):
```bash
hf repo create build-small-hackathon/<your-space-name> \
--repo-type space \
--space_sdk gradio
```
### 3. Set Space environment variables
In the Space **Settings → Variables and secrets**:
| Variable | Value |
| -------- | ----- |
| `ACTIVE_MODEL` | `minicpm5-1b` |
| `ALLOW_MODEL_SWITCH` | `false` |
| `RESEARCHMIND_DATA_DIR` | `/tmp/researchmind` |
Default preset in [`models.yaml`](models.yaml) is `minicpm5-1b` (transformers), which is suitable for ZeroGPU.
### 4. Build and verify
HF installs from `requirements.txt` and runs root `app.py`. Check the **Logs** tab for:
- Successful pip install (first build may take several minutes while `llama-cpp-python` compiles)
- `Running on local URL: 0.0.0.0:7860`
Smoke test on the live Space:
1. **`/`**: Studio UI loads
2. **`/classic`**: all tabs render
3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
4. First LLM request may be slow (model download + ZeroGPU queue)
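The first two checks above can be scripted. The sketch below requests `/` and `/classic` and reports the HTTP status for each. The function name `check_routes` is hypothetical, and you need to pass the base URL of your own Space (or `http://localhost:7860` for a local run).

```python
import urllib.error
import urllib.request


def check_routes(base_url, paths=("/", "/classic"), timeout=10.0,
                 open_url=urllib.request.urlopen):
    """Return {path: HTTP status, or None if the server was unreachable}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with open_url(url, timeout=timeout) as response:
                results[path] = response.status
        except urllib.error.HTTPError as exc:
            results[path] = exc.code  # server answered with an error status
        except OSError:
            results[path] = None  # connection refused, DNS failure, timeout
    return results


if __name__ == "__main__":
    print(check_routes("http://localhost:7860"))
```

This only confirms that the pages are served; slide generation still needs a manual test because the first request loads the model.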
### 5. ZeroGPU notes
LLM handlers use `@spaces.GPU` via [`gradio_space/spaces_runtime.py`](apps/gradio-space/src/gradio_space/spaces_runtime.py). If you see **No CUDA GPUs are available**, an inference path is running outside a decorated handler.
Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task.
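The pattern described here, decorating every inference entry point so that it runs inside a GPU task, can be sketched as below. This is not the contents of `spaces_runtime.py`: the `gpu_task` shown is a simplified assumption that applies `spaces.GPU` when the `spaces` package is importable and otherwise leaves the function unchanged for local runs.

```python
try:
    import spaces  # available on Hugging Face Spaces with ZeroGPU
except ImportError:
    spaces = None


def gpu_task(fn):
    """Wrap `fn` with spaces.GPU when possible; otherwise return it as-is."""
    if spaces is None:
        return fn
    return spaces.GPU(fn)


@gpu_task
def generate(prompt: str) -> str:
    # Model loading and inference belong inside the decorated function,
    # so that CUDA is only touched while a GPU is attached.
    return f"echo: {prompt}"


print(generate("hello"))
```

Any helper that touches CUDA outside a decorated function is a likely source of the **No CUDA GPUs are available** error.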
### 6. Optional: persistent model cache
Attach a **Storage Bucket** in Space settings so Hub model weights survive restarts.
---
## Docker SDK deployment (later)
Both deploy paths live on the same branch. HF reads **one** `sdk:` from README, so switch to Docker when you are ready for a dedicated-GPU Space.
1. Change [README.md](README.md) frontmatter to `sdk: docker`, `app_port: 7860` (remove `sdk_version` / `app_file`)
2. Create or reconfigure a Space with **Docker** SDK and **GPU basic** hardware
3. Set the same env vars (`ACTIVE_MODEL=minicpm5-1b`, etc.)
### Local Docker smoke test
```bash
docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
-e ACTIVE_MODEL=minicpm5-1b \
-e ALLOW_MODEL_SWITCH=false \
-e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
hackathon-space
```
Open [http://localhost:7860](http://localhost:7860). Studio is at `/`, Classic tabs at `/classic`. Stop with `Ctrl+C`.
To use a pre-downloaded local GGUF model inside Docker, mount it and set `MODEL_PATH`:
```bash
docker run --rm -p 7860:7860 \
-v "$(pwd)/models:/app/models:ro" \
-e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
hackathon-space
```
---
## Troubleshooting
| Symptom | Likely cause | Fix |
| ---------------------------------------- | --------------------------------- | -------------------------------------------------------------------- |
| First chat hangs / slow | Model downloading from Hub | Wait on Space; use Storage Bucket for cache |
| `Failed to load model` in chat | Wrong `ACTIVE_MODEL` preset | Use `minicpm5-1b` or valid key from `models.yaml` |
| Space build fails on pip install | `llama-cpp-python` compile | Check Logs; default preset avoids GGUF at runtime |
| Space build fails | Malformed README YAML | Ensure `sdk: gradio` and `app_file: app.py` in README frontmatter |
| No CUDA GPUs on ZeroGPU | Handler outside `@spaces.GPU` | LLM entry points must use `gpu_task` in `spaces_runtime.py` |
| Docker build fails on `llama-cpp-python` | Missing build tools | Dockerfile installs `build-essential` and `cmake` |
| Port already in use locally | Another process on 7860 | `PORT=7861 python app.py` or `uv run ...` |
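For the last row, a short script can tell you whether 7860 is taken and suggest a free value for `PORT`. This is a standalone sketch with hypothetical helper names; it only tests whether a local bind succeeds.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=7860, attempts=10):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found from {start} to {start + attempts - 1}")


if __name__ == "__main__":
    print(f"PORT={first_free_port()}")
```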
---
## Entrypoint summary
| Environment | How to run |
| ----------- | ---------- |
| Local dev (uv) | `uv run --package gradio-space python -m gradio_space.app` |
| Local Gradio SDK smoke | `pip install -r requirements.txt && python app.py` |
| HF Gradio Space | HF runs root `app.py` automatically |
| Docker (later) | `docker run -p 7860:7860 hackathon-space` (after README `sdk: docker`) |