Spaces:

MSGEncrypted
/

lesson-agent-dev

Sleeping

App Files Files Community

lesson-agent-dev / USAGE.md

msg encrypted ai

Feat/sprint last 2hours (#22)

aac5f23 10 days ago

preview code

Raw

History Blame Contribute Delete

18.6 kB

	# Usage

	How to run the Lesson Agent Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).

	The primary UI is the Lesson slides tab (topic → local model outline → downloadable `.pptx`). Use ResearchMind for corpus Q&A, Language lessons for multilingual text + voice tutoring (OpenBMB + Whisper by default), EchoCoach for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The Chat (debug) tab tests the underlying model.

	## Prerequisites

	- [uv](https://docs.astral.sh/uv/) installed
	- Python 3.12 (see `.python-version`)
	- For Docker testing: Docker installed locally
	- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org

	## Local development

	### 1. Install dependencies

	```bash
	uv sync --all-packages
	```

	### 2. Configure environment (optional)

	```bash
	cp .env.example .env
	```

	Edit `.env` if you want a different model preset. Default is `minicpm5-1b` (transformers).

	### 3. Pre-download the model (optional for GGUF presets)

	If using a GGUF preset (`qwen3b-gguf`), pre-download avoids a long wait on first use:

	```bash
	uv run python scripts/download_model.py
	```

	Then add the printed path to `.env`:

	```bash
	MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
	```

	### 4. Run the Gradio app

	```bash
	uv run --package gradio-space python -m gradio_space.app
	```

	Open [http://localhost:7860](http://localhost:7860).

	\| URL \| UI \|
	\|-----\|-----\|
	\| `/` \| Studio — custom HTML/CSS/JS workspace (Off Brand entry) \|
	\| `/classic` \| Classic — full Gradio tabs, settings, Chat (debug) \|

	The header in Classic includes a link back to Studio UI.

	The model loads on the first Generate (Lesson slides) or chat message. Agent traces are written to `outputs/traces/`. After code changes, restart the process to pick up updates.

	### Switching models locally (transformers ↔ llama.cpp)

	For local dev you can switch presets at runtime without restarting:

	```bash
	# .env
	ALLOW_MODEL_SWITCH=true
	ACTIVE_MODEL=minicpm-v-4.6 # startup default (transformers)
	```

	\| UI \| Where to switch \|
	\|----\|-----------------\|
	\| Classic (`/classic`) \| Settings accordion → Model preset dropdown (reloads on change) \|
	\| Classic Chat tab \| Model preset dropdown (syncs app-wide) \|
	\| Studio (`/`) \| Settings drawer → Model preset; Debug tab has the same list \|

	\| Goal \| Preset key \|
	\|------\|------------\|
	\| MiniCPM-V 4.6 transformers (full VLM) \| `minicpm-v-4.6` \|
	\| MiniCPM-V 4.6 llama.cpp / Llama Champion \| `minicpm-v-4.6-gguf` \|
	\| MiniCPM5 1B text \| `minicpm5-1b` \|
	\| Lesson LoRA (transformers only) \| `minicpm5-1b-lesson-lora` \|

	Prefetch the GGUF weights (optional):

	```bash
	uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf
	```

	On Hugging Face Space, keep `ALLOW_MODEL_SWITCH=false` and pin one preset via `ACTIVE_MODEL`.

	### Lesson slides — research sources

	The Lesson slides tab can ground outlines on external sources before building the deck:

	\| Source mode \| What it does \|
	\| ----------- \| ------------ \|
	\| None (model only) \| Default — outline from the local model only \|
	\| Web search \| Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides \|
	\| RAG (indexed sources) \| Use a ResearchMind session and/or URLs/files you provide on this tab \|

	When Web search is selected, choose a search workflow:

	\| Workflow \| Steps \|
	\| -------- \| ----- \|
	\| Two-step search (suggest & confirm) \| Click Discover sources → select URLs → Generate lesson slides \|
	\| Auto search & ingest \| Click Generate lesson slides only — search, ingest, and outline in one step \|

	RAG mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.

	Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).

	Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).

	### EchoCoach — voice practice

	The EchoCoach tab records up to 30 seconds, then runs a local pipeline:

	Getting audio in

	- Record from this computer — click Start recording, speak, then Stop recording (uses PipeWire `pw-record` when available). The slider is a max-length safety cap.
	- Browser Record — needs mic permission and a secure context; open http://localhost:7860 (not `0.0.0.0` or a LAN IP).
	- Upload — drop a `.wav` or `.mp3` file (works everywhere, including HF Space).

	If recordings sound silent, check system mic input/mute or set `ECHOCOACH_CAPTURE_DEVICE` in `.env` (see `arecord -L` or `pw-cli ls Node`).

	Pipeline steps:

	1. ASR — Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
	2. Analysis — filler highlights, pace score, matplotlib charts
	3. Coach — rewrite + tips from the text LLM (`ACTIVE_MODEL`, default `minicpm5-1b`)
	4. VoiceOut — Piper TTS speaks the summary (or full rewrite if checked)

	Install optional extras:

	```bash
	# Whisper.cpp fallback ASR (CPU)
	uv sync --package echocoach --extra whisper

	# Piper VoiceOut TTS
	uv sync --package echocoach --extra piper
	python -m piper.download_voices en_US-lessac-medium
	```

	Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:

	\| Variable \| Default \| Description \|
	\| -------- \| ------- \| ----------- \|
	\| `ECHOCOACH_ASR_PRESET` \| `whisper-cpp-base` \| ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo \|
	\| `ECHOCOACH_TTS_PRESET` \| `piper-multilingual` \| TTS preset key (EchoCoach, default VoiceOut) \|
	\| `ECHOCOACH_REALTIME_TTS_PRESET` \| `vibevoice-realtime-0.5b` \| Language lessons streaming TTS (see below) \|
	\| `ECHOCOACH_COACH_MODEL` \| `minicpm5-1b-language-lesson-hub` \| Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) \|
	\| `ECHOCOACH_COACH_FALLBACK` \| `minicpm5-1b` \| Comma-separated fallback presets if primary coach fails to load \|
	\| `ECHOCOACH_MAX_SECONDS` \| `30` \| Max recording length \|

	Cohere Transcribe (`cohere-transcribe`) is gated on Hugging Face — run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together.

	Smoke tests (analysis only, no GPU):

	```bash
	bash scripts/echo_coach_smoke.sh
	```

	### Language lessons — multilingual coach (Studio tab)

	The Language lessons tab is the primary voice learning experience: one page for text, hold-to-talk mic, and audio upload, with optional auto VoiceOut on every reply.

	\| Input \| Output \|
	\| ----- \| ------ \|
	\| Type a question \| Chat bubble in target language \|
	\| Hold mic / upload audio \| Transcript + teacher reply; auto-play TTS when enabled \|
	\| Other (text only) language code \| Written lesson via coach prompts (no Piper voice for unsupported codes) \|

	Default stack (Cohere-free): [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out.

	Rebuild training JSONL from Hugging Face sources:

	```bash
	uv run python research/data/build_language_lesson_chat.py
	modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
	```

	Optional Cohere Labs partner demo: [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).

	Default `.env` / Space secrets:

	```bash
	ECHOCOACH_ASR_PRESET=whisper-cpp-base
	ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
	ECHOCOACH_COACH_FALLBACK=minicpm5-1b
	ECHOCOACH_TTS_PRESET=piper-multilingual
	ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
	```

	\| Mode \| Purpose \|
	\| ---- \| ------- \|
	\| Explain \| Tutor any topic in plain language \|
	\| Lesson coach \| Discuss and outline lesson content \|

	Turn-based (not full duplex): speak → wait → hear reply. Auto-speak replies synthesizes TTS each turn when the language has a Piper voice.

	Pitch metrics and monologue analysis live in Classic UI → EchoCoach (`/classic`).

	### TeacherVoice — Classic UI (turn-based)

	The TeacherVoice tab in `/classic` is the legacy multi-turn voice teacher — same pipeline as Language lessons, plus Pitch practice mode.

	\| Mode \| Purpose \|
	\| ---- \| ------- \|
	\| Explain \| Tutor any topic in plain language \|
	\| Lesson coach \| Discuss and outline lesson content verbally \|
	\| Pitch practice \| Short live speaking tips each turn \|

	EchoCoach vs TeacherVoice

	\| \| EchoCoach \| TeacherVoice \|
	\| --- \| --- \| --- \|
	\| Interaction \| One-shot after Analyze pitch \| Multi-turn Send turn \|
	\| Best for \| Pace/filler charts, JSON rewrite report \| Q&A, lesson discussion, conversational pitch tips \|
	\| TTS \| One VoiceOut clip per analysis \| Voice reply every turn (first sentence plays quickly when Piper is installed) \|
	\| RAG \| No \| Optional ResearchMind grounding (Explain / Lesson) \|

	Flow per turn: record up to 15s → ASR → text LLM with chat history → Piper TTS (auto-plays when installed).

	After each reply, use Speak last reply or Speak first sentence to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped).

	Install Piper for voice output (included in `gradio-space` deps after `uv sync`):

	```bash
	uv sync
	python -m piper.download_voices en_US-lessac-medium
	```

	Voices are stored under `models/piper/` (gitignored) or `~/.local/share/piper/voices/`. Restart the Gradio app after installing Piper so the Speak buttons can synthesize audio.

	Realtime TTS (VibeVoice) — [microsoft/VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) is registered in `voice_models.yaml` as `vibevoice-realtime-0.5b` (~300 ms to first audio, streaming text-in). TeacherVoice uses `realtime_tts_preset` from YAML by default; override with `ECHOCOACH_REALTIME_TTS_PRESET` or set `ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b` globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.

	Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).

	Reuse VoiceOut in other tabs via `gradio_space.voice_helpers.speak_last_assistant_reply`.

	Optional omni profile (GPU, experimental — falls back to ASR+LLM+Piper):

	```bash
	ECHOCOACH_VOICE_PROFILE=omni
	ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
	```

	Unit tests (no GPU):

	```bash
	uv run pytest libs/echocoach/tests/test_teacher_voice.py -q
	```

	### 5. Upload agent trace (Sharing is Caring badge)

	```bash
	uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces
	```

	### 5. Quick sanity checks

	```bash
	# Inference package resolves
	uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"

	# Gradio app module loads
	uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
	```

	### Local env reference


	\| Variable \| Default \| Description \|
	\| ------------------- \| --------------------------------- \| ------------------------------------------ \|
	\| `INFERENCE_BACKEND` \| `llama_cpp` \| `llama_cpp` or `transformers` \|
	\| `MODEL_REPO` \| `Qwen/Qwen2.5-3B-Instruct-GGUF` \| Hub repo for GGUF \|
	\| `MODEL_FILE` \| `qwen2.5-3b-instruct-q4_k_m.gguf` \| GGUF filename \|
	\| `MODEL_PATH` \| — \| Local GGUF path (skips Hub download) \|
	\| `N_CTX` \| `4096` \| Context window \|
	\| `N_GPU_LAYERS` \| `0` \| GPU layers for llama.cpp (`0` = CPU only) \|
	\| `PORT` \| `7860` \| Gradio listen port \|
	\| `MODEL_ID` \| `Qwen/Qwen2.5-3B-Instruct` \| Used when `INFERENCE_BACKEND=transformers` \|


	### Optional: transformers backend

	Heavier install; only needed if you switch away from llama.cpp:

	```bash
	uv sync --package inference --extra transformers
	INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
	uv run --package gradio-space python -m gradio_space.app
	```

	---

	## Gradio SDK local smoke test (matches HF Space build)

	Before pushing to Hugging Face, verify the Gradio SDK entry point:

	```bash
	python -m venv .venv-gradio && source .venv-gradio/bin/activate
	pip install -r requirements.txt
	ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py
	```

	Open [http://localhost:7860](http://localhost:7860) — Studio at `/`, Classic at `/classic`.

	Day-to-day development can still use `uv run` (see above); this path mirrors what HF installs from `requirements.txt`.

	---

	## Hugging Face Space deployment (Gradio SDK + ZeroGPU)

	The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md) (`sdk: gradio`, `app_file: app.py`).

	### 1. Push code to GitHub

	Make sure `main` contains at minimum:

	- `app.py`, `requirements.txt`, `packages.txt`
	- `README.md` (with `sdk: gradio`, `sdk_version`, `app_file: app.py`)
	- `models.yaml`, `skills/`
	- `apps/gradio-space/` and all `libs/*` packages

	The root `Dockerfile` stays in the repo for a later Docker SDK deploy (see below).

	### 2. Create the Space

	1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
	2. New Space
	3. Name: e.g. `lesson-agent` or `small-model-hackathon`
	4. SDK: Gradio (Blank template)
	5. Hardware: ZeroGPU (creator needs PRO/Team) or GPU basic
	6. Link your GitHub repo, or push directly to the Space git remote

	CLI alternative (if you have `hf` installed and org access):

	```bash
	hf repo create build-small-hackathon/<your-space-name> \
	--repo-type space \
	--space_sdk gradio
	```

	### 3. Set Space environment variables

	In the Space Settings → Variables and secrets:

	\| Variable \| Value \|
	\| -------- \| ----- \|
	\| `ACTIVE_MODEL` \| `minicpm5-1b` \|
	\| `ALLOW_MODEL_SWITCH` \| `false` \|
	\| `RESEARCHMIND_DATA_DIR` \| `/tmp/researchmind` \|

	Default preset in [`models.yaml`](models.yaml) is `minicpm5-1b` (transformers) — suitable for ZeroGPU.

	### 4. Build and verify

	HF installs from `requirements.txt` and runs root `app.py`. Check the Logs tab for:

	- Successful pip install (first build may take several minutes — `llama-cpp-python` compiles)
	- `Running on local URL: 0.0.0.0:7860`

	Smoke test on the live Space:

	1. `/` — Studio UI loads
	2. `/classic` — all tabs render
	3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
	4. First LLM request may be slow (model download + ZeroGPU queue)

	### 5. ZeroGPU notes

	LLM handlers use `@spaces.GPU` via [`gradio_space/spaces_runtime.py`](apps/gradio-space/src/gradio_space/spaces_runtime.py). If you see No CUDA GPUs are available, an inference path is running outside a decorated handler.

	Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task.

	### 6. Optional: persistent model cache

	Attach a Storage Bucket in Space settings so Hub model weights survive restarts.

	---

	## Docker SDK deployment (later)

	Both deploy paths live on the same branch. HF reads one `sdk:` from README — switch to Docker when you are ready for a dedicated-GPU Space.

	1. Change [README.md](README.md) frontmatter to `sdk: docker`, `app_port: 7860` (remove `sdk_version` / `app_file`)
	2. Create or reconfigure a Space with Docker SDK and GPU basic hardware
	3. Set the same env vars (`ACTIVE_MODEL=minicpm5-1b`, etc.)

	### Local Docker smoke test

	```bash
	docker build -t hackathon-space .
	docker run --rm -p 7860:7860 \
	-e ACTIVE_MODEL=minicpm5-1b \
	-e ALLOW_MODEL_SWITCH=false \
	-e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
	hackathon-space
	```

	Open [http://localhost:7860](http://localhost:7860) — Studio at `/`, Classic tabs at `/classic`. Stop with `Ctrl+C`.

	To use a pre-downloaded local GGUF model inside Docker, mount it and set `MODEL_PATH`:

	```bash
	docker run --rm -p 7860:7860 \
	-v "$(pwd)/models:/app/models:ro" \
	-e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
	hackathon-space
	```

	---

	## Troubleshooting


	\| Symptom \| Likely cause \| Fix \|
	\| ---------------------------------------- \| --------------------------------- \| -------------------------------------------------------------------- \|
	\| First chat hangs / slow \| Model downloading from Hub \| Wait on Space; use Storage Bucket for cache \|
	\| `Failed to load model` in chat \| Wrong `ACTIVE_MODEL` preset \| Use `minicpm5-1b` or valid key from `models.yaml` \|
	\| Space build fails on pip install \| `llama-cpp-python` compile \| Check Logs; default preset avoids GGUF at runtime \|
	\| Space build fails \| Malformed README YAML \| Ensure `sdk: gradio` and `app_file: app.py` in README frontmatter \|
	\| No CUDA GPUs on ZeroGPU \| Handler outside `@spaces.GPU` \| LLM entry points must use `gpu_task` in `spaces_runtime.py` \|
	\| Docker build fails on `llama-cpp-python` \| Missing build tools \| Dockerfile installs `build-essential` and `cmake` \|
	\| Port already in use locally \| Another process on 7860 \| `PORT=7861 python app.py` or `uv run ...` \|


	---

	## Entrypoint summary

	\| Environment \| How to run \|
	\| ----------- \| ---------- \|
	\| Local dev (uv) \| `uv run --package gradio-space python -m gradio_space.app` \|
	\| Local Gradio SDK smoke \| `pip install -r requirements.txt && python app.py` \|
	\| HF Gradio Space \| HF runs root `app.py` automatically \|
	\| Docker (later) \| `docker run -p 7860:7860 hackathon-space` (after README `sdk: docker`) \|