# Usage

How to run the **Lesson Agent** Gradio app locally, deploy it to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test it with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon).

The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model.

## Prerequisites

- [uv](https://docs.astral.sh/uv/) installed
- Python 3.12 (see `.python-version`)
- For Docker testing: Docker installed locally
- For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org

## Local development

### 1. Install dependencies

```bash
uv sync --all-packages
```

### 2. Configure environment (optional)

```bash
cp .env.example .env
```

Edit `.env` if you want a different model preset. The default is `minicpm5-1b` (transformers).

### 3. Pre-download the model (optional, GGUF presets only)

If you use a GGUF preset (`qwen3b-gguf`), pre-downloading avoids a long wait on first use:

```bash
uv run python scripts/download_model.py
```

Then add the printed path to `.env`:

```bash
MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
```

### 4. Run the Gradio app

```bash
uv run --package gradio-space python -m gradio_space.app
```

Open [http://localhost:7860](http://localhost:7860).

| URL | UI |
|-----|-----|
| `/` | **Studio** — custom HTML/CSS/JS workspace (Off Brand entry) |
| `/classic` | **Classic** — full Gradio tabs, settings, Chat (debug) |

The Classic header includes a link back to the Studio UI.

The model loads on the first **Generate** (Lesson slides) or chat message. Agent traces are written to `outputs/traces/`. After code changes, restart the process to pick up updates.
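If you use a GGUF preset and set `MODEL_PATH` as in step 3, a small guard can catch a mistyped path before the app tries to load it. The sketch below is a hypothetical helper (`set_model_path` is not part of the repo). It assumes a plain `KEY=value` `.env` file, checks that the target exists and ends in `.gguf`, and replaces any earlier `MODEL_PATH` line instead of appending a duplicate.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): record a local GGUF file as
# MODEL_PATH in a KEY=value env file, replacing any previous MODEL_PATH line.
set_model_path() {
  local path="$1"
  local env_file="${2:-.env}"
  local tmp

  # Reject anything that is not an existing .gguf file.
  case "$path" in
    *.gguf) ;;
    *) echo "not a .gguf file: $path" >&2; return 1 ;;
  esac
  if [ ! -f "$path" ]; then
    echo "file not found: $path" >&2
    return 1
  fi

  # Copy every other line, then append the new MODEL_PATH.
  touch "$env_file"
  tmp="$(mktemp)"
  grep -v '^MODEL_PATH=' "$env_file" > "$tmp" || true
  echo "MODEL_PATH=$path" >> "$tmp"
  mv "$tmp" "$env_file"
}
```

Usage: `set_model_path ./models/qwen2.5-3b-instruct-q4_k_m.gguf .env`, then start the app as above.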
### Switching models locally (transformers ↔ llama.cpp)

For local dev you can switch presets at runtime without restarting:

```bash
# .env
ALLOW_MODEL_SWITCH=true
ACTIVE_MODEL=minicpm-v-4.6 # startup default (transformers)
```

| UI | Where to switch |
|----|-----------------|
| **Classic** (`/classic`) | **Settings** accordion → Model preset dropdown (reloads on change) |
| **Classic** Chat tab | Model preset dropdown (syncs app-wide) |
| **Studio** (`/`) | Settings drawer → Model preset; Debug tab has the same list |

| Goal | Preset key |
|------|------------|
| MiniCPM-V 4.6 transformers (full VLM) | `minicpm-v-4.6` |
| MiniCPM-V 4.6 llama.cpp / Llama Champion | `minicpm-v-4.6-gguf` |
| MiniCPM5 1B text | `minicpm5-1b` |
| Lesson LoRA (transformers only) | `minicpm5-1b-lesson-lora` |

Prefetch the GGUF weights (optional):

```bash
uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf
```

On a Hugging Face Space, keep `ALLOW_MODEL_SWITCH=false` and pin one preset via `ACTIVE_MODEL`.

### Lesson slides — research sources

The **Lesson slides** tab can ground outlines on external sources before building the deck:

| Source mode | What it does |
| ----------- | ------------ |
| **None (model only)** | Default — outline from the local model only |
| **Web search** | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides |
| **RAG (indexed sources)** | Use a **ResearchMind session** and/or URLs/files you provide on this tab |

When **Web search** is selected, choose a **search workflow**:

| Workflow | Steps |
| -------- | ----- |
| **Two-step search (suggest & confirm)** | Click **Discover sources** → select URLs → **Generate lesson slides** |
| **Auto search & ingest** | Click **Generate lesson slides** only — search, ingest, and outline in one step |

**RAG** mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.
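RAG mode accepts pasted URLs. If you collect candidate sources in a text file first, a filter like the one below keeps the paste tidy. This is a hypothetical helper, not app code. It assumes one URL per line, keeps only `http://` and `https://` entries, and drops exact repeats while preserving order. It does not describe how the app itself parses the URL box.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tidy a list of candidate source URLs (one per line,
# read from files given as arguments or from stdin) before pasting them.
clean_source_urls() {
  # 1) trim leading/trailing whitespace, 2) keep only http(s) URLs,
  # 3) drop exact duplicates while keeping first-seen order.
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$@" \
    | grep -E '^https?://' \
    | awk '!seen[$0]++'
}
```

Usage: `clean_source_urls sources.txt`, then paste the output into the RAG URL field.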
Web discover and auto search require network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`).

### EchoCoach — voice practice

The **EchoCoach** tab records up to 30 seconds, then runs a local pipeline.

**Getting audio in**

- **Record from this computer** — click **Start recording**, speak, then **Stop recording** (uses PipeWire `pw-record` when available). The slider is a max-length safety cap.
- **Browser Record** — needs mic permission and a secure context; open **http://localhost:7860** (not `0.0.0.0` or a LAN IP).
- **Upload** — drop a `.wav` or `.mp3` file (works everywhere, including HF Space).

If recordings sound silent, check the system mic input/mute or set `ECHOCOACH_CAPTURE_DEVICE` in `.env` (see `arecord -L` or `pw-cli ls Node`).

Pipeline steps:

1. **ASR** — Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
2. **Analysis** — filler highlights, pace score, matplotlib charts
3. **Coach** — rewrite + tips from the text LLM (`ACTIVE_MODEL`, default `minicpm5-1b`)
4. **VoiceOut** — Piper TTS speaks the summary (or the full rewrite if checked)

Install optional extras:

```bash
# Whisper.cpp fallback ASR (CPU)
uv sync --package echocoach --extra whisper

# Piper VoiceOut TTS
uv sync --package echocoach --extra piper
python -m piper.download_voices en_US-lessac-medium
```

Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`:

| Variable | Default | Description |
| -------- | ------- | ----------- |
| `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for the Cohere demo |
| `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) |
| `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) |
| `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) |
| `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if the primary coach fails to load |
| `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length |

**Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face: run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. A GPU is recommended when running ASR and the coach together.

Smoke tests (analysis only, no GPU):

```bash
bash scripts/echo_coach_smoke.sh
```

### Language lessons — multilingual coach (Studio tab)

The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply.
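The coach behind this tab is configured through `ECHOCOACH_COACH_MODEL` and `ECHOCOACH_COACH_FALLBACK` from the EchoCoach table. To see which presets would be tried, and in what order, a shell sketch like this one prints the primary preset followed by each comma-separated fallback. It only mirrors the documented order. The app's own Python loader makes the real decision, and skipping blanks and duplicates is a choice of this sketch.

```bash
#!/usr/bin/env bash
# Print coach presets in the documented try order: ECHOCOACH_COACH_MODEL,
# then each comma-separated entry of ECHOCOACH_COACH_FALLBACK.
# Defaults mirror the EchoCoach table; unset or empty values use them.
coach_preset_order() {
  printf '%s,%s\n' \
    "${ECHOCOACH_COACH_MODEL:-minicpm5-1b-language-lesson-hub}" \
    "${ECHOCOACH_COACH_FALLBACK:-minicpm5-1b}" \
    | tr ',' '\n' \
    | sed 's/[[:space:]]//g' \
    | awk 'NF && !seen[$0]++'   # skip blanks and repeats (sketch's choice)
}
```

Usage: `coach_preset_order` with no overrides prints `minicpm5-1b-language-lesson-hub` and then `minicpm5-1b`.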
| Input | Output |
| ----- | ------ |
| Type a question | Chat bubble in the target language |
| Hold mic / upload audio | Transcript + teacher reply; TTS auto-plays when enabled |
| **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |

**Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out.

Rebuild the training JSONL from Hugging Face sources, then launch the `language-lesson-lora` fine-tune job on Modal:

```bash
uv run python research/data/build_language_lesson_chat.py
modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
```

Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global).

Default `.env` / Space secrets:

```bash
ECHOCOACH_ASR_PRESET=whisper-cpp-base
ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
ECHOCOACH_COACH_FALLBACK=minicpm5-1b
ECHOCOACH_TTS_PRESET=piper-multilingual
ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
```

| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content |

Interaction is turn-based (not full duplex): speak → wait → hear the reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice. Pitch metrics and monologue analysis live in **Classic UI → EchoCoach** (`/classic`).

### TeacherVoice — Classic UI (turn-based)

The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher. It uses the same pipeline as Language lessons and adds a **Pitch practice** mode.
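Because TeacherVoice shares the Language lessons pipeline, the five variables in the default `.env` above also decide its voice stack. `show_voice_stack` is a hypothetical helper, not shipped with the repo. It prints each variable that is set in the current shell and marks the rest with the documented defaults. It only reads shell variables, so load `.env` first (for example `set -a; . ./.env; set +a`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the effective Language lessons / TeacherVoice
# voice stack from shell variables, filling gaps with the documented defaults.
show_var() {
  # $1 = variable name, $2 = current value (may be empty), $3 = default
  if [ -n "$2" ]; then
    echo "$1=$2"
  else
    echo "$1=$3 (default)"
  fi
}

show_voice_stack() {
  show_var ECHOCOACH_ASR_PRESET "${ECHOCOACH_ASR_PRESET:-}" whisper-cpp-base
  show_var ECHOCOACH_COACH_MODEL "${ECHOCOACH_COACH_MODEL:-}" minicpm5-1b-language-lesson-hub
  show_var ECHOCOACH_COACH_FALLBACK "${ECHOCOACH_COACH_FALLBACK:-}" minicpm5-1b
  show_var ECHOCOACH_TTS_PRESET "${ECHOCOACH_TTS_PRESET:-}" piper-multilingual
  show_var ECHOCOACH_REALTIME_TTS_PRESET "${ECHOCOACH_REALTIME_TTS_PRESET:-}" vibevoice-realtime-0.5b
}
```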
| Mode | Purpose |
| ---- | ------- |
| **Explain** | Tutor any topic in plain language |
| **Lesson coach** | Discuss and outline lesson content verbally |
| **Pitch practice** | Short live speaking tips each turn |

**EchoCoach vs TeacherVoice**

| | EchoCoach | TeacherVoice |
| --- | --- | --- |
| Interaction | One-shot after **Analyze pitch** | Multi-turn **Send turn** |
| Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips |
| TTS | One VoiceOut clip per analysis | Voice reply every turn (the first sentence plays quickly when Piper is installed) |
| RAG | No | Optional ResearchMind grounding (Explain / Lesson) |

**Flow per turn:** record up to **15s** → ASR → text LLM with chat history → Piper TTS (auto-plays when installed). After each reply, use **Speak last reply** or **Speak first sentence** to generate or replay VoiceOut from the latest assistant message. This works even if auto-TTS was skipped.

Install Piper for voice output (included in `gradio-space` deps after `uv sync`):

```bash
uv sync
python -m piper.download_voices en_US-lessac-medium
```

Voices are stored under `models/piper/` (gitignored) or `~/.local/share/piper/voices/`. **Restart the Gradio app** after installing Piper so the Speak buttons can synthesize audio.

**Realtime TTS (VibeVoice):** [microsoft/VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) is registered in `voice_models.yaml` as `vibevoice-realtime-0.5b` (~300 ms to first audio, streaming text input). TeacherVoice uses `realtime_tts_preset` from the YAML by default. Override it with `ECHOCOACH_REALTIME_TTS_PRESET`, or set `ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b` globally. A GPU is recommended, and speech falls back to Piper until the model loads. English is the primary language; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.

Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).
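Before restarting the app, you can confirm that a Piper voice is actually on disk. This sketch makes three assumptions: voices sit directly in one of the two directories named above, each voice is a `<voice>.onnx` model plus its `<voice>.onnx.json` config (Piper's usual file pair), and you run it from the repository root. `find_piper_voice` and its search order are illustrative, not the app's own lookup logic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path of a Piper voice model if it is found in
# one of the two documented locations (run from the repo root). The search
# order is an assumption of this sketch.
find_piper_voice() {
  local voice="${1:-en_US-lessac-medium}"
  local dir
  for dir in "models/piper" "$HOME/.local/share/piper/voices"; do
    # A Piper voice needs both the .onnx model and its .onnx.json config.
    if [ -f "$dir/$voice.onnx" ] && [ -f "$dir/$voice.onnx.json" ]; then
      echo "$dir/$voice.onnx"
      return 0
    fi
  done
  echo "voice not found: $voice" >&2
  return 1
}
```

Usage: `find_piper_voice en_US-lessac-medium`. A non-zero exit means the download step above has not placed the voice where this sketch looks.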
Reuse VoiceOut in other tabs via `gradio_space.voice_helpers.speak_last_assistant_reply`.

Optional omni profile (GPU, experimental; falls back to ASR + LLM + Piper):

```bash
ECHOCOACH_VOICE_PROFILE=omni
ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
```

Unit tests (no GPU):

```bash
uv run pytest libs/echocoach/tests/test_teacher_voice.py -q
```

### 5. Upload agent trace (Sharing is Caring badge)

```bash
uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces
```

### 6. Quick sanity checks

```bash
# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"

# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
```

### Local env reference

| Variable | Default | Description |
| -------- | ------- | ----------- |
| `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` |
| `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF |
| `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename |
| `MODEL_PATH` | — | Local GGUF path (skips Hub download) |
| `N_CTX` | `4096` | Context window |
| `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) |
| `PORT` | `7860` | Gradio listen port |
| `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` |

### Optional: transformers backend

Heavier install; only needed if you switch away from llama.cpp:

```bash
uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
  uv run --package gradio-space python -m gradio_space.app
```

---

## Gradio SDK local smoke test (matches HF Space build)

Before pushing to Hugging Face, verify the Gradio SDK entry point:

```bash
python -m venv .venv-gradio && source .venv-gradio/bin/activate
pip install -r requirements.txt
ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py
```

Open [http://localhost:7860](http://localhost:7860): Studio at `/`, Classic at `/classic`. Day-to-day development can still use `uv run` (see above); this path mirrors what HF installs from `requirements.txt`.

---

## Hugging Face Space deployment (Gradio SDK + ZeroGPU)

The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md) (`sdk: gradio`, `app_file: app.py`).

### 1. Push code to GitHub

Make sure `main` contains at minimum:

- `app.py`, `requirements.txt`, `packages.txt`
- `README.md` (with `sdk: gradio`, `sdk_version`, `app_file: app.py`)
- `models.yaml`, `skills/`
- `apps/gradio-space/` and all `libs/*` packages

The root `Dockerfile` stays in the repo for a later Docker SDK deploy (see below).

### 2. Create the Space

1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon)
2. **New Space**
3. Name: e.g. `lesson-agent` or `small-model-hackathon`
4. SDK: **Gradio** (Blank template)
5. Hardware: **ZeroGPU** (creator needs PRO/Team) or **GPU basic**
6. Link your GitHub repo, or push directly to the Space git remote

CLI alternative (if you have `hf` installed and org access):

```bash
hf repo create build-small-hackathon/ \
  --repo-type space \
  --space_sdk gradio
```

### 3. Set Space environment variables

In the Space **Settings → Variables and secrets**:

| Variable | Value |
| -------- | ----- |
| `ACTIVE_MODEL` | `minicpm5-1b` |
| `ALLOW_MODEL_SWITCH` | `false` |
| `RESEARCHMIND_DATA_DIR` | `/tmp/researchmind` |

The default preset in [`models.yaml`](models.yaml) is `minicpm5-1b` (transformers), which is suitable for ZeroGPU.

### 4. Build and verify

HF installs from `requirements.txt` and runs the root `app.py`. Check the **Logs** tab for:

- A successful pip install (the first build may take several minutes while `llama-cpp-python` compiles)
- `Running on local URL: 0.0.0.0:7860`

Smoke test on the live Space:

1. **`/`** — Studio UI loads
2. **`/classic`** — all tabs render
3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
4. The first LLM request may be slow (model download + ZeroGPU queue)

### 5. ZeroGPU notes

LLM handlers use `@spaces.GPU` via [`gradio_space/spaces_runtime.py`](apps/gradio-space/src/gradio_space/spaces_runtime.py). If you see **No CUDA GPUs are available**, an inference path is running outside a decorated handler. Startup model preload is skipped on the HF Gradio runtime; the first user request loads the model inside a GPU task.

### 6. Optional: persistent model cache

Attach a **Storage Bucket** in Space settings so Hub model weights survive restarts.

---

## Docker SDK deployment (later)

Both deploy paths live on the same branch. HF reads **one** `sdk:` value from the README, so switch to Docker when you are ready for a dedicated-GPU Space.

1. Change the [README.md](README.md) frontmatter to `sdk: docker`, `app_port: 7860` (remove `sdk_version` / `app_file`)
2. Create or reconfigure a Space with the **Docker** SDK and **GPU basic** hardware
3. Set the same env vars (`ACTIVE_MODEL=minicpm5-1b`, etc.)

### Local Docker smoke test

```bash
docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
  -e ACTIVE_MODEL=minicpm5-1b \
  -e ALLOW_MODEL_SWITCH=false \
  -e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
  hackathon-space
```

Open [http://localhost:7860](http://localhost:7860): Studio at `/`, Classic tabs at `/classic`. Stop with `Ctrl+C`.
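Instead of repeating `-e` flags, you can keep the Space variables in a file and pass it with `docker run --env-file`, which reads one `KEY=value` per line. `write_space_env` below is a hypothetical helper that writes the three values from the Space settings table. The default file name `space.env` is an assumption. Docker does not strip quotes from env-file values, so the values are written unquoted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the Space variables (from the Settings table) to
# a file for `docker run --env-file`. Values are unquoted because Docker
# passes env-file values through literally.
write_space_env() {
  local out="${1:-space.env}"
  local preset="${2:-minicpm5-1b}"
  cat > "$out" <<EOF
ACTIVE_MODEL=$preset
ALLOW_MODEL_SWITCH=false
RESEARCHMIND_DATA_DIR=/tmp/researchmind
EOF
}
```

Usage: `write_space_env space.env && docker run --rm -p 7860:7860 --env-file space.env hackathon-space`.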
To use a pre-downloaded local GGUF model inside Docker, mount it and set `MODEL_PATH`:

```bash
docker run --rm -p 7860:7860 \
  -v "$(pwd)/models:/app/models:ro" \
  -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
  hackathon-space
```

---

## Troubleshooting

| Symptom | Likely cause | Fix |
| ------- | ------------ | --- |
| First chat hangs / slow | Model downloading from Hub | Wait on Space; use Storage Bucket for cache |
| `Failed to load model` in chat | Wrong `ACTIVE_MODEL` preset | Use `minicpm5-1b` or a valid key from `models.yaml` |
| Space build fails on pip install | `llama-cpp-python` compile | Check Logs; default preset avoids GGUF at runtime |
| Space build fails | Malformed README YAML | Ensure `sdk: gradio` and `app_file: app.py` in README frontmatter |
| No CUDA GPUs on ZeroGPU | Handler outside `@spaces.GPU` | LLM entry points must use `gpu_task` in `spaces_runtime.py` |
| Docker build fails on `llama-cpp-python` | Missing build tools | Dockerfile installs `build-essential` and `cmake` |
| Port already in use locally | Another process on 7860 | `PORT=7861 python app.py` or `uv run ...` |

---

## Entrypoint summary

| Environment | How to run |
| ----------- | ---------- |
| Local dev (uv) | `uv run --package gradio-space python -m gradio_space.app` |
| Local Gradio SDK smoke | `pip install -r requirements.txt && python app.py` |
| HF Gradio Space | HF runs root `app.py` automatically |
| Docker (later) | `docker run -p 7860:7860 hackathon-space` (after README `sdk: docker`) |
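For the *Port already in use* row in Troubleshooting, a small loop can pick the next free port for you. This hypothetical `find_free_port` helper relies on Bash's `/dev/tcp` pseudo-device, which is not available in `sh`/`dash`. A successful connect to `127.0.0.1:<port>` means something is already listening there. The check is racy: another process can take the port between the probe and the app start.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first port from $1 (default 7860) up to 20
# above it that has no listener on 127.0.0.1. Bash-only: relies on /dev/tcp.
find_free_port() {
  local start="${1:-7860}"
  local port="$start"
  local last=$((start + 20))
  while [ "$port" -le "$last" ]; do
    # Opening /dev/tcp succeeds only if something accepts the connection.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "$port"
      return 0
    fi
    port=$((port + 1))
  done
  echo "no free port between $start and $last" >&2
  return 1
}
```

Usage: `PORT="$(find_free_port 7860)" python app.py`.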