| # Usage | |
| How to run the **Lesson Agent** Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon). | |
| The primary UI is the **Lesson slides** tab (topic → local model outline → downloadable `.pptx`). Use **ResearchMind** for corpus Q&A, **Language lessons** for multilingual text + voice tutoring (OpenBMB + Whisper by default), **EchoCoach** for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The **Chat (debug)** tab tests the underlying model. | |
| ## Prerequisites | |
| - [uv](https://docs.astral.sh/uv/) installed | |
| - Python 3.12 (see `.python-version`) | |
| - For Docker testing: Docker installed locally | |
| - For HF Space deploy: Hugging Face account with access to the `build-small-hackathon` org | |
| ## Local development | |
| ### 1. Install dependencies | |
| ```bash | |
| uv sync --all-packages | |
| ``` | |
| ### 2. Configure environment (optional) | |
| ```bash | |
| cp .env.example .env | |
| ``` | |
| Edit `.env` if you want a different model preset. Default is `minicpm5-1b` (transformers). | |
| ### 3. Pre-download the model (optional for GGUF presets) | |
| If you use a GGUF preset (`qwen3b-gguf`), pre-downloading the weights avoids a long wait on first use: | |
| ```bash | |
| uv run python scripts/download_model.py | |
| ``` | |
| Then add the printed path to `.env`: | |
| ```bash | |
| MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf | |
| ``` | |
| ### 4. Run the Gradio app | |
| ```bash | |
| uv run --package gradio-space python -m gradio_space.app | |
| ``` | |
| Open [http://localhost:7860](http://localhost:7860). | |
| | URL | UI | | |
| |-----|-----| | |
| | `/` | **Studio**: custom HTML/CSS/JS workspace (Off Brand entry) | | |
| | `/classic` | **Classic**: full Gradio tabs, settings, Chat (debug) | | |
| The Classic header includes a link back to the Studio UI. | |
| The model loads on the **first Generate** (Lesson slides) or chat message. Agent traces are written to `outputs/traces/`. After code changes, restart the process to pick up updates. | |
| ### Switching models locally (transformers ↔ llama.cpp) | |
| For local dev you can switch presets at runtime without restarting: | |
| ```bash | |
| # .env | |
| ALLOW_MODEL_SWITCH=true | |
| ACTIVE_MODEL=minicpm-v-4.6 # startup default (transformers) | |
| ``` | |
| | UI | Where to switch | | |
| |----|-----------------| | |
| | **Classic** (`/classic`) | **Settings** accordion → Model preset dropdown (reloads on change) | | |
| | **Classic** Chat tab | Model preset dropdown (syncs app-wide) | | |
| | **Studio** (`/`) | Settings drawer → Model preset; Debug tab has the same list | | |
| | Goal | Preset key | | |
| |------|------------| | |
| | MiniCPM-V 4.6 transformers (full VLM) | `minicpm-v-4.6` | | |
| | MiniCPM-V 4.6 llama.cpp / Llama Champion | `minicpm-v-4.6-gguf` | | |
| | MiniCPM5 1B text | `minicpm5-1b` | | |
| | Lesson LoRA (transformers only) | `minicpm5-1b-lesson-lora` | | |
| Prefetch the GGUF weights (optional): | |
| ```bash | |
| uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf | |
| ``` | |
| On Hugging Face Space, keep `ALLOW_MODEL_SWITCH=false` and pin one preset via `ACTIVE_MODEL`. | |
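A quick pre-deploy check can catch a misspelled preset or a switch flag left on. This is a sketch under stated assumptions: `KNOWN_PRESETS` lists only the keys from the table above (`models.yaml` may define more), and `check_space_env` is a hypothetical helper, not something the repo provides.

```python
import os

# Preset keys copied from the table above; models.yaml may define others.
KNOWN_PRESETS = {
    "minicpm-v-4.6",
    "minicpm-v-4.6-gguf",
    "minicpm5-1b",
    "minicpm5-1b-lesson-lora",
}


def check_space_env(env: dict[str, str]) -> list[str]:
    """Return a list of problems with Space-style model settings (empty if fine)."""
    problems = []
    if env.get("ALLOW_MODEL_SWITCH", "false").strip().lower() != "false":
        problems.append("ALLOW_MODEL_SWITCH should be false on a Space")
    preset = env.get("ACTIVE_MODEL", "").strip()
    if preset not in KNOWN_PRESETS:
        problems.append(f"ACTIVE_MODEL={preset!r} is not a known preset key")
    return problems


if __name__ == "__main__":
    for problem in check_space_env(dict(os.environ)):
        print("warning:", problem)
```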
| ### Lesson slides: research sources | |
| The **Lesson slides** tab can ground outlines on external sources before building the deck: | |
| | Source mode | What it does | | |
| | ----------- | ------------ | | |
| | **None (model only)** | Default: outline from the local model only | | |
| | **Web search** | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides | | |
| | **RAG (indexed sources)** | Use a **ResearchMind session** and/or URLs/files you provide on this tab | | |
| When **Web search** is selected, choose a **search workflow**: | |
| | Workflow | Steps | | |
| | -------- | ----- | | |
| | **Two-step search (suggest & confirm)** | Click **Discover sources** → select URLs → **Generate lesson slides** | | |
| | **Auto search & ingest** | Click **Generate lesson slides** only; search, ingest, and outline run in one step | | |
| **RAG** mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step. | |
| Web discover/auto search requires network access. MemRAG data is stored under `RESEARCHMIND_DATA_DIR` (default `outputs/researchmind`). | |
| ### EchoCoach: voice practice | |
| The **EchoCoach** tab records up to 30 seconds, then runs a local pipeline: | |
| **Getting audio in** | |
| - **Record from this computer**: click **Start recording**, speak, then **Stop recording** (uses PipeWire `pw-record` when available). The slider is a max-length safety cap. | |
| - **Browser Record**: needs mic permission and a secure context; open **http://localhost:7860** (not `0.0.0.0` or a LAN IP). | |
| - **Upload**: drop a `.wav` or `.mp3` file (works everywhere, including HF Space). | |
| If recordings sound silent, check system mic input/mute or set `ECHOCOACH_CAPTURE_DEVICE` in `.env` (see `arecord -L` or `pw-cli ls Node`). | |
| Pipeline steps: | |
| 1. **ASR**: Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base | |
| 2. **Analysis**: filler highlights, pace score, matplotlib charts | |
| 3. **Coach**: rewrite + tips from the text LLM (`ACTIVE_MODEL`, default `minicpm5-1b`) | |
| 4. **VoiceOut**: Piper TTS speaks the summary (or full rewrite if checked) | |
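To make the Analysis step concrete, here is a rough sketch of how filler counts and a pace figure could be derived from a transcript. It is an illustration only: the `FILLERS` list, the function name, and the words-per-minute measure are assumptions, and the real EchoCoach scoring may differ.

```python
import re

# Illustrative filler list; the actual EchoCoach list is not shown in this guide.
FILLERS = {"um", "uh", "like", "basically", "actually"}


def analyze_transcript(text: str, duration_seconds: float) -> dict:
    """Count words and fillers, and estimate speaking pace in words per minute."""
    words = re.findall(r"[a-z']+", text.lower())
    filler_count = sum(1 for word in words if word in FILLERS)
    if duration_seconds > 0:
        wpm = len(words) / duration_seconds * 60
    else:
        wpm = 0.0
    return {
        "words": len(words),
        "fillers": filler_count,
        "words_per_minute": round(wpm, 1),
    }


if __name__ == "__main__":
    print(analyze_transcript("Um, so this is, uh, basically our pitch", 6.0))
```

The transcript would come from the ASR step and the duration from the recording length (at most `ECHOCOACH_MAX_SECONDS`).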
| Install optional extras: | |
| ```bash | |
| # Whisper.cpp fallback ASR (CPU) | |
| uv sync --package echocoach --extra whisper | |
| # Piper VoiceOut TTS | |
| uv sync --package echocoach --extra piper | |
| python -m piper.download_voices en_US-lessac-medium | |
| ``` | |
| Configure presets in [`voice_models.yaml`](voice_models.yaml) or via `.env`: | |
| | Variable | Default | Description | | |
| | -------- | ------- | ----------- | | |
| | `ECHOCOACH_ASR_PRESET` | `whisper-cpp-base` | ASR preset key (Cohere-free default); use `cohere-transcribe` for Cohere demo | | |
| | `ECHOCOACH_TTS_PRESET` | `piper-multilingual` | TTS preset key (EchoCoach, default VoiceOut) | | |
| | `ECHOCOACH_REALTIME_TTS_PRESET` | `vibevoice-realtime-0.5b` | Language lessons streaming TTS (see below) | | |
| | `ECHOCOACH_COACH_MODEL` | `minicpm5-1b-language-lesson-hub` | Text coach preset (OpenBMB + FR/AR LoRA; from `models.yaml`) | | |
| | `ECHOCOACH_COACH_FALLBACK` | `minicpm5-1b` | Comma-separated fallback presets if primary coach fails to load | | |
| | `ECHOCOACH_MAX_SECONDS` | `30` | Max recording length | | |
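The primary/fallback pair above amounts to an ordered list of presets to try. The sketch below shows one way to resolve that order and pick the first preset that loads; `coach_candidates` and `load_first_available` are hypothetical names, and `loader` stands in for whatever function actually loads a preset.

```python
from typing import Any, Callable


def coach_candidates(env: dict[str, str]) -> list[str]:
    """Primary coach preset followed by comma-separated fallbacks, de-duplicated."""
    primary = env.get("ECHOCOACH_COACH_MODEL", "minicpm5-1b-language-lesson-hub")
    fallback = env.get("ECHOCOACH_COACH_FALLBACK", "minicpm5-1b")
    ordered = [primary.strip()] + [part.strip() for part in fallback.split(",")]
    result: list[str] = []
    for name in ordered:
        if name and name not in result:
            result.append(name)
    return result


def load_first_available(
    candidates: list[str], loader: Callable[[str], Any]
) -> tuple[str, Any]:
    """Try each preset in order; return (name, model) for the first that loads."""
    errors: dict[str, str] = {}
    for name in candidates:
        try:
            return name, loader(name)
        except Exception as exc:  # keep going, but remember why it failed
            errors[name] = str(exc)
    raise RuntimeError(f"no coach preset could be loaded: {errors}")
```

Collecting the per-preset errors keeps the final failure message useful when every candidate fails.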
| **Cohere Transcribe** (`cohere-transcribe`) is gated on Hugging Face. Run `huggingface-cli login`, accept the model terms, then set `ECHOCOACH_ASR_PRESET=cohere-transcribe`. GPU recommended for ASR + coach together. | |
| Smoke tests (analysis only, no GPU): | |
| ```bash | |
| bash scripts/echo_coach_smoke.sh | |
| ``` | |
| ### Language lessons: multilingual coach (Studio tab) | |
| The **Language lessons** tab is the primary voice learning experience: one page for **text**, **hold-to-talk mic**, and **audio upload**, with optional auto VoiceOut on every reply. | |
| | Input | Output | | |
| | ----- | ------ | | |
| | Type a question | Chat bubble in target language | | |
| | Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled | | |
| | **Other (text only)** language code | Written lesson via coach prompts (no Piper voice for unsupported codes) | | |
| **Default stack (Cohere-free):** [Whisper.cpp](https://github.com/ggerganov/whisper.cpp) ASR → [MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B) + `language-lesson-lora` (French/Arabic) → Piper or VibeVoice Realtime for speech out. | |
| Rebuild training JSONL from Hugging Face sources: | |
| ```bash | |
| uv run python research/data/build_language_lesson_chat.py | |
| modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish | |
| ``` | |
| Optional **Cohere Labs partner demo:** [Cohere Transcribe](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026) + [Tiny Aya Global](https://huggingface.co/CohereLabs/tiny-aya-global). | |
| Default `.env` / Space secrets: | |
| ```bash | |
| ECHOCOACH_ASR_PRESET=whisper-cpp-base | |
| ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub | |
| ECHOCOACH_COACH_FALLBACK=minicpm5-1b | |
| ECHOCOACH_TTS_PRESET=piper-multilingual | |
| ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b | |
| ``` | |
| | Mode | Purpose | | |
| | ---- | ------- | | |
| | **Explain** | Tutor any topic in plain language | | |
| | **Lesson coach** | Discuss and outline lesson content | | |
| Turn-based (not full duplex): speak → wait → hear reply. **Auto-speak replies** synthesizes TTS each turn when the language has a Piper voice. | |
| Pitch metrics and monologue analysis live in **Classic UI → EchoCoach** (`/classic`). | |
| ### TeacherVoice: Classic UI (turn-based) | |
| The **TeacherVoice** tab in `/classic` is the legacy multi-turn voice teacher: the same pipeline as Language lessons, plus **Pitch practice** mode. | |
| | Mode | Purpose | | |
| | ---- | ------- | | |
| | **Explain** | Tutor any topic in plain language | | |
| | **Lesson coach** | Discuss and outline lesson content verbally | | |
| | **Pitch practice** | Short live speaking tips each turn | | |
| **EchoCoach vs TeacherVoice** | |
| | | EchoCoach | TeacherVoice | | |
| | --- | --- | --- | | |
| | Interaction | One-shot after **Analyze pitch** | Multi-turn **Send turn** | | |
| | Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips | | |
| | TTS | One VoiceOut clip per analysis | Voice reply every turn (first sentence plays quickly when Piper is installed) | | |
| | RAG | No | Optional ResearchMind grounding (Explain / Lesson) | | |
| **Flow per turn:** record up to **15s** → ASR → text LLM with chat history → Piper TTS (auto-plays when installed). | |
| After each reply, use **Speak last reply** or **Speak first sentence** to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped). | |
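Speaking only the first sentence requires cutting the reply at its first sentence boundary. The sketch below shows one simple way to do that with a regular expression; `first_sentence` is a hypothetical helper, and the app's own splitting logic may handle more cases (abbreviations, non-Latin punctuation).

```python
import re


def first_sentence(text: str) -> str:
    """Return the text up to the first '.', '!' or '?' followed by space or end."""
    cleaned = text.strip()
    match = re.search(r"(.+?[.!?])(\s|$)", cleaned, flags=re.DOTALL)
    return match.group(1) if match else cleaned


if __name__ == "__main__":
    print(first_sentence("Photosynthesis turns light into sugar. Plants do it all day."))
```

Requiring whitespace or end-of-text after the punctuation keeps version numbers such as `4.6` from ending the sentence early.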
| Install Piper for voice output (included in `gradio-space` deps after `uv sync`): | |
| ```bash | |
| uv sync | |
| python -m piper.download_voices en_US-lessac-medium | |
| ``` | |
| Voices are stored under `models/piper/` (gitignored) or `~/.local/share/piper/voices/`. **Restart the Gradio app** after installing Piper so the Speak buttons can synthesize audio. | |
| **Realtime TTS (VibeVoice):** [microsoft/VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) is registered in `voice_models.yaml` as `vibevoice-realtime-0.5b` (~300 ms to first audio, streaming text-in). TeacherVoice uses `realtime_tts_preset` from YAML by default; override with `ECHOCOACH_REALTIME_TTS_PRESET` or set `ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b` globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card. | |
| Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug). | |
| Reuse VoiceOut in other tabs via `gradio_space.voice_helpers.speak_last_assistant_reply`. | |
| Optional omni profile (GPU, experimental; falls back to ASR+LLM+Piper): | |
| ```bash | |
| ECHOCOACH_VOICE_PROFILE=omni | |
| ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5 | |
| ``` | |
| Unit tests (no GPU): | |
| ```bash | |
| uv run pytest libs/echocoach/tests/test_teacher_voice.py -q | |
| ``` | |
| ### 5. Upload agent trace (Sharing is Caring badge) | |
| ```bash | |
| uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces | |
| ``` | |
| ### 6. Quick sanity checks | |
| ```bash | |
| # Inference package resolves | |
| uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)" | |
| # Gradio app module loads | |
| uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())" | |
| ``` | |
| ### Local env reference | |
| | Variable | Default | Description | | |
| | ------------------- | --------------------------------- | ------------------------------------------ | | |
| | `INFERENCE_BACKEND` | `llama_cpp` | `llama_cpp` or `transformers` | | |
| | `MODEL_REPO` | `Qwen/Qwen2.5-3B-Instruct-GGUF` | Hub repo for GGUF | | |
| | `MODEL_FILE` | `qwen2.5-3b-instruct-q4_k_m.gguf` | GGUF filename | | |
| | `MODEL_PATH` | (none) | Local GGUF path (skips Hub download) | | |
| | `N_CTX` | `4096` | Context window | | |
| | `N_GPU_LAYERS` | `0` | GPU layers for llama.cpp (`0` = CPU only) | | |
| | `PORT` | `7860` | Gradio listen port | | |
| | `MODEL_ID` | `Qwen/Qwen2.5-3B-Instruct` | Used when `INFERENCE_BACKEND=transformers` | | |
| ### Optional: transformers backend | |
| Heavier install; only needed if you switch away from llama.cpp: | |
| ```bash | |
| uv sync --package inference --extra transformers | |
| INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \ | |
| uv run --package gradio-space python -m gradio_space.app | |
| ``` | |
| --- | |
| ## Gradio SDK local smoke test (matches HF Space build) | |
| Before pushing to Hugging Face, verify the Gradio SDK entry point: | |
| ```bash | |
| python -m venv .venv-gradio && source .venv-gradio/bin/activate | |
| pip install -r requirements.txt | |
| ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py | |
| ``` | |
| Open [http://localhost:7860](http://localhost:7860). Studio is at `/` and Classic at `/classic`. | |
| Day-to-day development can still use `uv run` (see above); this path mirrors what HF installs from `requirements.txt`. | |
| --- | |
| ## Hugging Face Space deployment (Gradio SDK + ZeroGPU) | |
| The Space card metadata lives in the YAML frontmatter at the top of [README.md](README.md) (`sdk: gradio`, `app_file: app.py`). | |
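Since a malformed card breaks the Space build, a local check of the frontmatter can save a push. This is a minimal sketch that reads only flat `key: value` lines and avoids a YAML dependency; `read_frontmatter` and `check_gradio_card` are hypothetical helpers, and a real YAML parser would be more robust.

```python
from pathlib import Path


def read_frontmatter(readme_text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' delimiters."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing delimiter: treat the card as malformed


def check_gradio_card(readme_text: str) -> list[str]:
    """Return problems that would affect a Gradio SDK Space (empty if none)."""
    fields = read_frontmatter(readme_text)
    problems = []
    if fields.get("sdk") != "gradio":
        problems.append("sdk should be 'gradio'")
    if fields.get("app_file") != "app.py":
        problems.append("app_file should be 'app.py'")
    if "sdk_version" not in fields:
        problems.append("sdk_version is missing")
    return problems


if __name__ == "__main__":
    readme = Path("README.md")
    text = readme.read_text(encoding="utf-8") if readme.exists() else ""
    print(check_gradio_card(text) or "frontmatter looks fine")
```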
| ### 1. Push code to GitHub | |
| Make sure `main` contains at minimum: | |
| - `app.py`, `requirements.txt`, `packages.txt` | |
| - `README.md` (with `sdk: gradio`, `sdk_version`, `app_file: app.py`) | |
| - `models.yaml`, `skills/` | |
| - `apps/gradio-space/` and all `libs/*` packages | |
| The root `Dockerfile` stays in the repo for a later Docker SDK deploy (see below). | |
| ### 2. Create the Space | |
| 1. Go to [build-small-hackathon](https://huggingface.co/build-small-hackathon) | |
| 2. **New Space** | |
| 3. Name: e.g. `lesson-agent` or `small-model-hackathon` | |
| 4. SDK: **Gradio** (Blank template) | |
| 5. Hardware: **ZeroGPU** (creator needs PRO/Team) or **GPU basic** | |
| 6. Link your GitHub repo, or push directly to the Space git remote | |
| CLI alternative (if you have `hf` installed and org access): | |
| ```bash | |
| hf repo create build-small-hackathon/<your-space-name> \ | |
| --repo-type space \ | |
| --space_sdk gradio | |
| ``` | |
| ### 3. Set Space environment variables | |
| In the Space **Settings → Variables and secrets**: | |
| | Variable | Value | | |
| | -------- | ----- | | |
| | `ACTIVE_MODEL` | `minicpm5-1b` | | |
| | `ALLOW_MODEL_SWITCH` | `false` | | |
| | `RESEARCHMIND_DATA_DIR` | `/tmp/researchmind` | | |
| Default preset in [`models.yaml`](models.yaml) is `minicpm5-1b` (transformers), which is suitable for ZeroGPU. | |
| ### 4. Build and verify | |
| HF installs from `requirements.txt` and runs root `app.py`. Check the **Logs** tab for: | |
| - Successful pip install (first build may take several minutes while `llama-cpp-python` compiles) | |
| - `Running on local URL: 0.0.0.0:7860` | |
| Smoke test on the live Space: | |
| 1. **`/`**: Studio UI loads | |
| 2. **`/classic`**: all tabs render | |
| 3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides") | |
| 4. First LLM request may be slow (model download + ZeroGPU queue) | |
| ### 5. ZeroGPU notes | |
| LLM handlers use `@spaces.GPU` via [`gradio_space/spaces_runtime.py`](apps/gradio-space/src/gradio_space/spaces_runtime.py). If you see **No CUDA GPUs are available**, an inference path is running outside a decorated handler. | |
| Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task. | |
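The pattern behind those notes is to wrap every inference entry point in the ZeroGPU decorator while keeping the same code runnable on machines without the `spaces` package. The sketch below illustrates that idea; it is not the contents of `spaces_runtime.py`, and `gpu_decorator` and `generate_reply` are hypothetical names.

```python
try:
    import spaces  # available on Hugging Face ZeroGPU Spaces

    gpu_decorator = spaces.GPU
except ImportError:

    def gpu_decorator(fn):
        """Local fallback: run the function unchanged when `spaces` is absent."""
        return fn


@gpu_decorator
def generate_reply(prompt: str) -> str:
    # On ZeroGPU, model loading and inference belong inside the decorated
    # function so they run while a GPU is attached. This body is a stand-in.
    return f"echo: {prompt}"


if __name__ == "__main__":
    print(generate_reply("hello"))
```

Any handler that touches CUDA without going through the decorated function is the kind of path that triggers the **No CUDA GPUs are available** error.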
| ### 6. Optional: persistent model cache | |
| Attach a **Storage Bucket** in Space settings so Hub model weights survive restarts. | |
| --- | |
| ## Docker SDK deployment (later) | |
| Both deploy paths live on the same branch. HF reads **one** `sdk:` from README, so switch to Docker when you are ready for a dedicated-GPU Space. | |
| 1. Change [README.md](README.md) frontmatter to `sdk: docker`, `app_port: 7860` (remove `sdk_version` / `app_file`) | |
| 2. Create or reconfigure a Space with **Docker** SDK and **GPU basic** hardware | |
| 3. Set the same env vars (`ACTIVE_MODEL=minicpm5-1b`, etc.) | |
| ### Local Docker smoke test | |
| ```bash | |
| docker build -t hackathon-space . | |
| docker run --rm -p 7860:7860 \ | |
| -e ACTIVE_MODEL=minicpm5-1b \ | |
| -e ALLOW_MODEL_SWITCH=false \ | |
| -e RESEARCHMIND_DATA_DIR=/tmp/researchmind \ | |
| hackathon-space | |
| ``` | |
| Open [http://localhost:7860](http://localhost:7860). Studio is at `/` and the Classic tabs at `/classic`. Stop with `Ctrl+C`. | |
| To use a pre-downloaded local GGUF model inside Docker, mount it and set `MODEL_PATH`: | |
| ```bash | |
| docker run --rm -p 7860:7860 \ | |
| -v "$(pwd)/models:/app/models:ro" \ | |
| -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \ | |
| hackathon-space | |
| ``` | |
| --- | |
| ## Troubleshooting | |
| | Symptom | Likely cause | Fix | | |
| | ---------------------------------------- | --------------------------------- | -------------------------------------------------------------------- | | |
| | First chat hangs / slow | Model downloading from Hub | Wait on Space; use Storage Bucket for cache | | |
| | `Failed to load model` in chat | Wrong `ACTIVE_MODEL` preset | Use `minicpm5-1b` or valid key from `models.yaml` | | |
| | Space build fails on pip install | `llama-cpp-python` compile | Check Logs; default preset avoids GGUF at runtime | | |
| | Space build fails | Malformed README YAML | Ensure `sdk: gradio` and `app_file: app.py` in README frontmatter | | |
| | No CUDA GPUs on ZeroGPU | Handler outside `@spaces.GPU` | LLM entry points must use `gpu_task` in `spaces_runtime.py` | | |
| | Docker build fails on `llama-cpp-python` | Missing build tools | Dockerfile installs `build-essential` and `cmake` | | |
| | Port already in use locally | Another process on 7860 | `PORT=7861 python app.py` or `uv run ...` | | |
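For the port conflict in the last row, a short check can find a free port before launching. This sketch uses only the standard library; `port_is_free` and `pick_port` are hypothetical helpers, and the result can be passed to the app through `PORT`.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def pick_port(preferred: int = 7860, attempts: int = 10) -> int:
    """Return the first free port at or after `preferred`."""
    for candidate in range(preferred, preferred + attempts):
        if port_is_free(candidate):
            return candidate
    raise RuntimeError(f"no free port in {preferred}..{preferred + attempts - 1}")


if __name__ == "__main__":
    print(f"PORT={pick_port()}")
```

There is a small race between checking and launching, so treat the result as a hint rather than a guarantee.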
| --- | |
| ## Entrypoint summary | |
| | Environment | How to run | | |
| | ----------- | ---------- | | |
| | Local dev (uv) | `uv run --package gradio-space python -m gradio_space.app` | | |
| | Local Gradio SDK smoke | `pip install -r requirements.txt && python app.py` | | |
| | HF Gradio Space | HF runs root `app.py` automatically | | |
| | Docker (later) | `docker run -p 7860:7860 hackathon-space` (after README `sdk: docker`) | | |