Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.19.0
Usage
How to run the Lesson Agent Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the Build Small Hackathon.
The primary UI is the Lesson slides tab (topic β local model outline β downloadable .pptx). Use ResearchMind for corpus Q&A, Language lessons for multilingual text + voice tutoring (OpenBMB + Whisper by default), EchoCoach for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The Chat (debug) tab tests the underlying model.
Prerequisites
- uv installed
- Python 3.12 (see
.python-version) - For Docker testing: Docker installed locally
- For HF Space deploy: Hugging Face account with access to the
build-small-hackathonorg
Local development
1. Install dependencies
uv sync --all-packages
2. Configure environment (optional)
cp .env.example .env
Edit .env if you want a different model preset. Default is minicpm5-1b (transformers).
3. Pre-download the model (optional for GGUF presets)
If using a GGUF preset (qwen3b-gguf), pre-download avoids a long wait on first use:
uv run python scripts/download_model.py
Then add the printed path to .env:
MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf
4. Run the Gradio app
uv run --package gradio-space python -m gradio_space.app
Open http://localhost:7860.
| URL | UI |
|---|---|
/ |
Studio β custom HTML/CSS/JS workspace (Off Brand entry) |
/classic |
Classic β full Gradio tabs, settings, Chat (debug) |
The header in Classic includes a link back to Studio UI.
The model loads on the first Generate (Lesson slides) or chat message. Agent traces are written to outputs/traces/. After code changes, restart the process to pick up updates.
Switching models locally (transformers β llama.cpp)
For local dev you can switch presets at runtime without restarting:
# .env
ALLOW_MODEL_SWITCH=true
ACTIVE_MODEL=minicpm-v-4.6 # startup default (transformers)
| UI | Where to switch |
|---|---|
Classic (/classic) |
Settings accordion β Model preset dropdown (reloads on change) |
| Classic Chat tab | Model preset dropdown (syncs app-wide) |
Studio (/) |
Settings drawer β Model preset; Debug tab has the same list |
| Goal | Preset key |
|---|---|
| MiniCPM-V 4.6 transformers (full VLM) | minicpm-v-4.6 |
| MiniCPM-V 4.6 llama.cpp / Llama Champion | minicpm-v-4.6-gguf |
| MiniCPM5 1B text | minicpm5-1b |
| Lesson LoRA (transformers only) | minicpm5-1b-lesson-lora |
Prefetch the GGUF weights (optional):
uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf
On Hugging Face Space, keep ALLOW_MODEL_SWITCH=false and pin one preset via ACTIVE_MODEL.
Lesson slides β research sources
The Lesson slides tab can ground outlines on external sources before building the deck:
| Source mode | What it does |
|---|---|
| None (model only) | Default β outline from the local model only |
| Web search | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides |
| RAG (indexed sources) | Use a ResearchMind session and/or URLs/files you provide on this tab |
When Web search is selected, choose a search workflow:
| Workflow | Steps |
|---|---|
| Two-step search (suggest & confirm) | Click Discover sources β select URLs β Generate lesson slides |
| Auto search & ingest | Click Generate lesson slides only β search, ingest, and outline in one step |
RAG mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.
Web discover/auto search requires network access. MemRAG data is stored under RESEARCHMIND_DATA_DIR (default outputs/researchmind).
Web discover/auto search requires network access. MemRAG data is stored under RESEARCHMIND_DATA_DIR (default outputs/researchmind).
EchoCoach β voice practice
The EchoCoach tab records up to 30 seconds, then runs a local pipeline:
Getting audio in
- Record from this computer β click Start recording, speak, then Stop recording (uses PipeWire
pw-recordwhen available). The slider is a max-length safety cap. - Browser Record β needs mic permission and a secure context; open http://localhost:7860 (not
0.0.0.0or a LAN IP). - Upload β drop a
.wavor.mp3file (works everywhere, including HF Space).
If recordings sound silent, check system mic input/mute or set ECHOCOACH_CAPTURE_DEVICE in .env (see arecord -L or pw-cli ls Node).
Pipeline steps:
- ASR β Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
- Analysis β filler highlights, pace score, matplotlib charts
- Coach β rewrite + tips from the text LLM (
ACTIVE_MODEL, defaultminicpm5-1b) - VoiceOut β Piper TTS speaks the summary (or full rewrite if checked)
Install optional extras:
# Whisper.cpp fallback ASR (CPU)
uv sync --package echocoach --extra whisper
# Piper VoiceOut TTS
uv sync --package echocoach --extra piper
python -m piper.download_voices en_US-lessac-medium
Configure presets in voice_models.yaml or via .env:
| Variable | Default | Description |
|---|---|---|
ECHOCOACH_ASR_PRESET |
whisper-cpp-base |
ASR preset key (Cohere-free default); use cohere-transcribe for Cohere demo |
ECHOCOACH_TTS_PRESET |
piper-multilingual |
TTS preset key (EchoCoach, default VoiceOut) |
ECHOCOACH_REALTIME_TTS_PRESET |
vibevoice-realtime-0.5b |
Language lessons streaming TTS (see below) |
ECHOCOACH_COACH_MODEL |
minicpm5-1b-language-lesson-hub |
Text coach preset (OpenBMB + FR/AR LoRA; from models.yaml) |
ECHOCOACH_COACH_FALLBACK |
minicpm5-1b |
Comma-separated fallback presets if primary coach fails to load |
ECHOCOACH_MAX_SECONDS |
30 |
Max recording length |
Cohere Transcribe (cohere-transcribe) is gated on Hugging Face β run huggingface-cli login, accept the model terms, then set ECHOCOACH_ASR_PRESET=cohere-transcribe. GPU recommended for ASR + coach together.
Smoke tests (analysis only, no GPU):
bash scripts/echo_coach_smoke.sh
Language lessons β multilingual coach (Studio tab)
The Language lessons tab is the primary voice learning experience: one page for text, hold-to-talk mic, and audio upload, with optional auto VoiceOut on every reply.
| Input | Output |
|---|---|
| Type a question | Chat bubble in target language |
| Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
| Other (text only) language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |
Default stack (Cohere-free): Whisper.cpp ASR β MiniCPM5-1B + language-lesson-lora (French/Arabic) β Piper or VibeVoice Realtime for speech out.
Rebuild training JSONL from Hugging Face sources:
uv run python research/data/build_language_lesson_chat.py
modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish
Optional Cohere Labs partner demo: Cohere Transcribe + Tiny Aya Global.
Default .env / Space secrets:
ECHOCOACH_ASR_PRESET=whisper-cpp-base
ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
ECHOCOACH_COACH_FALLBACK=minicpm5-1b
ECHOCOACH_TTS_PRESET=piper-multilingual
ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b
| Mode | Purpose |
|---|---|
| Explain | Tutor any topic in plain language |
| Lesson coach | Discuss and outline lesson content |
Turn-based (not full duplex): speak β wait β hear reply. Auto-speak replies synthesizes TTS each turn when the language has a Piper voice.
Pitch metrics and monologue analysis live in Classic UI β EchoCoach (/classic).
TeacherVoice β Classic UI (turn-based)
The TeacherVoice tab in /classic is the legacy multi-turn voice teacher β same pipeline as Language lessons, plus Pitch practice mode.
| Mode | Purpose |
|---|---|
| Explain | Tutor any topic in plain language |
| Lesson coach | Discuss and outline lesson content verbally |
| Pitch practice | Short live speaking tips each turn |
EchoCoach vs TeacherVoice
| EchoCoach | TeacherVoice | |
|---|---|---|
| Interaction | One-shot after Analyze pitch | Multi-turn Send turn |
| Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips |
| TTS | One VoiceOut clip per analysis | Voice reply every turn (first sentence plays quickly when Piper is installed) |
| RAG | No | Optional ResearchMind grounding (Explain / Lesson) |
Flow per turn: record up to 15s β ASR β text LLM with chat history β Piper TTS (auto-plays when installed).
After each reply, use Speak last reply or Speak first sentence to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped).
Install Piper for voice output (included in gradio-space deps after uv sync):
uv sync
python -m piper.download_voices en_US-lessac-medium
Voices are stored under models/piper/ (gitignored) or ~/.local/share/piper/voices/. Restart the Gradio app after installing Piper so the Speak buttons can synthesize audio.
Realtime TTS (VibeVoice) β microsoft/VibeVoice-Realtime-0.5B is registered in voice_models.yaml as vibevoice-realtime-0.5b (~300 ms to first audio, streaming text-in). TeacherVoice uses realtime_tts_preset from YAML by default; override with ECHOCOACH_REALTIME_TTS_PRESET or set ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.
Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).
Reuse VoiceOut in other tabs via gradio_space.voice_helpers.speak_last_assistant_reply.
Optional omni profile (GPU, experimental β falls back to ASR+LLM+Piper):
ECHOCOACH_VOICE_PROFILE=omni
ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5
Unit tests (no GPU):
uv run pytest libs/echocoach/tests/test_teacher_voice.py -q
5. Upload agent trace (Sharing is Caring badge)
uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces
5. Quick sanity checks
# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"
# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"
Local env reference
| Variable | Default | Description |
|---|---|---|
INFERENCE_BACKEND |
llama_cpp |
llama_cpp or transformers |
MODEL_REPO |
Qwen/Qwen2.5-3B-Instruct-GGUF |
Hub repo for GGUF |
MODEL_FILE |
qwen2.5-3b-instruct-q4_k_m.gguf |
GGUF filename |
MODEL_PATH |
β | Local GGUF path (skips Hub download) |
N_CTX |
4096 |
Context window |
N_GPU_LAYERS |
0 |
GPU layers for llama.cpp (0 = CPU only) |
PORT |
7860 |
Gradio listen port |
MODEL_ID |
Qwen/Qwen2.5-3B-Instruct |
Used when INFERENCE_BACKEND=transformers |
Optional: transformers backend
Heavier install; only needed if you switch away from llama.cpp:
uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
uv run --package gradio-space python -m gradio_space.app
Gradio SDK local smoke test (matches HF Space build)
Before pushing to Hugging Face, verify the Gradio SDK entry point:
python -m venv .venv-gradio && source .venv-gradio/bin/activate
pip install -r requirements.txt
ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py
Open http://localhost:7860 β Studio at /, Classic at /classic.
Day-to-day development can still use uv run (see above); this path mirrors what HF installs from requirements.txt.
Hugging Face Space deployment (Gradio SDK + ZeroGPU)
The Space card metadata lives in the YAML frontmatter at the top of README.md (sdk: gradio, app_file: app.py).
1. Push code to GitHub
Make sure main contains at minimum:
app.py,requirements.txt,packages.txtREADME.md(withsdk: gradio,sdk_version,app_file: app.py)models.yaml,skills/apps/gradio-space/and alllibs/*packages
The root Dockerfile stays in the repo for a later Docker SDK deploy (see below).
2. Create the Space
- Go to build-small-hackathon
- New Space
- Name: e.g.
lesson-agentorsmall-model-hackathon - SDK: Gradio (Blank template)
- Hardware: ZeroGPU (creator needs PRO/Team) or GPU basic
- Link your GitHub repo, or push directly to the Space git remote
CLI alternative (if you have hf installed and org access):
hf repo create build-small-hackathon/<your-space-name> \
--repo-type space \
--space_sdk gradio
3. Set Space environment variables
In the Space Settings β Variables and secrets:
| Variable | Value |
|---|---|
ACTIVE_MODEL |
minicpm5-1b |
ALLOW_MODEL_SWITCH |
false |
RESEARCHMIND_DATA_DIR |
/tmp/researchmind |
Default preset in models.yaml is minicpm5-1b (transformers) β suitable for ZeroGPU.
4. Build and verify
HF installs from requirements.txt and runs root app.py. Check the Logs tab for:
- Successful pip install (first build may take several minutes β
llama-cpp-pythoncompiles) Running on local URL: 0.0.0.0:7860
Smoke test on the live Space:
/β Studio UI loads/classicβ all tabs render- Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
- First LLM request may be slow (model download + ZeroGPU queue)
5. ZeroGPU notes
LLM handlers use @spaces.GPU via gradio_space/spaces_runtime.py. If you see No CUDA GPUs are available, an inference path is running outside a decorated handler.
Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task.
6. Optional: persistent model cache
Attach a Storage Bucket in Space settings so Hub model weights survive restarts.
Docker SDK deployment (later)
Both deploy paths live on the same branch. HF reads one sdk: from README β switch to Docker when you are ready for a dedicated-GPU Space.
- Change README.md frontmatter to
sdk: docker,app_port: 7860(removesdk_version/app_file) - Create or reconfigure a Space with Docker SDK and GPU basic hardware
- Set the same env vars (
ACTIVE_MODEL=minicpm5-1b, etc.)
Local Docker smoke test
docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
-e ACTIVE_MODEL=minicpm5-1b \
-e ALLOW_MODEL_SWITCH=false \
-e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
hackathon-space
Open http://localhost:7860 β Studio at /, Classic tabs at /classic. Stop with Ctrl+C.
To use a pre-downloaded local GGUF model inside Docker, mount it and set MODEL_PATH:
docker run --rm -p 7860:7860 \
-v "$(pwd)/models:/app/models:ro" \
-e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
hackathon-space
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| First chat hangs / slow | Model downloading from Hub | Wait on Space; use Storage Bucket for cache |
Failed to load model in chat |
Wrong ACTIVE_MODEL preset |
Use minicpm5-1b or valid key from models.yaml |
| Space build fails on pip install | llama-cpp-python compile |
Check Logs; default preset avoids GGUF at runtime |
| Space build fails | Malformed README YAML | Ensure sdk: gradio and app_file: app.py in README frontmatter |
| No CUDA GPUs on ZeroGPU | Handler outside @spaces.GPU |
LLM entry points must use gpu_task in spaces_runtime.py |
Docker build fails on llama-cpp-python |
Missing build tools | Dockerfile installs build-essential and cmake |
| Port already in use locally | Another process on 7860 | PORT=7861 python app.py or uv run ... |
Entrypoint summary
| Environment | How to run |
|---|---|
| Local dev (uv) | uv run --package gradio-space python -m gradio_space.app |
| Local Gradio SDK smoke | pip install -r requirements.txt && python app.py |
| HF Gradio Space | HF runs root app.py automatically |
| Docker (later) | docker run -p 7860:7860 hackathon-space (after README sdk: docker) |