Usage

How to run the Lesson Agent Gradio app locally, deploy to a Hugging Face Space (Gradio SDK + ZeroGPU), and optionally test with Docker later for the Build Small Hackathon.

The primary UI is the Lesson slides tab (topic → local model outline → downloadable .pptx). Use ResearchMind for corpus Q&A, Language lessons for multilingual text + voice tutoring (OpenBMB + Whisper by default), EchoCoach for one-shot pitch analysis in Classic UI, or ground lessons directly from the Lesson tab. The Chat (debug) tab tests the underlying model.

Prerequisites

  • uv installed
  • Python 3.12 (see .python-version)
  • For Docker testing: Docker installed locally
  • For HF Space deploy: Hugging Face account with access to the build-small-hackathon org

Local development

1. Install dependencies

uv sync --all-packages

2. Configure environment (optional)

cp .env.example .env

Edit .env if you want a different model preset. Default is minicpm5-1b (transformers).

3. Pre-download the model (optional for GGUF presets)

If using a GGUF preset (qwen3b-gguf), pre-download avoids a long wait on first use:

uv run python scripts/download_model.py

Then add the printed path to .env:

MODEL_PATH=./models/qwen2.5-3b-instruct-q4_k_m.gguf

4. Run the Gradio app

uv run --package gradio-space python -m gradio_space.app

Open http://localhost:7860.

| URL | UI |
| --- | --- |
| / | Studio: custom HTML/CSS/JS workspace (Off Brand entry) |
| /classic | Classic: full Gradio tabs, settings, Chat (debug) |

The header in Classic includes a link back to Studio UI.

The model loads on the first Generate (Lesson slides) or chat message. Agent traces are written to outputs/traces/. After code changes, restart the process to pick up updates.

Switching models locally (transformers ↔ llama.cpp)

For local dev you can switch presets at runtime without restarting:

# .env
ALLOW_MODEL_SWITCH=true
ACTIVE_MODEL=minicpm-v-4.6          # startup default (transformers)

| UI | Where to switch |
| --- | --- |
| Classic (/classic) | Settings accordion → Model preset dropdown (reloads on change) |
| Classic Chat tab | Model preset dropdown (syncs app-wide) |
| Studio (/) | Settings drawer → Model preset; Debug tab has the same list |

| Goal | Preset key |
| --- | --- |
| MiniCPM-V 4.6 transformers (full VLM) | minicpm-v-4.6 |
| MiniCPM-V 4.6 llama.cpp / Llama Champion | minicpm-v-4.6-gguf |
| MiniCPM5 1B text | minicpm5-1b |
| Lesson LoRA (transformers only) | minicpm5-1b-lesson-lora |

Prefetch the GGUF weights (optional):

uv run python scripts/download_model.py --preset minicpm-v-4.6-gguf

On Hugging Face Space, keep ALLOW_MODEL_SWITCH=false and pin one preset via ACTIVE_MODEL.
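The switching rule can be sketched as a small resolver. This is a minimal illustration, not the app's code: `resolve_preset` and `KNOWN_PRESETS` are hypothetical names, and falling back to the pinned preset for unknown keys is an assumption. Only the two environment variables and the preset keys come from this document.

```python
import os

# Preset keys listed in this section; the set itself is an illustrative assumption.
KNOWN_PRESETS = {
    "minicpm-v-4.6",
    "minicpm-v-4.6-gguf",
    "minicpm5-1b",
    "minicpm5-1b-lesson-lora",
}


def resolve_preset(requested, env=None):
    """Return the preset to load (hypothetical helper).

    A runtime switch is honoured only when ALLOW_MODEL_SWITCH is "true";
    otherwise the preset pinned by ACTIVE_MODEL always wins.
    """
    env = os.environ if env is None else env
    pinned = env.get("ACTIVE_MODEL", "minicpm5-1b")
    allow = env.get("ALLOW_MODEL_SWITCH", "false").strip().lower() == "true"
    if allow and requested in KNOWN_PRESETS:
        return requested
    return pinned
```

With `ALLOW_MODEL_SWITCH=false`, as recommended for a Space, every request resolves to the pinned `ACTIVE_MODEL` regardless of what the UI asks for.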

Lesson slides: research sources

The Lesson slides tab can ground outlines on external sources before building the deck:

| Source mode | What it does |
| --- | --- |
| None (model only) | Default: outline from the local model only |
| Web search | Search the web for the lesson topic, ingest pages, retrieve passages, then draft slides |
| RAG (indexed sources) | Use a ResearchMind session and/or URLs/files you provide on this tab |

When Web search is selected, choose a search workflow:

| Workflow | Steps |
| --- | --- |
| Two-step search (suggest & confirm) | Click Discover sources → select URLs → Generate lesson slides |
| Auto search & ingest | Click Generate lesson slides only; search, ingest, and outline run in one step |

RAG mode accepts an optional ResearchMind session, document checkboxes (scope), pasted URLs, and PDF/DOCX uploads. Indexed content is retrieved and passed to the outline step.
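The source modes and web workflows described in this section can be read as a stage plan per request. The sketch below is only a mental model: the enum names, the `planned_steps` helper, and the stage labels are hypothetical and are not taken from the app's code.

```python
from enum import Enum


class SourceMode(Enum):
    NONE = "none"  # model only
    WEB = "web"    # web search
    RAG = "rag"    # indexed sources


class WebWorkflow(Enum):
    TWO_STEP = "two_step"  # suggest & confirm
    AUTO = "auto"          # auto search & ingest


def planned_steps(mode, workflow=WebWorkflow.AUTO):
    """List the stages a request passes through (illustrative labels only)."""
    if mode is SourceMode.NONE:
        return ["outline"]
    if mode is SourceMode.RAG:
        return ["retrieve", "outline"]
    steps = ["search"]
    if workflow is WebWorkflow.TWO_STEP:
        steps.append("confirm_urls")  # the user selects URLs after Discover sources
    return steps + ["ingest", "retrieve", "outline"]
```

Under this reading, the two web workflows differ only in whether a human confirmation step sits between search and ingest.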

Web discover/auto search requires network access. MemRAG data is stored under RESEARCHMIND_DATA_DIR (default outputs/researchmind).

EchoCoach: voice practice

The EchoCoach tab records up to 30 seconds of audio, then runs a local pipeline (see Pipeline steps below).

Getting audio in

  • Record from this computer: click Start recording, speak, then Stop recording (uses PipeWire pw-record when available). The slider is a max-length safety cap.
  • Browser Record: needs mic permission and a secure context; open http://localhost:7860 (not 0.0.0.0 or a LAN IP).
  • Upload: drop a .wav or .mp3 file (works everywhere, including HF Space).

If recordings sound silent, check system mic input/mute or set ECHOCOACH_CAPTURE_DEVICE in .env (see arecord -L or pw-cli ls Node).

Pipeline steps:

  1. ASR: Cohere Transcribe 2B (14 languages) or Whisper.cpp tiny/base
  2. Analysis: filler highlights, pace score, matplotlib charts
  3. Coach: rewrite + tips from the text LLM (ACTIVE_MODEL, default minicpm5-1b)
  4. VoiceOut: Piper TTS speaks the summary (or full rewrite if checked)
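To make the Analysis step concrete, here is a minimal sketch of filler counting and a words-per-minute pace figure. It is not EchoCoach's implementation: the filler list, the `analyze_transcript` name, and the use of plain words per minute as the pace score are all assumptions.

```python
import re

# Assumed filler list; the app's real list is not given in this document.
FILLERS = ("um", "uh", "like", "basically")


def analyze_transcript(text, duration_seconds):
    """Count filler words and compute pace in words per minute."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    filler_counts = {f: words.count(f) for f in FILLERS if f in words}
    if duration_seconds > 0:
        wpm = round(len(words) * 60 / duration_seconds, 1)
    else:
        wpm = 0.0
    return {"words": len(words), "fillers": filler_counts, "wpm": wpm}


report = analyze_transcript(
    "Um, so this is, like, our pitch. Uh, we build small models.", 6
)
print(report)
```

The transcript would come from the ASR step and the duration from the recording; both are passed in directly here so the sketch runs without audio.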

Install optional extras:

# Whisper.cpp fallback ASR (CPU)
uv sync --package echocoach --extra whisper

# Piper VoiceOut TTS
uv sync --package echocoach --extra piper
python -m piper.download_voices en_US-lessac-medium

Configure presets in voice_models.yaml or via .env:

| Variable | Default | Description |
| --- | --- | --- |
| ECHOCOACH_ASR_PRESET | whisper-cpp-base | ASR preset key (Cohere-free default); use cohere-transcribe for Cohere demo |
| ECHOCOACH_TTS_PRESET | piper-multilingual | TTS preset key (EchoCoach, default VoiceOut) |
| ECHOCOACH_REALTIME_TTS_PRESET | vibevoice-realtime-0.5b | Language lessons streaming TTS (see below) |
| ECHOCOACH_COACH_MODEL | minicpm5-1b-language-lesson-hub | Text coach preset (OpenBMB + FR/AR LoRA; from models.yaml) |
| ECHOCOACH_COACH_FALLBACK | minicpm5-1b | Comma-separated fallback presets if primary coach fails to load |
| ECHOCOACH_MAX_SECONDS | 30 | Max recording length (seconds) |
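The relationship between ECHOCOACH_COACH_MODEL and the comma-separated ECHOCOACH_COACH_FALLBACK can be sketched as an ordered candidate list tried one by one. Both helper names and the `loader` callable are hypothetical stand-ins; the app's actual loading and error handling may differ.

```python
def coach_candidates(primary, fallback_csv):
    """Primary preset first, then comma-separated fallbacks, without duplicates."""
    ordered = [primary] + [p.strip() for p in fallback_csv.split(",")]
    seen, result = set(), []
    for preset in ordered:
        if preset and preset not in seen:
            seen.add(preset)
            result.append(preset)
    return result


def load_first_available(candidates, loader):
    """Try each preset in order; `loader` stands in for the real model loader."""
    errors = {}
    for preset in candidates:
        try:
            return preset, loader(preset)
        except Exception as exc:  # sketch only: real code would catch narrower errors
            errors[preset] = str(exc)
    raise RuntimeError(f"no coach preset could be loaded: {errors}")
```

Collecting the per-preset errors, rather than swallowing them, keeps the final failure message useful when every candidate fails.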

Cohere Transcribe (cohere-transcribe) is gated on Hugging Face: run huggingface-cli login, accept the model terms, then set ECHOCOACH_ASR_PRESET=cohere-transcribe. A GPU is recommended when running ASR and the coach together.

Smoke tests (analysis only, no GPU):

bash scripts/echo_coach_smoke.sh

Language lessons: multilingual coach (Studio tab)

The Language lessons tab is the primary voice learning experience: one page for text, hold-to-talk mic, and audio upload, with optional auto VoiceOut on every reply.

| Input | Output |
| --- | --- |
| Type a question | Chat bubble in target language |
| Hold mic / upload audio | Transcript + teacher reply; auto-play TTS when enabled |
| Other (text only) language code | Written lesson via coach prompts (no Piper voice for unsupported codes) |

Default stack (Cohere-free): Whisper.cpp ASR → MiniCPM5-1B + language-lesson-lora (French/Arabic) → Piper or VibeVoice Realtime for speech out.

Rebuild training JSONL from Hugging Face sources:

uv run python research/data/build_language_lesson_chat.py
modal run research/modal/finetune_app.py --job language-lesson-lora --max-steps 30 --no-publish

Optional Cohere Labs partner demo: Cohere Transcribe + Tiny Aya Global.

Default .env / Space secrets:

ECHOCOACH_ASR_PRESET=whisper-cpp-base
ECHOCOACH_COACH_MODEL=minicpm5-1b-language-lesson-hub
ECHOCOACH_COACH_FALLBACK=minicpm5-1b
ECHOCOACH_TTS_PRESET=piper-multilingual
ECHOCOACH_REALTIME_TTS_PRESET=vibevoice-realtime-0.5b

| Mode | Purpose |
| --- | --- |
| Explain | Tutor any topic in plain language |
| Lesson coach | Discuss and outline lesson content |

Turn-based (not full duplex): speak → wait → hear reply. Auto-speak replies synthesizes TTS each turn when the language has a Piper voice.

Pitch metrics and monologue analysis live in Classic UI → EchoCoach (/classic).

TeacherVoice: Classic UI (turn-based)

The TeacherVoice tab in /classic is the legacy multi-turn voice teacher. It uses the same pipeline as Language lessons and adds a Pitch practice mode.

| Mode | Purpose |
| --- | --- |
| Explain | Tutor any topic in plain language |
| Lesson coach | Discuss and outline lesson content verbally |
| Pitch practice | Short live speaking tips each turn |

EchoCoach vs TeacherVoice

| | EchoCoach | TeacherVoice |
| --- | --- | --- |
| Interaction | One-shot after Analyze pitch | Multi-turn Send turn |
| Best for | Pace/filler charts, JSON rewrite report | Q&A, lesson discussion, conversational pitch tips |
| TTS | One VoiceOut clip per analysis | Voice reply every turn (first sentence plays quickly when Piper is installed) |
| RAG | No | Optional ResearchMind grounding (Explain / Lesson) |

Flow per turn: record up to 15s → ASR → text LLM with chat history → Piper TTS (auto-plays when installed).

After each reply, use Speak last reply or Speak first sentence to generate or replay VoiceOut from the latest assistant message (works even if auto-TTS was skipped).
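Speaking only the first sentence requires splitting it off the reply before synthesis. The helper below is a naive sketch under stated assumptions: `first_sentence` is a hypothetical name, it recognises only `.`, `!`, and `?` as terminators (so it would miss, for example, the Arabic question mark), and the app's own splitting logic is not documented here.

```python
import re


def first_sentence(reply):
    """Return the leading sentence of a reply, or the whole reply if none ends."""
    # A sentence ends at ., ! or ? followed by whitespace or the end of the text.
    match = re.match(r"\s*(.+?[.!?])(?:\s|$)", reply, flags=re.DOTALL)
    return match.group(1) if match else reply.strip()
```

Because the terminator must be followed by whitespace or the end of the text, a decimal such as 3.5 does not end the sentence early.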

Install Piper for voice output (included in gradio-space deps after uv sync):

uv sync
python -m piper.download_voices en_US-lessac-medium

Voices are stored under models/piper/ (gitignored) or ~/.local/share/piper/voices/. Restart the Gradio app after installing Piper so the Speak buttons can synthesize audio.

Realtime TTS (VibeVoice): microsoft/VibeVoice-Realtime-0.5B is registered in voice_models.yaml as vibevoice-realtime-0.5b (~300 ms to first audio, streaming text-in). TeacherVoice uses realtime_tts_preset from YAML by default; override with ECHOCOACH_REALTIME_TTS_PRESET or set ECHOCOACH_TTS_PRESET=vibevoice-realtime-0.5b globally. GPU recommended; falls back to Piper until the model loads. English-first; de/fr/it/es/pt/nl/pl/ja/ko are experimental per the model card.

Enable RAG in the accordion: pick a ResearchMind session and optional documents (same scope rules as Chat debug).

Reuse VoiceOut in other tabs via gradio_space.voice_helpers.speak_last_assistant_reply.

Optional omni profile (GPU, experimental; falls back to ASR+LLM+Piper):

ECHOCOACH_VOICE_PROFILE=omni
ECHOCOACH_OMNI_MODEL=openbmb/MiniCPM-o-4_5

Unit tests (no GPU):

uv run pytest libs/echocoach/tests/test_teacher_voice.py -q

5. Upload agent trace (Sharing is Caring badge)

uv run python scripts/upload_trace.py --repo-id YOUR_USER/build-small-agent-traces

6. Quick sanity checks

# Inference package resolves
uv run python -c "from inference.factory import get_backend; print(type(get_backend()).__name__)"

# Gradio app module loads
uv run --package gradio-space python -c "from gradio_space.app import build_demo; print(build_demo())"

Local env reference

| Variable | Default | Description |
| --- | --- | --- |
| INFERENCE_BACKEND | llama_cpp | llama_cpp or transformers |
| MODEL_REPO | Qwen/Qwen2.5-3B-Instruct-GGUF | Hub repo for GGUF |
| MODEL_FILE | qwen2.5-3b-instruct-q4_k_m.gguf | GGUF filename |
| MODEL_PATH | (none) | Local GGUF path (skips Hub download) |
| N_CTX | 4096 | Context window |
| N_GPU_LAYERS | 0 | GPU layers for llama.cpp (0 = CPU only) |
| PORT | 7860 | Gradio listen port |
| MODEL_ID | Qwen/Qwen2.5-3B-Instruct | Used when INFERENCE_BACKEND=transformers |
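For a quick check of what a given `.env` resolves to, the variables above can be read with their documented defaults. This is a sketch, not the project's settings module: `LocalSettings` and `load_local_settings` are hypothetical names, and rejecting unknown backends is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalSettings:
    inference_backend: str
    model_repo: str
    model_file: str
    model_path: Optional[str]
    n_ctx: int
    n_gpu_layers: int
    port: int
    model_id: str


def load_local_settings(env=None):
    """Read the local env variables, applying the defaults from the reference table."""
    env = os.environ if env is None else env
    backend = env.get("INFERENCE_BACKEND", "llama_cpp")
    if backend not in ("llama_cpp", "transformers"):
        raise ValueError(f"unsupported INFERENCE_BACKEND: {backend!r}")
    return LocalSettings(
        inference_backend=backend,
        model_repo=env.get("MODEL_REPO", "Qwen/Qwen2.5-3B-Instruct-GGUF"),
        model_file=env.get("MODEL_FILE", "qwen2.5-3b-instruct-q4_k_m.gguf"),
        model_path=env.get("MODEL_PATH") or None,  # unset or empty means "download from Hub"
        n_ctx=int(env.get("N_CTX", "4096")),
        n_gpu_layers=int(env.get("N_GPU_LAYERS", "0")),
        port=int(env.get("PORT", "7860")),
        model_id=env.get("MODEL_ID", "Qwen/Qwen2.5-3B-Instruct"),
    )
```

Passing a plain dict instead of `os.environ` makes it easy to try overrides such as `{"PORT": "7861"}` without touching the shell.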

Optional: transformers backend

Heavier install; only needed if you switch away from llama.cpp:

uv sync --package inference --extra transformers
INFERENCE_BACKEND=transformers MODEL_ID=Qwen/Qwen2.5-3B-Instruct \
  uv run --package gradio-space python -m gradio_space.app

Gradio SDK local smoke test (matches HF Space build)

Before pushing to Hugging Face, verify the Gradio SDK entry point:

python -m venv .venv-gradio && source .venv-gradio/bin/activate
pip install -r requirements.txt
ACTIVE_MODEL=minicpm5-1b ALLOW_MODEL_SWITCH=false python app.py

Open http://localhost:7860 (Studio at /, Classic at /classic).

Day-to-day development can still use uv run (see above); this path mirrors what HF installs from requirements.txt.


Hugging Face Space deployment (Gradio SDK + ZeroGPU)

The Space card metadata lives in the YAML frontmatter at the top of README.md (sdk: gradio, app_file: app.py).

1. Push code to GitHub

Make sure main contains at minimum:

  • app.py, requirements.txt, packages.txt
  • README.md (with sdk: gradio, sdk_version, app_file: app.py)
  • models.yaml, skills/
  • apps/gradio-space/ and all libs/* packages

The root Dockerfile stays in the repo for a later Docker SDK deploy (see below).

2. Create the Space

  1. Go to build-small-hackathon
  2. New Space
  3. Name: e.g. lesson-agent or small-model-hackathon
  4. SDK: Gradio (Blank template)
  5. Hardware: ZeroGPU (creator needs PRO/Team) or GPU basic
  6. Link your GitHub repo, or push directly to the Space git remote

CLI alternative (if you have hf installed and org access):

hf repo create build-small-hackathon/<your-space-name> \
  --repo-type space \
  --space_sdk gradio

3. Set Space environment variables

In the Space Settings → Variables and secrets:

| Variable | Value |
| --- | --- |
| ACTIVE_MODEL | minicpm5-1b |
| ALLOW_MODEL_SWITCH | false |
| RESEARCHMIND_DATA_DIR | /tmp/researchmind |

Default preset in models.yaml is minicpm5-1b (transformers), which is suitable for ZeroGPU.

4. Build and verify

HF installs from requirements.txt and runs root app.py. Check the Logs tab for:

  • Successful pip install (first build may take several minutes while llama-cpp-python compiles)
  • Running on local URL: 0.0.0.0:7860

Smoke test on the live Space:

  1. /: Studio UI loads
  2. /classic: all tabs render
  3. Generate slides with a simple topic (e.g. "Photosynthesis, grade 8, 5 slides")
  4. First LLM request may be slow (model download + ZeroGPU queue)

5. ZeroGPU notes

LLM handlers use @spaces.GPU via gradio_space/spaces_runtime.py. If you see No CUDA GPUs are available, an inference path is running outside a decorated handler.

Startup model preload is skipped on HF Gradio runtime; the first user request loads the model inside a GPU task.
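The decorator pattern described above can be sketched as follows. This is not the contents of gradio_space/spaces_runtime.py: only the `@spaces.GPU` decorator and the `gpu_task` name appear in this document, and the fallback to an undecorated function when the `spaces` package is missing is an assumption about how such a wrapper could stay usable in local development.

```python
try:
    import spaces  # available on Hugging Face ZeroGPU Spaces
except ImportError:
    spaces = None  # local dev: run handlers without ZeroGPU scheduling


def gpu_task(fn):
    """Wrap fn with spaces.GPU when the package is present; otherwise return it as-is."""
    if spaces is None:
        return fn
    return spaces.GPU(fn)


@gpu_task
def generate(prompt):
    # Placeholder body: real code would load the model and run inference here,
    # so that all CUDA work happens inside the decorated call.
    return f"echo: {prompt}"


print(generate("hello"))
```

Any code path that touches CUDA outside a function wrapped this way is the kind of path that would produce the No CUDA GPUs are available error on ZeroGPU.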

6. Optional: persistent model cache

Attach a Storage Bucket in Space settings so Hub model weights survive restarts.


Docker SDK deployment (later)

Both deploy paths live on the same branch. HF reads a single sdk: value from the README frontmatter, so switch it to docker when you are ready for a dedicated-GPU Space.

  1. Change README.md frontmatter to sdk: docker, app_port: 7860 (remove sdk_version / app_file)
  2. Create or reconfigure a Space with Docker SDK and GPU basic hardware
  3. Set the same env vars (ACTIVE_MODEL=minicpm5-1b, etc.)

Local Docker smoke test

docker build -t hackathon-space .
docker run --rm -p 7860:7860 \
  -e ACTIVE_MODEL=minicpm5-1b \
  -e ALLOW_MODEL_SWITCH=false \
  -e RESEARCHMIND_DATA_DIR=/tmp/researchmind \
  hackathon-space

Open http://localhost:7860 (Studio at /, Classic tabs at /classic). Stop with Ctrl+C.

To use a pre-downloaded local GGUF model inside Docker, mount it and set MODEL_PATH:

docker run --rm -p 7860:7860 \
  -v "$(pwd)/models:/app/models:ro" \
  -e MODEL_PATH=/app/models/qwen2.5-3b-instruct-q4_k_m.gguf \
  hackathon-space

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| First chat hangs / slow | Model downloading from Hub | Wait on Space; use Storage Bucket for cache |
| Failed to load model in chat | Wrong ACTIVE_MODEL preset | Use minicpm5-1b or valid key from models.yaml |
| Space build fails on pip install | llama-cpp-python compile | Check Logs; default preset avoids GGUF at runtime |
| Space build fails | Malformed README YAML | Ensure sdk: gradio and app_file: app.py in README frontmatter |
| No CUDA GPUs on ZeroGPU | Handler outside @spaces.GPU | LLM entry points must use gpu_task in spaces_runtime.py |
| Docker build fails on llama-cpp-python | Missing build tools | Dockerfile installs build-essential and cmake |
| Port already in use locally | Another process on 7860 | PORT=7861 python app.py or uv run ... |
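For the port conflict row, a short standard-library check can tell you whether 7860 is taken before you launch. The helper names are hypothetical, and the check only detects listeners reachable on 127.0.0.1, which is an assumption about how the app is bound locally.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def pick_port(preferred=7860, attempts=10):
    """Return the first free port at or above `preferred`."""
    for candidate in range(preferred, preferred + attempts):
        if port_is_free(candidate):
            return candidate
    raise RuntimeError("no free port found")
```

The result can then be passed through the PORT variable, for example by exporting it before running the app.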

Entrypoint summary

| Environment | How to run |
| --- | --- |
| Local dev (uv) | uv run --package gradio-space python -m gradio_space.app |
| Local Gradio SDK smoke | pip install -r requirements.txt && python app.py |
| HF Gradio Space | HF runs root app.py automatically |
| Docker (later) | docker run -p 7860:7860 hackathon-space (after README sdk: docker) |