---
title: Sahel-Voice-Lab Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.25.0"
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
- bambara
- fula
- speech-recognition
- text-to-speech
- agriculture
- iot
- language-learning
- west-africa
- low-resource-nlp
- memory
---
# 🌍 Sahel-Voice-Lab
**A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).**
Two intertwined jobs:
1. **Memory loop**: users *teach* the assistant new words; it persists them to a HuggingFace dataset and uses them as the source of truth in future answers.
2. **Agricultural IoT voice interface**: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and get short answers (≤ 6 words per sentence) for clean TTS.
The core stack is explicitly **100% non-Meta** (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is only used as a baseline fallback.
---
## What this Space currently runs — the `ground-zero` minimal baseline
The deployed Space (`app_file: app_minimal.py`) is the **Month 1–3 rebuild**
baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field
testing and to build a real-user eval set. No LoRA adapters, no memory loop,
no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in
`app.py` still exists for the full production stack; it is just not what the
Space serves today.
Four stacked changes land dialect fidelity without any training:
1. **Stage 1 — dialect-pinned system prompt** (`src/llm/minimal_client.py`).
Replaces the `GemmaClient` JSON/teacher flow with a plain-text client whose
system prompt pins the target dialect explicitly — *Bambara as spoken in
Bamako, Mali* and *Pular of Fuuta Jallon, as spoken in Guinea* — names the
languages the model must **not** drift into (Wolof, Hausa, Pulaar of
Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair
bilingual gold list as few-shot anchoring
(`configs/dialect_anchors/{bambara_mali,pular_guinea}.json`).
2. **Stage 2 — curated phrasebook short-circuit** (`src/llm/phrasebook.py`).
Before calling the LLM, the user's input is normalised and fuzzy-matched
(threshold 0.88) against a curated English-keyed phrasebook
(`configs/dialect_anchors/{bambara,pular}_phrasebook.json` — 100 Bambara /
110 Pular entries across greetings, family, food, farming, health,
shopping, travel, clarity, time, parting). A hit returns the gold
translation directly: no LLM call, so no hallucination risk and negligible latency.
3. **Stage 3 — better multilingual base LLM.**
Default `LLM_MODEL_ID` is now **`CohereLabs/aya-expanse-32b`**, a 23-language
multilingual model with much stronger West African coverage than Qwen
2.5-7B. Can be overridden via the `LLM_MODEL_ID` env var (e.g. to
`Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not
available on your HF account.
4. **Stage 4 — split translate / reply UI + per-turn telemetry + RAG few-shot.**
Both Voice and Text tabs use a 4-box layout: phrasebook translation (text
+ audio) is automatic on submit (no LLM), and a separate **Generate reply**
button calls the dialect-anchored LLM for a conversational response. On a
phrasebook miss the LLM is RAG-injected with the top-3 nearest curated
pairs as additional style anchoring. Every turn is appended to
`data/field_turns.jsonl` (`src/engine/turn_logger.py`) with phase, latency
breakdown, phrasebook hit, and reply — the substrate for hit-rate
measurement, A/B comparisons, and eventual Stage-5 LoRA training-data
curation. The system prompt now also explicitly tells the LLM to **reply,
not translate** — the few-shot pairs are framed as style/orthography
references only, fixing the "the LLM just echoes the phrasebook target"
regression.
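
The Stage 2 short-circuit and the Stage 4 top-3 retrieval can be sketched roughly as below. This is a minimal illustration, not the code in `src/llm/phrasebook.py`: `difflib.SequenceMatcher` stands in for whatever matcher the project uses, and the function names (`normalise`, `lookup`, `nearest_pairs`) and the flat `dict` phrasebook shape are assumptions. Only the 0.88 threshold and the top-3 count come from the text above.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional

MATCH_THRESHOLD = 0.88  # threshold quoted in Stage 2


def normalise(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def score(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib is a stand-in for the real matcher."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def lookup(query: str, phrasebook: dict) -> Optional[str]:
    """Return the gold translation on a hit, or None on a miss."""
    if not phrasebook:
        return None
    best_key = max(phrasebook, key=lambda key: score(query, key))
    if score(query, best_key) >= MATCH_THRESHOLD:
        return phrasebook[best_key]
    return None


def nearest_pairs(query: str, phrasebook: dict, k: int = 3) -> list:
    """Top-k closest (source, target) pairs, usable as few-shot anchors on a miss."""
    ranked = sorted(phrasebook, key=lambda key: score(query, key), reverse=True)
    return [(key, phrasebook[key]) for key in ranked[:k]]
```

With a phrasebook keyed on English phrases, `lookup("Good morning!", phrasebook)` would return the stored target for `"good morning"`, while an unrelated query returns `None` and `nearest_pairs` supplies the anchors for the LLM prompt.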
See `docs/baseline_rebuild.md` for the broader minimal-track plan.
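
For the per-turn telemetry described in Stage 4, a JSONL logger might look like the sketch below. The field names (`ts`, `phase`, `latency_ms`, `phrasebook_hit`, `reply`) and both function names are assumptions chosen for illustration; the actual schema lives in `src/engine/turn_logger.py`.

```python
import json
import time
from pathlib import Path


def log_turn(path, *, phase, latency_ms, phrasebook_hit, reply):
    """Append one turn as a single JSON line, creating the parent folder if needed."""
    record = {
        "ts": time.time(),
        "phase": phase,                  # e.g. "translate" or "reply" (assumed values)
        "latency_ms": latency_ms,        # per-stage breakdown, e.g. {"stt": 120}
        "phrasebook_hit": phrasebook_hit,
        "reply": reply,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def hit_rate(path):
    """Fraction of logged turns that were served by the phrasebook."""
    with Path(path).open(encoding="utf-8") as fh:
        turns = [json.loads(line) for line in fh if line.strip()]
    if not turns:
        return 0.0
    return sum(1 for turn in turns if turn["phrasebook_hit"]) / len(turns)
```

One JSON object per line keeps appends cheap and lets the hit rate be recomputed from the raw log at any time.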
---
## Status
| Phase | Feature | State |
|------:|---------|-------|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS — Bambara | ✅ shipped |
| 2 | Waxal VITS TTS — Fula | ⏳ placeholder until `ous-sow/fula-tts` is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| — | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |
See `docs/roadmap_2026-04.md` for the full plan and `docs/baseline_rebuild.md` for the parallel minimal-track strategy.
---
## Stack
| Layer | Tool |
|-------|------|
| STT | `openai/whisper-large-v3-turbo` + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | `CohereLabs/aya-expanse-32b` (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient — overridable to `Qwen/Qwen2.5-72B-Instruct`, `Qwen2.5-7B-Instruct`, Mistral, Zephyr |
| Dialect anchoring (minimal) | `src/llm/minimal_client.py` — pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | `src/llm/phrasebook.py` — 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | `facebook/mms-tts-bam`, `facebook/mms-tts-ful` |
| TTS (Bambara) | `ynnov/ekodi-bambara-tts-female` (Waxal VITS) |
| TTS (Fula) | placeholder → `ous-sow/fula-tts` when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 |
| Fast path | RapidFuzz over `data/phrases/{lang}.json` for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + `Seq2SeqTrainer` on FLEURS, Jeli-ASR, SLR 105/106 |
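
To make the Speaker ID row concrete, the cosine ≥ 0.75 decision could be sketched in plain Python as follows. The function names and the `profiles` mapping are hypothetical, and the toy vectors used in practice would be 192-dimensional ECAPA-TDNN embeddings rather than anything hand-written.

```python
import math

SPEAKER_THRESHOLD = 0.75  # cosine cut-off from the Stack table


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(embedding, profiles):
    """Return the best-matching enrolled speaker, or None below the threshold."""
    best_name, best_score = None, -1.0
    for name, reference in profiles.items():
        similarity = cosine(embedding, reference)
        if similarity > best_score:
            best_name, best_score = name, similarity
    return best_name if best_score >= SPEAKER_THRESHOLD else None
```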
---
## Four entry points (do not conflate)
| File | Purpose | Lifecycle |
|------|---------|-----------|
| `app_minimal.py` | **Minimal baseline Gradio UI** — what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | `python app_minimal.py` |
| `app.py` | **Full production Gradio UI** (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
| `app_lab.py` | **Experimental Gradio UI** for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
| `src/api/app.py` | **FastAPI service** — loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |
---
## Repository layout
```
app_minimal.py                  # Gradio (minimal baseline, served on the HF Space)
app.py                          # Gradio (full production UI, not currently served)
app_lab.py                      # Gradio (experimental)
requirements.txt                # Spaces runtime; do NOT pin torch/torchaudio
packages.txt                    # apt deps (ffmpeg)
configs/
  base_config.yaml              # shared settings
  api_config.yaml               # FastAPI-specific
  lora_bambara.yaml             # Bambara LoRA hyperparams
  lora_fula.yaml                # Fula LoRA hyperparams
data/
  phrases/                      # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl              # local mirror of the HF Hub memory dataset
docs/
  roadmap_2026-04.md            # full architectural walkthrough + action plan
  baseline_rebuild.md           # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md     # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md           # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/        # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/               # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb             # legacy Colab trainer
scripts/
  train_bambara.py              # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                 # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py                # merge LoRA -> ONNX -> TFLite
  verify_baseline.py            # eval harness
  run_server.py                 # FastAPI launcher
  run_data_pipeline.py          # dataset prep
  push_to_hf.sh                 # deploy helpers
  push_to_kaggle.sh             # deploy helpers
  runpod_setup.sh
src/
  api/                          # FastAPI app, schemas, routes, middleware
  conversation/                 # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                         # dataset loading + normalization (Adlam, Bambara)
  engine/                       # adapter_manager, transcriber, stt_processor, curiosity
  iot/                          # intent_parser, voice_responder, sensor_bridge
  llm/                          # LLM client wrappers
  memory/                       # vocabulary persistence
  optimization/                 # ONNX / quantization helpers
  training/                     # trainer, callbacks, augmenters
  tts/                          # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                        # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                          # pytest: api, data pipeline, engine, iot
```
---
## How the memory loop works
1. Press **Push-to-Talk** → speak in Bambara, Fula, French, or English.
2. **Whisper** transcribes. If the language has a LoRA adapter loaded, `AdapterManager` hot-swaps to it (~50 ms).
3. **Qwen** reads the vocabulary it has learned so far (`MemoryManager.get_vocabulary_context()`), then returns a structured JSON reply with `intent ∈ {teaching, question, conversation, error}`.
4. If `teaching`: the word pair is appended to `data/vocabulary.jsonl` and async-pushed to `ous-sow/sahel-agri-feedback → vocabulary.jsonl`.
5. If `question`: Qwen answers using the remembered vocabulary as source of truth.
6. If `conversation`: Qwen replies naturally.
7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).
The last 5 learned words are always visible in the UI.
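
The persistence half of this loop (steps 3 and 4, plus the "last 5 words" view) can be illustrated with a small thread-safe JSONL store. This is a sketch under assumptions, not the real `MemoryManager`: the class name `VocabularyStore`, the entry fields, and the shape of `teaching_pair` are hypothetical, and the async Hub push is left out.

```python
import json
import threading
from pathlib import Path


class VocabularyStore:
    """Thread-safe JSONL store; a stand-in for the real MemoryManager."""

    def __init__(self, path):
        self.path = Path(path)
        self._lock = threading.Lock()

    def add(self, word, meaning, lang):
        """Append one learned pair as a JSON line."""
        entry = {"word": word, "meaning": meaning, "lang": lang}
        with self._lock:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
        return entry

    def all(self):
        """Read every stored entry, oldest first."""
        if not self.path.exists():
            return []
        with self._lock, self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def recent(self, n=5):
        """The n most recently learned entries (what the UI would show)."""
        return self.all()[-n:]

    def context(self):
        """Vocabulary rendered as plain text for the LLM prompt."""
        return "\n".join(
            f"{e['word']} = {e['meaning']} ({e['lang']})" for e in self.all()
        )


def handle_reply(reply, store):
    """Dispatch on intent: only teaching turns are persisted."""
    if reply["intent"] == "teaching":
        pair = reply["teaching_pair"]  # assumed keys: word, meaning, lang
        store.add(pair["word"], pair["meaning"], pair["lang"])
    return reply["reply"]
```

Appending under a lock keeps concurrent Gradio callbacks from interleaving partial lines, and the file stays readable with any JSONL tool.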
---
## How the agricultural voice interface works
1. User asks, e.g., *"A bɛ di wa?"* ("Is it OK?") referring to their field.
2. `intent_parser.py` (keyword-based) classifies the request: `check_soil` / `check_weather` / `irrigation_status` / `pest_alert` / etc.
3. `SensorBridge` calls the configured `SENSOR_API_URL` and returns a typed `SensorData`.
4. `voice_responder.py` maps `(Intent, SensorData)` → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (`SOIL_MOISTURE_LOW=30`, `TEMP_HIGH=38`, pH bounds).
5. TTS speaks the reply.
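
Steps 2 to 4 can be sketched end to end as below. Everything except the intent names and the two thresholds is an assumption made for illustration: the keyword lists are English stand-ins for the real Bambara/Fula keywords, `SensorData` is reduced to two fields, the units and the direction of the comparisons are guesses, and the replies are English placeholders rather than the project's actual wording.

```python
import re
from dataclasses import dataclass
from typing import Optional

SOIL_MOISTURE_LOW = 30  # threshold named above; percent is assumed
TEMP_HIGH = 38          # threshold named above; degrees Celsius is assumed

# Hypothetical English keywords; the real parser targets Bambara and Fula.
KEYWORDS = {
    "check_soil": ("soil", "moisture"),
    "check_weather": ("weather", "rain", "temperature"),
    "irrigation_status": ("irrigation", "pump"),
    "pest_alert": ("pest", "insect"),
}


@dataclass
class SensorData:
    soil_moisture: float
    temperature: float


def parse_intent(text: str) -> Optional[str]:
    """Return the first intent whose keyword appears as a whole word."""
    words = set(re.findall(r"\w+", text.lower()))
    for intent, keys in KEYWORDS.items():
        if any(key in words for key in keys):
            return intent
    return None


def respond(intent: Optional[str], data: SensorData) -> str:
    """Map (intent, reading) to a short reply with brief sentences."""
    if intent == "check_soil":
        if data.soil_moisture < SOIL_MOISTURE_LOW:
            return "Soil is dry. Water the field today."
        return "Soil moisture is fine."
    if intent == "check_weather":
        if data.temperature > TEMP_HIGH:
            return "It is very hot. Protect young plants."
        return "Weather is fine today."
    return "I did not understand. Please ask again."


def max_words_per_sentence(reply: str) -> int:
    """Longest sentence length in words, for checking the 6-word limit."""
    sentences = [s for s in reply.split(".") if s.strip()]
    return max(len(s.split()) for s in sentences)
```

A check such as `max_words_per_sentence(reply) <= 6` could run in tests so that new reply templates cannot silently exceed the TTS-friendly limit.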
---
## Environment variables
Every variable is optional, so the Space boots without any of them; without `HF_TOKEN`, however, the memory loop cannot push to the Hub.
### Core
| Key | Default | Purpose |
|-----|---------|---------|
| `HF_TOKEN` | — | HF write token. Required for Hub push and gated models. |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` | Memory-loop target dataset. |
| `ADAPTER_REPO_ID` | `ous-sow/sahel-agri-adapters` | Published LoRA adapters. |
| `WHISPER_MODEL_ID` | `openai/whisper-large-v3-turbo` | STT base model. |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| `LOG_LEVEL` | `INFO` | Standard Python logging level. |
| `DEVICE` | `cuda` (FastAPI) | Torch device for inference. |
### Adapters & TTS
| Key | Default |
|-----|---------|
| `BAMBARA_ADAPTER_PATH` | `./adapters/bambara` |
| `FULA_ADAPTER_PATH` | `./adapters/fula` |
| `BAMBARA_TTS_REPO` | `ynnov/ekodi-bambara-tts-female` |
| `FULA_TTS_REPO` | `ous-sow/fula-tts` |
### IoT
| Key | Default |
|-----|---------|
| `SENSOR_API_URL` | *(unset → mock sensor)* |
### Self-Teaching tab (triggers Kaggle training runs)
| Key | Default |
|-----|---------|
| `KAGGLE_USERNAME` | — |
| `KAGGLE_KEY` | — |
| `KAGGLE_KERNEL_SLUG` | `ous-sow/sahel-voice-master-trainer` *(override in prod to `oussow/kaggle-master-trainer` — the actual Kaggle owner slug)* |
| `AUTO_TRAIN_THRESHOLD` | `50` |
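
As an illustration of how these tables translate into code, settings could be resolved along the following lines. The `load_settings` helper and the derived `can_push_to_hub` flag are hypothetical; only the keys and default values are taken from the tables above, and the real apps may read them differently.

```python
import os

# Defaults copied from the tables above (a subset, for brevity).
DEFAULTS = {
    "FEEDBACK_REPO_ID": "ous-sow/sahel-agri-feedback",
    "ADAPTER_REPO_ID": "ous-sow/sahel-agri-adapters",
    "WHISPER_MODEL_ID": "openai/whisper-large-v3-turbo",
    "LLM_MODEL_ID": "CohereLabs/aya-expanse-32b",
    "LOG_LEVEL": "INFO",
    "AUTO_TRAIN_THRESHOLD": "50",
}


def load_settings(env=None):
    """Resolve settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    settings["AUTO_TRAIN_THRESHOLD"] = int(settings["AUTO_TRAIN_THRESHOLD"])
    settings["HF_TOKEN"] = env.get("HF_TOKEN")              # no default
    settings["SENSOR_API_URL"] = env.get("SENSOR_API_URL")  # None -> mock sensor
    settings["can_push_to_hub"] = bool(settings["HF_TOKEN"])
    return settings
```

Passing a plain `dict` instead of `os.environ` makes the resolution easy to unit-test, for example `load_settings({"LLM_MODEL_ID": "Qwen/Qwen2.5-72B-Instruct"})`.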
---
## Run locally
```bash
# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py
# Full production UI (not currently on the Space)
python app.py
# FastAPI service
python scripts/run_server.py
# Experimental lab UI
python app_lab.py
```
System-level dependency: **ffmpeg** (see `packages.txt`).
---
## Training
LoRA fine-tuning runs on **Kaggle T4** or **RunPod** — not locally. Pick one entrypoint:
| Target | Script | Notebook |
|--------|--------|----------|
| Bambara LoRA | `scripts/train_bambara.py` | `notebooks/kaggle_master_trainer/` |
| Fula LoRA | `scripts/train_fula.py` | `notebooks/kaggle_master_trainer/` |
| Fula TTS | — | `notebooks/train_fula_tts/` *(planned)* |
**Contributor workflow:** edit notebooks locally in `notebooks/<slug>/`, commit with `nbstripout` enabled to keep diffs clean, then run `cd notebooks/<slug> && kaggle kernels push` to execute on a Kaggle GPU. Full walkthrough in `docs/notebook_collaboration.md`.
`docs/kaggle_mcp_setup.md` documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.
---
## Export for edge
```bash
python scripts/export_onnx.py # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android
```
ONNX does not support LoRA hot-swap, so export one file per language. `bitsandbytes` NF4 / 8-bit quantization is available for GPU-constrained deploys but is a training-only dep (not in `requirements.txt`).
---
## Tests
```bash
pytest tests/
```
Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).
---
## Space secrets (HF UI → Settings → Secrets)
At minimum:
| Key | Value |
|-----|-------|
| `HF_TOKEN` | write-scope token |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` (or any HF Serverless-supported model) |
---
## Design constraints (deliberate — do not change without discussion)
- **Adapter hot-swap** via PEFT's multi-adapter API — one backbone in VRAM, ~50 MB adapters per language, `set_adapter` ≈ 50 ms.
- **Qwen "adult-child" JSON contract** — structured `intent`/`reply`/`english`/`teaching_pair` output, parsed out of optional markdown fences.
- **JSONL + Hub push memory** — no ORM, thread-safe `MemoryManager`, async push so UI never blocks.
- **≤ 6 words per sentence** in `voice_responder.py` for clean MMS-TTS.
- **Adlam ↔ Latin dual-script** handling in `adlam.py` + `bam_normalize.py`.
- **Single-file `app.py`** — intentional for now; do not split without a plan.
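
The "parsed out of optional markdown fences" part of the JSON contract can be sketched as below. It is an assumption-laden illustration rather than the project's parser: the `parse_reply` name and the fallback to an `error` intent on malformed output are choices made here, while the four field names and the intent set come from this README.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so the snippet can sit inside a fenced block
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)
VALID_INTENTS = {"teaching", "question", "conversation", "error"}


def _error(raw: str) -> dict:
    """Fallback payload when the model output cannot be trusted."""
    return {"intent": "error", "reply": raw.strip(), "english": "", "teaching_pair": None}


def parse_reply(raw: str) -> dict:
    """Extract the JSON object from raw LLM text, with or without fences."""
    match = FENCE_RE.search(raw)
    payload = match.group(1) if match else raw.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return _error(raw)
    if not isinstance(data, dict) or data.get("intent") not in VALID_INTENTS:
        return _error(raw)
    data.setdefault("reply", "")
    data.setdefault("english", "")
    data.setdefault("teaching_pair", None)
    return data
```

Falling back to an `error` intent instead of raising keeps a single malformed completion from breaking the UI turn.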
---
## License
MIT.