---
title: Sahel-Voice-Lab - Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: "5.25.0"
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
  - bambara
  - fula
  - speech-recognition
  - text-to-speech
  - agriculture
  - iot
  - language-learning
  - west-africa
  - low-resource-nlp
  - memory
---

# 🌍 Sahel-Voice-Lab

**A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).**

Two intertwined jobs:

1. **Memory loop**: users *teach* the assistant new words; it persists them to a Hugging Face dataset and uses them as the source of truth in future answers.
2. **Agricultural IoT voice interface**: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and get short answers (≤ 6 words per sentence) for clean TTS.

The core stack is explicitly **100% non-Meta** (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is used only as a baseline fallback.

---

## What this Space currently runs: the `ground-zero` minimal baseline

The deployed Space (`app_file: app_minimal.py`) is the **Month 1–3 rebuild** baseline: a stripped-down Whisper → LLM → MMS-TTS pipeline used for field testing and to build a real-user eval set. No LoRA adapters, no memory loop, no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in `app.py` still exists for the full production stack; it is just not what the Space serves today.

Four stacked changes land dialect fidelity without any training:

1. **Stage 1: dialect-pinned system prompt** (`src/llm/minimal_client.py`). Replaces the `GemmaClient` JSON/teacher flow with a plain-text client whose system prompt pins the target dialect explicitly (*Bambara as spoken in Bamako, Mali* and *Pular of Fuuta Jallon, as spoken in Guinea*), names the languages the model must **not** drift into (Wolof, Hausa, Pulaar of Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair bilingual gold list as few-shot anchoring (`configs/dialect_anchors/{bambara_mali,pular_guinea}.json`).
2. **Stage 2: curated phrasebook short-circuit** (`src/llm/phrasebook.py`). Before calling the LLM, the user's input is normalised and fuzzy-matched (threshold 0.88) against a curated English-keyed phrasebook (`configs/dialect_anchors/{bambara,pular}_phrasebook.json`; 100 Bambara / 110 Pular entries across greetings, family, food, farming, health, shopping, travel, clarity, time, parting). A hit returns the gold translation directly, with no LLM call and therefore no generation risk or model latency.
3. **Stage 3: better multilingual base LLM.** Default `LLM_MODEL_ID` is now **`CohereLabs/aya-expanse-32b`**, a 23-language multilingual model with much stronger West African coverage than Qwen 2.5-7B. It can be overridden via the `LLM_MODEL_ID` env var (e.g. to `Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not available on your HF account.
4. **Stage 4: split translate / reply UI + per-turn telemetry + RAG few-shot.** Both Voice and Text tabs use a 4-box layout: phrasebook translation (text + audio) is automatic on submit (no LLM), and a separate **Generate reply** button calls the dialect-anchored LLM for a conversational response. On a phrasebook miss, the LLM prompt is augmented (RAG-style) with the top-3 nearest curated pairs as additional style anchoring. Every turn is appended to `data/field_turns.jsonl` (`src/engine/turn_logger.py`) with phase, latency breakdown, phrasebook hit, and reply. This log is the substrate for hit-rate measurement, A/B comparisons, and eventual Stage-5 LoRA training-data curation.
   The system prompt now also explicitly tells the LLM to **reply, not translate**: the few-shot pairs are framed as style/orthography references only, fixing the regression where the LLM just echoed the phrasebook target.

See `docs/baseline_rebuild.md` for the broader minimal-track plan.

---

## Status

| Phase | Feature | State |
|------:|---------|-------|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS, Bambara | ✅ shipped |
| 2 | Waxal VITS TTS, Fula | ⏳ placeholder until `ous-sow/fula-tts` is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| - | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |

See `docs/roadmap_2026-04.md` for the full plan and `docs/baseline_rebuild.md` for the parallel minimal-track strategy.

---

## Stack

| Layer | Tool |
|-------|------|
| STT | `openai/whisper-large-v3-turbo` + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | `CohereLabs/aya-expanse-32b` (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient; overridable to `Qwen/Qwen2.5-72B-Instruct`, `Qwen2.5-7B-Instruct`, Mistral, Zephyr |
| Dialect anchoring (minimal) | `src/llm/minimal_client.py`: pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | `src/llm/phrasebook.py`: 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | `facebook/mms-tts-bam`, `facebook/mms-tts-ful` |
| TTS (Bambara) | `ynnov/ekodi-bambara-tts-female` (Waxal VITS) |
| TTS (Fula) | placeholder → `ous-sow/fula-tts` when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 |
| Fast path | RapidFuzz over `data/phrases/{lang}.json` for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + `Seq2SeqTrainer` on FLEURS, Jeli-ASR, SLR 105/106 |

---

## Four entry points (do not conflate)

| File | Purpose | Lifecycle |
|------|---------|-----------|
| `app_minimal.py` | **Minimal baseline Gradio UI**, which is what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | `python app_minimal.py` |
| `app.py` | **Full production Gradio UI** (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
| `app_lab.py` | **Experimental Gradio UI** for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
| `src/api/app.py` | **FastAPI service**: loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |

---

## Repository layout

```
app.py                          # Gradio (production, HF Spaces)
app_lab.py                      # Gradio (experimental)
requirements.txt                # Spaces runtime; do NOT pin torch/torchaudio
packages.txt                    # apt deps (ffmpeg)
configs/
  base_config.yaml              # shared settings
  api_config.yaml               # FastAPI-specific
  lora_bambara.yaml             # Bambara LoRA hyperparams
  lora_fula.yaml                # Fula LoRA hyperparams
data/
  phrases/                      # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl              # local mirror of the HF Hub memory dataset
docs/
  roadmap_2026-04.md            # full architectural walkthrough + action plan
  baseline_rebuild.md           # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md     # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md           # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/        # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/               # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb             # legacy Colab trainer
scripts/
  train_bambara.py              # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                 # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py                # merge LoRA -> ONNX -> TFLite
  verify_baseline.py            # eval harness
  run_server.py                 # FastAPI launcher
  run_data_pipeline.py          # dataset prep
  push_to_hf.sh                 # deploy helpers
  push_to_kaggle.sh             # deploy helpers
  runpod_setup.sh
src/
  api/                          # FastAPI app, schemas, routes, middleware
  conversation/                 # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                         # dataset loading + normalization (Adlam, Bambara)
  engine/                       # adapter_manager, transcriber, stt_processor, curiosity
  iot/                          # intent_parser, voice_responder, sensor_bridge
  llm/                          # LLM client wrappers
  memory/                       # vocabulary persistence
  optimization/                 # ONNX / quantization helpers
  training/                     # trainer, callbacks, augmenters
  tts/                          # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                        # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                          # pytest: api, data pipeline, engine, iot
```

---

## How the memory loop works

1. Press **Push-to-Talk** → speak in Bambara, Fula, French, or English.
2. **Whisper** transcribes. If the language has a LoRA adapter loaded, `AdapterManager` hot-swaps to it (~50 ms).
3. **Qwen** reads the vocabulary it has learned so far (`MemoryManager.get_vocabulary_context()`), then returns a structured JSON reply with `intent ∈ {teaching, question, conversation, error}`.
4. If `teaching`: the word pair is appended to `data/vocabulary.jsonl` and async-pushed to `ous-sow/sahel-agri-feedback → vocabulary.jsonl`.
5. If `question`: Qwen answers using the remembered vocabulary as source of truth.
6. If `conversation`: Qwen replies naturally.
7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).

The last 5 learned words are always visible in the UI.

---

## How the agricultural voice interface works

1. User asks, e.g., *"A bɛ di wa?"* ("Is it OK?") referring to their field.
2. `intent_parser.py` (keyword-based) classifies the request: `check_soil` / `check_weather` / `irrigation_status` / `pest_alert` / etc.
3. `SensorBridge` calls the configured `SENSOR_API_URL` and returns a typed `SensorData`.
4. `voice_responder.py` maps `(Intent, SensorData)` → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (`SOIL_MOISTURE_LOW=30`, `TEMP_HIGH=38`, pH bounds).
5. TTS speaks the reply.

---

## Environment variables

Every variable is optional, so you can boot the Space without setting any of them; without `HF_TOKEN`, however, the memory loop cannot push to the Hub.

### Core

| Key | Default | Purpose |
|-----|---------|---------|
| `HF_TOKEN` | *(unset)* | HF write token. Required for Hub push and gated models. |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` | Memory-loop target dataset. |
| `ADAPTER_REPO_ID` | `ous-sow/sahel-agri-adapters` | Published LoRA adapters. |
| `WHISPER_MODEL_ID` | `openai/whisper-large-v3-turbo` | STT base model. |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| `LOG_LEVEL` | `INFO` | Standard Python logging level. |
| `DEVICE` | `cuda` (FastAPI) | Torch device for inference. |

### Adapters & TTS

| Key | Default |
|-----|---------|
| `BAMBARA_ADAPTER_PATH` | `./adapters/bambara` |
| `FULA_ADAPTER_PATH` | `./adapters/fula` |
| `BAMBARA_TTS_REPO` | `ynnov/ekodi-bambara-tts-female` |
| `FULA_TTS_REPO` | `ous-sow/fula-tts` |

### IoT

| Key | Default |
|-----|---------|
| `SENSOR_API_URL` | *(unset → mock sensor)* |

### Self-Teaching tab (triggers Kaggle training runs)

| Key | Default |
|-----|---------|
| `KAGGLE_USERNAME` | *(unset)* |
| `KAGGLE_KEY` | *(unset)* |
| `KAGGLE_KERNEL_SLUG` | `ous-sow/sahel-voice-master-trainer` *(override in prod to `oussow/kaggle-master-trainer`, the actual Kaggle owner slug)* |
| `AUTO_TRAIN_THRESHOLD` | `50` |

---

## Run locally

```bash
# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py

# Full production UI (not currently on the Space)
python app.py

# FastAPI service
python scripts/run_server.py

# Experimental lab UI
python app_lab.py
```

System-level dependency: **ffmpeg** (see `packages.txt`).

---

## Training

LoRA fine-tuning runs on **Kaggle T4** or **RunPod**, not locally. Pick one entrypoint:

| Target | Script | Notebook |
|--------|--------|----------|
| Bambara LoRA | `scripts/train_bambara.py` | `notebooks/kaggle_master_trainer/` |
| Fula LoRA | `scripts/train_fula.py` | `notebooks/kaggle_master_trainer/` |
| Fula TTS | - | `notebooks/train_fula_tts/` *(planned)* |

**Contributor workflow:** edit notebooks locally in `notebooks//`, commit with `nbstripout` enabled to keep diffs clean, then `cd notebooks/ && kaggle kernels push` to run on Kaggle GPU. Full walkthrough in `docs/notebook_collaboration.md`.

`docs/kaggle_mcp_setup.md` documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.
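
The Kaggle-driven workflow above depends on several of the environment variables documented earlier. As a rough illustration only (this is not code from the repository), the sketch below reads a subset of those variables and applies the documented defaults. The `Settings` class, the `load_settings` function, and the two `can_*` helpers are hypothetical names introduced here. Treating the Kaggle credentials as a precondition for triggering a training run is an assumption, not documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; field names mirror the documented env keys."""
    hf_token: Optional[str]
    feedback_repo_id: str
    whisper_model_id: str
    llm_model_id: str
    kaggle_username: Optional[str]
    kaggle_key: Optional[str]
    kaggle_kernel_slug: str
    auto_train_threshold: int

    @property
    def can_push_to_hub(self) -> bool:
        # Without HF_TOKEN the memory loop cannot push (stated in the README).
        return bool(self.hf_token)

    @property
    def can_trigger_kaggle(self) -> bool:
        # Assumption: both Kaggle credentials are needed to start a run.
        return bool(self.kaggle_username and self.kaggle_key)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), using README defaults."""
    env = os.environ if env is None else env
    return Settings(
        hf_token=env.get("HF_TOKEN"),
        feedback_repo_id=env.get("FEEDBACK_REPO_ID", "ous-sow/sahel-agri-feedback"),
        whisper_model_id=env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3-turbo"),
        llm_model_id=env.get("LLM_MODEL_ID", "CohereLabs/aya-expanse-32b"),
        kaggle_username=env.get("KAGGLE_USERNAME"),
        kaggle_key=env.get("KAGGLE_KEY"),
        kaggle_kernel_slug=env.get(
            "KAGGLE_KERNEL_SLUG", "ous-sow/sahel-voice-master-trainer"
        ),
        auto_train_threshold=int(env.get("AUTO_TRAIN_THRESHOLD", "50")),
    )


if __name__ == "__main__":
    settings = load_settings()
    print("LLM:", settings.llm_model_id)
    print("Hub push enabled:", settings.can_push_to_hub)
    print("Kaggle trigger enabled:", settings.can_trigger_kaggle)
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to exercise in tests, for example `load_settings({"LLM_MODEL_ID": "Qwen/Qwen2.5-72B-Instruct"})`.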

---

## Export for edge

```bash
python scripts/export_onnx.py   # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android
```

ONNX does not support LoRA hot-swap, so export one file per language. `bitsandbytes` NF4 / 8-bit quantization is available for GPU-constrained deploys but is a training-only dep (not in `requirements.txt`).

---

## Tests

```bash
pytest tests/
```

Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).

---

## Space secrets (HF UI → Settings → Secrets)

At minimum:

| Key | Value |
|-----|-------|
| `HF_TOKEN` | write-scope token |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` (or any HF Serverless-supported model) |

---

## Design constraints (deliberate; do not change without discussion)

- **Adapter hot-swap** via PEFT's multi-adapter API: one backbone in VRAM, ~50 MB adapters per language, `set_adapter` ≈ 50 ms.
- **Qwen "adult-child" JSON contract**: structured `intent`/`reply`/`english`/`teaching_pair` output, parsed out of optional markdown fences.
- **JSONL + Hub push memory**: no ORM, thread-safe `MemoryManager`, async push so UI never blocks.
- **≤ 6 words per sentence** in `voice_responder.py` for clean MMS-TTS.
- **Adlam ↔ Latin dual-script** handling in `adlam.py` + `bam_normalize.py`.
- **Single-file `app.py`**: intentional for now; do not split without a plan.

---

## License

MIT.