title: Sahel-Voice-Lab  Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.25.0
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
  - bambara
  - fula
  - speech-recognition
  - text-to-speech
  - agriculture
  - iot
  - language-learning
  - west-africa
  - low-resource-nlp
  - memory

🌍 Sahel-Voice-Lab

A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).

Two intertwined jobs:

  1. Memory loop: users teach the assistant new words; it persists them to a HuggingFace dataset and uses them as the source of truth in future answers.
  2. Agricultural IoT voice interface: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and get short answers (≤ 6 words per sentence) that synthesise cleanly in TTS.

The core stack is explicitly 100% non-Meta (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is only used as a baseline fallback.


What this Space currently runs — the ground-zero minimal baseline

The deployed Space (app_file: app_minimal.py) is the Month 1–3 rebuild baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field testing and to build a real-user eval set. No LoRA adapters, no memory loop, no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in app.py still exists for the full production stack; it is just not what the Space serves today.

Four stacked changes land dialect fidelity without any training:

  1. Stage 1: dialect-pinned system prompt (src/llm/minimal_client.py). Replaces the GemmaClient JSON/teacher flow with a plain-text client. Its system prompt pins the target dialect explicitly (Bambara as spoken in Bamako, Mali, and Pular of Fuuta Jallon, as spoken in Guinea), names the languages the model must not drift into (Wolof, Hausa, Pulaar of Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair bilingual gold list as few-shot anchoring (configs/dialect_anchors/{bambara_mali,pular_guinea}.json).

  2. Stage 2: curated phrasebook short-circuit (src/llm/phrasebook.py). Before calling the LLM, the user's input is normalised and fuzzy-matched (threshold 0.88) against a curated English-keyed phrasebook (configs/dialect_anchors/{bambara,pular}_phrasebook.json; 100 Bambara / 110 Pular entries across greetings, family, food, farming, health, shopping, travel, clarity, time, parting). A hit returns the gold translation directly, with no LLM call and therefore no generation risk or inference latency.

  3. Stage 3: better multilingual base LLM. The default LLM_MODEL_ID is now CohereLabs/aya-expanse-32b, a 23-language multilingual model with much stronger West African coverage than Qwen2.5-7B. It can be overridden via the LLM_MODEL_ID env var (e.g. to Qwen/Qwen2.5-72B-Instruct) if Cohere's inference provider is not available on your HF account.

  4. Stage 4: split translate / reply UI + per-turn telemetry + RAG few-shot. Both Voice and Text tabs use a 4-box layout: phrasebook translation (text + audio) is automatic on submit (no LLM), and a separate Generate reply button calls the dialect-anchored LLM for a conversational response. On a phrasebook miss, the LLM prompt is RAG-injected with the top-3 nearest curated pairs as additional style anchoring. Every turn is appended to data/field_turns.jsonl (src/engine/turn_logger.py) with phase, latency breakdown, phrasebook hit, and reply; this log is the substrate for hit-rate measurement, A/B comparisons, and eventual Stage-5 LoRA training-data curation. The system prompt now also explicitly tells the LLM to reply, not translate: the few-shot pairs are framed as style/orthography references only, which fixes the regression where the LLM just echoed the phrasebook target.
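
The Stage 2 short-circuit and the Stage 4 top-3 fallback can be sketched roughly as follows. This is a minimal illustration, not the code in src/llm/phrasebook.py: it uses the standard library's difflib as a stand-in scorer, the phrasebook entries are placeholders, and the names normalise, rank, and lookup are hypothetical. Only the 0.88 threshold and the top-3 count come from the description above.

```python
import re
from difflib import SequenceMatcher

THRESHOLD = 0.88  # match threshold quoted in this README

# Hypothetical miniature phrasebook. The real files live under
# configs/dialect_anchors/ and their JSON schema is not shown here.
PHRASEBOOK = {
    "good morning": "<gold translation 1>",
    "thank you": "<gold translation 2>",
    "how are you": "<gold translation 3>",
    "where is the market": "<gold translation 4>",
}


def normalise(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def rank(query):
    """Score every phrasebook key against the query, best match first."""
    q = normalise(query)
    scored = [(SequenceMatcher(None, q, key).ratio(), key) for key in PHRASEBOOK]
    return sorted(scored, reverse=True)


def lookup(query, k=3):
    """Return a direct hit, or the k nearest pairs to use as few-shot anchors."""
    ranked = rank(query)
    best_score, best_key = ranked[0]
    if best_score >= THRESHOLD:
        # Hit: the gold translation is returned and no LLM call is needed.
        return {"hit": True, "translation": PHRASEBOOK[best_key], "anchors": []}
    # Miss: hand the nearest curated pairs to the LLM prompt as style anchors.
    anchors = [(key, PHRASEBOOK[key]) for _, key in ranked[:k]]
    return {"hit": False, "translation": None, "anchors": anchors}
```

With this sketch, lookup("Good morning!") is a hit, while an out-of-phrasebook question such as "what time does the bus leave" is a miss that carries three anchor pairs.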

See docs/baseline_rebuild.md for the broader minimal-track plan.
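
The per-turn telemetry described in Stage 4 amounts to appending one JSON object per line and reading the file back for metrics. The sketch below assumes a record shape (ts, phase, latency_ms, phrasebook_hit, reply); the actual field names in src/engine/turn_logger.py may differ, and log_turn and hit_rate are hypothetical helper names.

```python
import json
import time
from pathlib import Path


def log_turn(path, phase, latency_ms, phrasebook_hit, reply=None):
    """Append one turn to a JSONL file and return the record written."""
    record = {
        "ts": time.time(),            # assumed field: wall-clock timestamp
        "phase": phase,               # e.g. "translate" or "reply"
        "latency_ms": latency_ms,     # per-stage breakdown, e.g. {"stt": 420}
        "phrasebook_hit": phrasebook_hit,
        "reply": reply,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # ensure_ascii=False keeps Bambara/Pular characters readable on disk.
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def hit_rate(path):
    """Fraction of logged turns that were answered from the phrasebook."""
    with Path(path).open(encoding="utf-8") as fh:
        turns = [json.loads(line) for line in fh if line.strip()]
    if not turns:
        return 0.0
    return sum(1 for t in turns if t["phrasebook_hit"]) / len(turns)
```

Because each line is an independent JSON document, the file can be appended to from a running Space and analysed later without any schema migration.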


Status

| Phase | Feature | State |
|---|---|---|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS (Bambara) | ✅ shipped |
| 2 | Waxal VITS TTS (Fula) | ⏳ placeholder until ous-sow/fula-tts is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |

See docs/roadmap_2026-04.md for the full plan and docs/baseline_rebuild.md for the parallel minimal-track strategy.


Stack

| Layer | Tool |
|---|---|
| STT | openai/whisper-large-v3-turbo + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | CohereLabs/aya-expanse-32b (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient; overridable to Qwen/Qwen2.5-72B-Instruct, Qwen2.5-7B-Instruct, Mistral, Zephyr |
| Dialect anchoring (minimal) | src/llm/minimal_client.py: pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | src/llm/phrasebook.py: 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | facebook/mms-tts-bam, facebook/mms-tts-ful |
| TTS (Bambara) | ynnov/ekodi-bambara-tts-female (Waxal VITS) |
| TTS (Fula) | placeholder → ous-sow/fula-tts when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 |
| Fast path | RapidFuzz over data/phrases/{lang}.json for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + Seq2SeqTrainer on FLEURS, Jeli-ASR, SLR 105/106 |
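
The speaker-ID row reduces to a cosine-similarity check against enrolled embeddings. The sketch below shows only that decision rule, under assumptions: real embeddings are 192-dimensional and come from ECAPA-TDNN, whereas the toy vectors in the usage note are 3-dimensional, and cosine and identify are hypothetical helper names rather than functions from src/voice/.

```python
import math

COSINE_THRESHOLD = 0.75  # acceptance threshold quoted in the Stack table


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, profiles, threshold=COSINE_THRESHOLD):
    """Return the best-matching enrolled speaker, or None if below threshold."""
    best_name, best_score = None, -1.0
    for name, reference in profiles.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

For example, with profiles {"speaker_a": [1.0, 0.0, 0.0], "speaker_b": [0.0, 1.0, 0.0]}, the query [0.9, 0.1, 0.0] resolves to "speaker_a", while [1.0, 1.0, 0.0] scores about 0.71 against both and is rejected as unknown.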

Four entry points (do not conflate)

| File | Purpose | Launch command |
|---|---|---|
| app_minimal.py | Minimal baseline Gradio UI (what the HF Space currently serves). Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | python app_minimal.py |
| app.py | Full production Gradio UI (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | python app.py |
| app_lab.py | Experimental Gradio UI for prototyping (e.g. CuriosityEngine) before folding into app.py. | python app_lab.py |
| src/api/app.py | FastAPI service: loads Whisper once, registers bam/ful adapters via AdapterManager, preloads bam, attaches Transcriber + SensorBridge to app.state. | python scripts/run_server.py |

Repository layout

app_minimal.py                 # Gradio (minimal baseline, currently served on the HF Space)
app.py                         # Gradio (full production UI)
app_lab.py                     # Gradio (experimental)
requirements.txt               # Spaces runtime — do NOT pin torch/torchaudio
packages.txt                   # apt deps (ffmpeg)
configs/
  base_config.yaml             # shared settings
  api_config.yaml              # FastAPI-specific
  lora_bambara.yaml            # Bambara LoRA hyperparams
  lora_fula.yaml               # Fula LoRA hyperparams
data/
  phrases/                     # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl             # local mirror of the HF Hub memory dataset
docs/
  roadmap_2026-04.md           # full architectural walkthrough + action plan
  baseline_rebuild.md          # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md    # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md          # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/       # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/              # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb            # legacy Colab trainer
scripts/
  train_bambara.py             # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py               # merge LoRA -> ONNX -> TFLite
  verify_baseline.py           # eval harness
  run_server.py                # FastAPI launcher
  run_data_pipeline.py         # dataset prep
  push_to_hf.sh                # deploy helpers
  push_to_kaggle.sh            # deploy helpers
  runpod_setup.sh
src/
  api/                         # FastAPI app, schemas, routes, middleware
  conversation/                # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                        # dataset loading + normalization (Adlam, Bambara)
  engine/                      # adapter_manager, transcriber, stt_processor, curiosity
  iot/                         # intent_parser, voice_responder, sensor_bridge
  llm/                         # LLM client wrappers
  memory/                      # vocabulary persistence
  optimization/                # ONNX / quantization helpers
  training/                    # trainer, callbacks, augmenters
  tts/                         # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                       # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                         # pytest — api, data pipeline, engine, iot

How the memory loop works

  1. Press Push-to-Talk → speak in Bambara, Fula, French, or English.
  2. Whisper transcribes. If the language has a LoRA adapter loaded, AdapterManager hot-swaps to it (~50 ms).
  3. Qwen reads the vocabulary it has learned so far (MemoryManager.get_vocabulary_context()), then returns a structured JSON reply with intent ∈ {teaching, question, conversation, error}.
  4. If teaching: the word pair is appended to data/vocabulary.jsonl and async-pushed to ous-sow/sahel-agri-feedback → vocabulary.jsonl.
  5. If question: Qwen answers using the remembered vocabulary as source of truth.
  6. If conversation: Qwen replies naturally.
  7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).

The last 5 learned words are always visible in the UI.
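
The persistence side of the loop (steps 3 and 4, plus the last-5 display) can be sketched as a lock-guarded JSONL store. This is an illustration under assumptions, not the real MemoryManager: the class name TinyMemory, its method names, and the entry fields (word, meaning, lang) are hypothetical, and the async push to the Hub is left out.

```python
import json
import threading
from pathlib import Path


class TinyMemory:
    """Hypothetical stand-in for a JSONL-backed vocabulary store."""

    def __init__(self, path):
        self.path = Path(path)
        self._lock = threading.Lock()  # serialises file access across threads

    def teach(self, word, meaning, lang):
        """Append one taught pair to the JSONL file."""
        entry = {"word": word, "meaning": meaning, "lang": lang}
        with self._lock:
            self.path.parent.mkdir(parents=True, exist_ok=True)
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
        return entry

    def entries(self):
        """Read back every stored pair, oldest first."""
        with self._lock:
            if not self.path.exists():
                return []
            with self.path.open(encoding="utf-8") as fh:
                return [json.loads(line) for line in fh if line.strip()]

    def recent(self, n=5):
        """The n most recently taught pairs, as shown in the UI."""
        return self.entries()[-n:]

    def vocabulary_context(self):
        """Plain-text block that can be prepended to the LLM prompt."""
        return "\n".join(
            f"{e['word']} = {e['meaning']} ({e['lang']})" for e in self.entries()
        )
```

Keeping the store append-only means a crash mid-session loses at most the line being written, and the same file can be uploaded to the Hub dataset unchanged.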


How the agricultural voice interface works

  1. User asks, e.g., "A bɛ di wa?" ("Is it OK?") referring to their field.
  2. intent_parser.py (keyword-based) classifies the request: check_soil / check_weather / irrigation_status / pest_alert / etc.
  3. SensorBridge calls the configured SENSOR_API_URL and returns a typed SensorData.
  4. voice_responder.py maps (Intent, SensorData) → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (SOIL_MOISTURE_LOW=30, TEMP_HIGH=38, pH bounds).
  5. TTS speaks the reply.
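
Steps 2 to 4 can be sketched as a keyword classifier feeding a threshold-based responder. The two thresholds are the ones named above; everything else is assumed for illustration: the units (percent and °C), the strict comparisons, the English keywords and replies (the real modules work in Bambara and Fula), and the SensorData fields.

```python
import re
from dataclasses import dataclass

SOIL_MOISTURE_LOW = 30  # from this README; unit assumed to be percent
TEMP_HIGH = 38          # from this README; unit assumed to be degrees Celsius
MAX_WORDS = 6           # per-sentence limit for clean TTS

# Hypothetical English keywords, used here only to keep the sketch readable.
KEYWORDS = {
    "check_soil": ("soil", "moisture"),
    "check_weather": ("weather", "hot", "rain", "temperature"),
}


@dataclass
class SensorData:
    soil_moisture: float
    temperature: float


def parse_intent(text):
    """Return the first intent whose keyword appears in the utterance."""
    words = re.findall(r"\w+", text.lower())
    for intent, keys in KEYWORDS.items():
        if any(key in words for key in keys):
            return intent
    return "unknown"


def respond(intent, data):
    """Map (intent, sensor reading) to a short spoken reply."""
    if intent == "check_soil":
        if data.soil_moisture < SOIL_MOISTURE_LOW:
            return "The soil is dry. Water the field today."
        return "The soil is fine."
    if intent == "check_weather":
        if data.temperature > TEMP_HIGH:
            return "It is very hot. Protect young plants."
        return "The weather is fine."
    return "I did not understand."


def sentences_are_short(reply, limit=MAX_WORDS):
    """Check the per-sentence word limit on a reply."""
    sentences = [s for s in reply.split(".") if s.strip()]
    return all(len(s.split()) <= limit for s in sentences)
```

A check like sentences_are_short is cheap enough to run in a unit test over every reply template, which keeps the word limit from drifting as templates are added.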

Environment variables

Every variable is optional, so the Space boots without any of them set. Without HF_TOKEN, however, the memory loop cannot push to the Hub.

Core

| Key | Default | Purpose |
|---|---|---|
| HF_TOKEN | | HF write token. Required for Hub push and gated models. |
| FEEDBACK_REPO_ID | ous-sow/sahel-agri-feedback | Memory-loop target dataset. |
| ADAPTER_REPO_ID | ous-sow/sahel-agri-adapters | Published LoRA adapters. |
| WHISPER_MODEL_ID | openai/whisper-large-v3-turbo | STT base model. |
| LLM_MODEL_ID | CohereLabs/aya-expanse-32b | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| LOG_LEVEL | INFO | Standard Python logging level. |
| DEVICE | cuda (FastAPI) | Torch device for inference. |
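
One way to read these variables with their documented defaults is sketched below. The defaults are copied from the Core table; the function name load_settings and the derived can_push_to_hub flag are hypothetical and only illustrate the "boots without secrets, but cannot push" behaviour.

```python
import os

# Defaults copied from the Core table above. HF_TOKEN has no default.
DEFAULTS = {
    "FEEDBACK_REPO_ID": "ous-sow/sahel-agri-feedback",
    "ADAPTER_REPO_ID": "ous-sow/sahel-agri-adapters",
    "WHISPER_MODEL_ID": "openai/whisper-large-v3-turbo",
    "LLM_MODEL_ID": "CohereLabs/aya-expanse-32b",
    "LOG_LEVEL": "INFO",
}


def load_settings(env=None):
    """Resolve settings from an environment mapping, falling back to defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    settings["HF_TOKEN"] = env.get("HF_TOKEN")
    # Hypothetical convenience flag: Hub pushes need a write token.
    settings["can_push_to_hub"] = bool(settings["HF_TOKEN"])
    return settings
```

Passing a plain dict instead of os.environ, for example load_settings({"LLM_MODEL_ID": "Qwen/Qwen2.5-72B-Instruct"}), makes the override path easy to exercise in tests.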

Adapters & TTS

| Key | Default |
|---|---|
| BAMBARA_ADAPTER_PATH | ./adapters/bambara |
| FULA_ADAPTER_PATH | ./adapters/fula |
| BAMBARA_TTS_REPO | ynnov/ekodi-bambara-tts-female |
| FULA_TTS_REPO | ous-sow/fula-tts |

IoT

| Key | Default |
|---|---|
| SENSOR_API_URL | (unset → mock sensor) |

Self-Teaching tab (triggers Kaggle training runs)

| Key | Default |
|---|---|
| KAGGLE_USERNAME | |
| KAGGLE_KEY | |
| KAGGLE_KERNEL_SLUG | ous-sow/sahel-voice-master-trainer (override in prod to oussow/kaggle-master-trainer, the actual Kaggle owner slug) |
| AUTO_TRAIN_THRESHOLD | 50 |

Run locally

# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py

# Full production UI (not currently on the Space)
python app.py

# FastAPI service
python scripts/run_server.py

# Experimental lab UI
python app_lab.py

System-level dependency: ffmpeg (see packages.txt).


Training

LoRA fine-tuning runs on Kaggle T4 or RunPod — not locally. Pick one entrypoint:

| Target | Script | Notebook |
|---|---|---|
| Bambara LoRA | scripts/train_bambara.py | notebooks/kaggle_master_trainer/ |
| Fula LoRA | scripts/train_fula.py | notebooks/kaggle_master_trainer/ |
| Fula TTS | | notebooks/train_fula_tts/ (planned) |

Contributor workflow: edit notebooks locally in notebooks/<slug>/, commit them with nbstripout enabled so diffs stay clean, then run cd notebooks/<slug> && kaggle kernels push to execute on a Kaggle GPU. The full walkthrough is in docs/notebook_collaboration.md.

docs/kaggle_mcp_setup.md documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.


Export for edge

python scripts/export_onnx.py   # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android

ONNX does not support LoRA hot-swap, so export one file per language. bitsandbytes NF4 / 8-bit quantization is available for GPU-constrained deploys but is a training-only dep (not in requirements.txt).


Tests

pytest tests/

Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).


Space secrets (HF UI → Settings → Secrets)

At minimum:

| Key | Value |
|---|---|
| HF_TOKEN | write-scope token |
| FEEDBACK_REPO_ID | ous-sow/sahel-agri-feedback |
| LLM_MODEL_ID | CohereLabs/aya-expanse-32b (or any HF Serverless-supported model) |

Design constraints (deliberate — do not change without discussion)

  • Adapter hot-swap via PEFT's multi-adapter API — one backbone in VRAM, ~50 MB adapters per language, set_adapter ≈ 50 ms.
  • Qwen "adult-child" JSON contract — structured intent/reply/english/teaching_pair output, parsed out of optional markdown fences.
  • JSONL + Hub push memory — no ORM, thread-safe MemoryManager, async push so UI never blocks.
  • ≤ 6 words per sentence in voice_responder.py for clean MMS-TTS.
  • Adlam ↔ Latin dual-script handling in adlam.py + bam_normalize.py.
  • Single-file app.py — intentional for now; do not split without a plan.
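
The JSON contract in the second bullet implies a parser that tolerates an optional markdown fence around the model output. The sketch below is one way to do that and is not taken from the production client: parse_reply is a hypothetical name, and falling back to intent "error" on malformed output is an assumption. The field names and the four intents are the ones listed in this README.

```python
import json
import re

INTENTS = {"teaching", "question", "conversation", "error"}

TICKS = "`" * 3  # built at runtime so this snippet can sit inside a fence
FENCE = re.compile(TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS, re.DOTALL)


def parse_reply(raw):
    """Extract the JSON object from an LLM reply, fenced or not."""
    match = FENCE.search(raw)
    payload = match.group(1) if match else raw.strip()
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        # Assumed fallback: surface the raw text under the "error" intent.
        return {"intent": "error", "reply": raw.strip()}
    if not isinstance(data, dict) or data.get("intent") not in INTENTS:
        return {"intent": "error", "reply": raw.strip()}
    return data
```

Routing every malformed reply to a single "error" intent keeps the downstream UI code to one branch per intent instead of scattering try/except blocks.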

License

MIT.