title: Sahel-Voice-Lab — Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.25.0
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
- bambara
- fula
- speech-recognition
- text-to-speech
- agriculture
- iot
- language-learning
- west-africa
- low-resource-nlp
- memory
🌍 Sahel-Voice-Lab
A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).
Two intertwined jobs:
- Memory loop — users teach the assistant new words; it persists them to a HuggingFace dataset and uses them as the source of truth in future answers.
- Agricultural IoT voice interface: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and get short answers (≤ 6 words per sentence) that synthesize cleanly in TTS.
The core stack is explicitly 100% non-Meta (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is only used as a baseline fallback.
What this Space currently runs — the ground-zero minimal baseline
The deployed Space (app_file: app_minimal.py) is the Month 1–3 rebuild
baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field
testing and to build a real-user eval set. No LoRA adapters, no memory loop,
no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in
app.py still exists for the full production stack; it is just not what the
Space serves today.
Four stacked changes improve dialect fidelity without any training:
- Stage 1: dialect-pinned system prompt (src/llm/minimal_client.py). Replaces the GemmaClient JSON/teacher flow with a plain-text client whose system prompt pins the target dialect explicitly (Bambara as spoken in Bamako, Mali, and Pular of Fuuta Jallon, as spoken in Guinea), names the languages the model must not drift into (Wolof, Hausa, Pulaar of Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair bilingual gold list as few-shot anchoring (configs/dialect_anchors/{bambara_mali,pular_guinea}.json).
- Stage 2: curated phrasebook short-circuit (src/llm/phrasebook.py). Before calling the LLM, the user's input is normalised and fuzzy-matched (threshold 0.88) against a curated English-keyed phrasebook (configs/dialect_anchors/{bambara,pular}_phrasebook.json, 100 Bambara / 110 Pular entries across greetings, family, food, farming, health, shopping, travel, clarity, time, parting). A hit returns the gold translation directly: no LLM call, so no hallucination risk and no inference latency.
- Stage 3: better multilingual base LLM. The default LLM_MODEL_ID is now CohereLabs/aya-expanse-32b, a 23-language multilingual model with much stronger West African coverage than Qwen 2.5-7B. It can be overridden via the LLM_MODEL_ID env var (e.g. to Qwen/Qwen2.5-72B-Instruct) if Cohere's inference provider is not available on your HF account.
- Stage 4: split translate / reply UI + per-turn telemetry + RAG few-shot. Both Voice and Text tabs use a 4-box layout: phrasebook translation (text + audio) is automatic on submit (no LLM), and a separate Generate reply button calls the dialect-anchored LLM for a conversational response. On a phrasebook miss the LLM is RAG-injected with the top-3 nearest curated pairs as additional style anchoring. Every turn is appended to data/field_turns.jsonl (src/engine/turn_logger.py) with phase, latency breakdown, phrasebook hit, and reply; this log is the substrate for hit-rate measurement, A/B comparisons, and eventual Stage-5 LoRA training-data curation. The system prompt now also explicitly tells the LLM to reply, not translate: the few-shot pairs are framed as style/orthography references only, fixing the "the LLM just echoes the phrasebook target" regression.
See docs/baseline_rebuild.md for the broader minimal-track plan.
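The phrasebook short-circuit and the top-3 fallback described in Stages 2 and 4 can be sketched roughly as follows. This is a minimal illustration, not the contents of src/llm/phrasebook.py: it uses the standard-library difflib.SequenceMatcher as a stand-in scorer (the actual matcher is not specified here), and the PHRASEBOOK entries, normalise(), rank(), and lookup() are hypothetical names with placeholder translations.

```python
import difflib
import re
import unicodedata

THRESHOLD = 0.88  # fuzzy-match threshold quoted in Stage 2

# Hypothetical English-keyed entries; the values are placeholders,
# not real Bambara or Pular translations.
PHRASEBOOK = {
    "good morning": "<gold translation 1>",
    "thank you": "<gold translation 2>",
    "where is the market": "<gold translation 3>",
    "how is your family": "<gold translation 4>",
}


def normalise(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def rank(query: str) -> list:
    """Score every phrasebook key against the query, best match first."""
    q = normalise(query)
    scored = [
        (difflib.SequenceMatcher(None, q, key).ratio(), key)
        for key in PHRASEBOOK
    ]
    return sorted(scored, reverse=True)


def lookup(query: str):
    """Return ("hit", translation) or ("miss", top-3 pairs for few-shot use)."""
    ranked = rank(query)
    best_score, best_key = ranked[0]
    if best_score >= THRESHOLD:
        return "hit", PHRASEBOOK[best_key]
    # On a miss, the three nearest pairs become extra style anchors for the LLM.
    return "miss", [(key, PHRASEBOOK[key]) for _, key in ranked[:3]]
```

On a hit the caller can skip the LLM entirely; on a miss the returned pairs would be appended to the prompt as style references only.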
Status
| Phase | Feature | State |
|---|---|---|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS — Bambara | ✅ shipped |
| 2 | Waxal VITS TTS — Fula | ⏳ placeholder until ous-sow/fula-tts is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| — | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |
See docs/roadmap_2026-04.md for the full plan and docs/baseline_rebuild.md for the parallel minimal-track strategy.
Stack
| Layer | Tool |
|---|---|
| STT | openai/whisper-large-v3-turbo + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | CohereLabs/aya-expanse-32b (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient — overridable to Qwen/Qwen2.5-72B-Instruct, Qwen2.5-7B-Instruct, Mistral, Zephyr |
| Dialect anchoring (minimal) | src/llm/minimal_client.py — pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | src/llm/phrasebook.py — 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | facebook/mms-tts-bam, facebook/mms-tts-ful |
| TTS (Bambara) | ynnov/ekodi-bambara-tts-female (Waxal VITS) |
| TTS (Fula) | placeholder → ous-sow/fula-tts when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 |
| Fast path | RapidFuzz over data/phrases/{lang}.json for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + Seq2SeqTrainer on FLEURS, Jeli-ASR, SLR 105/106 |
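As an illustration of the Speaker ID row, the cosine ≥ 0.75 rule can be sketched in plain Python. The 192-dimensional vectors would come from the ECAPA-TDNN encoder in practice; here identify() and the profile dictionary are hypothetical, and only the threshold is taken from the table.

```python
import math

SPEAKER_THRESHOLD = 0.75  # cosine cut-off from the Stack table


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, profiles):
    """Return the best-matching enrolled speaker, or None below the threshold."""
    best_name, best_score = None, -1.0
    for name, reference in profiles.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= SPEAKER_THRESHOLD else None
```

Returning None for low-similarity audio lets the caller treat the speaker as unknown instead of forcing a match.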
Four entry points (do not conflate)
| File | Purpose | Lifecycle |
|---|---|---|
| app_minimal.py | Minimal baseline Gradio UI: what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | python app_minimal.py |
| app.py | Full production Gradio UI (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | python app.py |
| app_lab.py | Experimental Gradio UI for prototyping (e.g. CuriosityEngine) before folding into app.py. | python app_lab.py |
| src/api/app.py | FastAPI service: loads Whisper once, registers bam/ful adapters via AdapterManager, preloads bam, attaches Transcriber + SensorBridge to app.state. | python scripts/run_server.py |
Repository layout
app.py                          # Gradio (production, HF Spaces)
app_lab.py                      # Gradio (experimental)
requirements.txt                # Spaces runtime: do NOT pin torch/torchaudio
packages.txt                    # apt deps (ffmpeg)
configs/
  base_config.yaml              # shared settings
  api_config.yaml               # FastAPI-specific
  lora_bambara.yaml             # Bambara LoRA hyperparams
  lora_fula.yaml                # Fula LoRA hyperparams
data/
  phrases/                      # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl              # local mirror of the HF Hub memory dataset
docs/
  roadmap_2026-04.md            # full architectural walkthrough + action plan
  baseline_rebuild.md           # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md     # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md           # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/        # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/               # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb             # legacy Colab trainer
scripts/
  train_bambara.py              # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                 # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py                # merge LoRA -> ONNX -> TFLite
  verify_baseline.py            # eval harness
  run_server.py                 # FastAPI launcher
  run_data_pipeline.py          # dataset prep
  push_to_hf.sh                 # deploy helpers
  push_to_kaggle.sh             # deploy helpers
  runpod_setup.sh
src/
  api/                          # FastAPI app, schemas, routes, middleware
  conversation/                 # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                         # dataset loading + normalization (Adlam, Bambara)
  engine/                       # adapter_manager, transcriber, stt_processor, curiosity
  iot/                          # intent_parser, voice_responder, sensor_bridge
  llm/                          # LLM client wrappers
  memory/                       # vocabulary persistence
  optimization/                 # ONNX / quantization helpers
  training/                     # trainer, callbacks, augmenters
  tts/                          # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                        # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                          # pytest: api, data pipeline, engine, iot
How the memory loop works
- Press Push-to-Talk → speak in Bambara, Fula, French, or English.
- Whisper transcribes. If the language has a LoRA adapter loaded, AdapterManager hot-swaps to it (~50 ms).
- Qwen reads the vocabulary it has learned so far (MemoryManager.get_vocabulary_context()), then returns a structured JSON reply with intent ∈ {teaching, question, conversation, error}.
- If teaching: the word pair is appended to data/vocabulary.jsonl and async-pushed to ous-sow/sahel-agri-feedback → vocabulary.jsonl.
- If question: Qwen answers using the remembered vocabulary as source of truth.
- If conversation: Qwen replies naturally.
- TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).
The last 5 learned words are always visible in the UI.
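The reply-handling part of this loop might look like the sketch below. It assumes the LLM returns a JSON object with the intent / reply / teaching_pair fields, optionally wrapped in a markdown fence; parse_reply(), handle(), and recent_words() are hypothetical helpers, the shape of teaching_pair is an assumption, and an in-memory list stands in for the real JSONL store.

```python
import json
import re

VALID_INTENTS = {"teaching", "question", "conversation", "error"}

# Matches a markdown fence (three backticks, optional "json" tag) around the payload.
FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_reply(raw: str) -> dict:
    """Extract the JSON object from an LLM reply, with or without a fence."""
    match = FENCE.search(raw)
    payload = match.group(1) if match else raw
    try:
        reply = json.loads(payload)
    except json.JSONDecodeError:
        return {"intent": "error", "reply": ""}
    if not isinstance(reply, dict) or reply.get("intent") not in VALID_INTENTS:
        return {"intent": "error", "reply": ""}
    return reply


def handle(reply: dict, vocabulary: list) -> str:
    """Store teaching pairs; return the text to speak for every intent."""
    if reply["intent"] == "teaching" and reply.get("teaching_pair"):
        vocabulary.append(reply["teaching_pair"])
    return reply.get("reply", "")


def recent_words(vocabulary: list, n: int = 5) -> list:
    """The last n learned entries, as shown in the UI."""
    return vocabulary[-n:]
```

Mapping malformed or unexpected output to the error intent keeps the caller's dispatch logic to the four documented cases.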
How the agricultural voice interface works
- User asks, e.g., "A bɛ di wa?" ("Is it OK?") referring to their field.
- intent_parser.py (keyword-based) classifies the request: check_soil / check_weather / irrigation_status / pest_alert / etc.
- SensorBridge calls the configured SENSOR_API_URL and returns a typed SensorData.
- voice_responder.py maps (Intent, SensorData) → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (SOIL_MOISTURE_LOW=30, TEMP_HIGH=38, pH bounds).
- TTS speaks the reply.
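The pipeline above can be sketched end to end as follows. Only the intent names and the two threshold values come from this README; the English keywords, the SensorData fields, the comparison directions, and the reply strings are assumptions for illustration (the real responder emits Bambara or Fula).

```python
from dataclasses import dataclass

SOIL_MOISTURE_LOW = 30  # threshold values quoted above; units are assumed
TEMP_HIGH = 38

# Hypothetical English keywords; the real parser matches Bambara/Fula terms.
KEYWORDS = {
    "check_soil": ("soil", "moisture"),
    "check_weather": ("weather", "rain", "temperature"),
    "irrigation_status": ("irrigation", "pump"),
    "pest_alert": ("pest", "insect"),
}


@dataclass
class SensorData:
    soil_moisture: float  # assumed to be a percentage
    temperature: float    # assumed to be degrees Celsius


def parse_intent(text: str) -> str:
    """Return the first intent whose keyword appears in the utterance."""
    words = [w.strip(".,?!") for w in text.lower().split()]
    for intent, keys in KEYWORDS.items():
        if any(key in words for key in keys):
            return intent
    return "unknown"


def respond(intent: str, data: SensorData) -> str:
    """Map (intent, reading) to a short reply, applying the alert thresholds."""
    if intent == "check_soil":
        if data.soil_moisture < SOIL_MOISTURE_LOW:
            return "Soil is dry. Water it today."
        return "Soil moisture is good."
    if intent == "check_weather":
        if data.temperature > TEMP_HIGH:
            return "It is very hot today."
        return "Weather is fine today."
    return "I did not understand."


def max_words_per_sentence(reply: str) -> int:
    """Longest sentence length in words, to check the 6-word limit."""
    return max(len(s.split()) for s in reply.split(".") if s.strip())
```

A check such as max_words_per_sentence(reply) <= 6 could run in tests to keep every canned reply within the TTS-friendly limit.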
Environment variables
Every variable is either optional or has a sensible default, so the Space boots without any of them set; without HF_TOKEN, however, the memory loop cannot push to the Hub.
Core
| Key | Default | Purpose |
|---|---|---|
| HF_TOKEN | (unset) | HF write token. Required for Hub push and gated models. |
| FEEDBACK_REPO_ID | ous-sow/sahel-agri-feedback | Memory-loop target dataset. |
| ADAPTER_REPO_ID | ous-sow/sahel-agri-adapters | Published LoRA adapters. |
| WHISPER_MODEL_ID | openai/whisper-large-v3-turbo | STT base model. |
| LLM_MODEL_ID | CohereLabs/aya-expanse-32b | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| LOG_LEVEL | INFO | Standard Python logging level. |
| DEVICE | cuda (FastAPI) | Torch device for inference. |
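One way an application could resolve these core settings is sketched below. The defaults are copied from the table; load_settings() and the can_push_to_hub flag are hypothetical and do not describe how app_minimal.py actually reads its configuration.

```python
import os

# Defaults copied from the Core table above.
DEFAULTS = {
    "FEEDBACK_REPO_ID": "ous-sow/sahel-agri-feedback",
    "ADAPTER_REPO_ID": "ous-sow/sahel-agri-adapters",
    "WHISPER_MODEL_ID": "openai/whisper-large-v3-turbo",
    "LLM_MODEL_ID": "CohereLabs/aya-expanse-32b",
    "LOG_LEVEL": "INFO",
}


def load_settings(env=None) -> dict:
    """Merge environment overrides over the documented defaults."""
    env = os.environ if env is None else env
    settings = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # HF_TOKEN has no default; without it the memory loop cannot push.
    settings["HF_TOKEN"] = env.get("HF_TOKEN")
    settings["can_push_to_hub"] = bool(settings["HF_TOKEN"])
    return settings
```

Passing a plain dict as env makes the function easy to exercise in tests without touching the real environment.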
Adapters & TTS
| Key | Default |
|---|---|
| BAMBARA_ADAPTER_PATH | ./adapters/bambara |
| FULA_ADAPTER_PATH | ./adapters/fula |
| BAMBARA_TTS_REPO | ynnov/ekodi-bambara-tts-female |
| FULA_TTS_REPO | ous-sow/fula-tts |
IoT
| Key | Default |
|---|---|
| SENSOR_API_URL | (unset → mock sensor) |
Self-Teaching tab (triggers Kaggle training runs)
| Key | Default |
|---|---|
| KAGGLE_USERNAME | (unset) |
| KAGGLE_KEY | (unset) |
| KAGGLE_KERNEL_SLUG | ous-sow/sahel-voice-master-trainer (override in prod to oussow/kaggle-master-trainer, the actual Kaggle owner slug) |
| AUTO_TRAIN_THRESHOLD | 50 |
Run locally
# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py
# Full production UI (not currently on the Space)
python app.py
# FastAPI service
python scripts/run_server.py
# Experimental lab UI
python app_lab.py
System-level dependency: ffmpeg (see packages.txt).
Training
LoRA fine-tuning runs on Kaggle T4 or RunPod — not locally. Pick one entrypoint:
| Target | Script | Notebook |
|---|---|---|
| Bambara LoRA | scripts/train_bambara.py | notebooks/kaggle_master_trainer/ |
| Fula LoRA | scripts/train_fula.py | notebooks/kaggle_master_trainer/ |
| Fula TTS | — | notebooks/train_fula_tts/ (planned) |
Contributor workflow: edit notebooks locally in notebooks/<slug>/, commit with nbstripout enabled to keep diffs clean, then run cd notebooks/<slug> && kaggle kernels push to execute on a Kaggle GPU. Full walkthrough in docs/notebook_collaboration.md.
docs/kaggle_mcp_setup.md documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.
Export for edge
python scripts/export_onnx.py # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android
ONNX does not support LoRA hot-swap, so export one file per language. bitsandbytes NF4 / 8-bit quantization is available for GPU-constrained deploys but is a training-only dep (not in requirements.txt).
Tests
pytest tests/
Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).
Space secrets (HF UI → Settings → Secrets)
At minimum:
| Key | Value |
|---|---|
| HF_TOKEN | write-scope token |
| FEEDBACK_REPO_ID | ous-sow/sahel-agri-feedback |
| LLM_MODEL_ID | CohereLabs/aya-expanse-32b (or any HF Serverless-supported model) |
Design constraints (deliberate — do not change without discussion)
- Adapter hot-swap via PEFT's multi-adapter API: one backbone in VRAM, ~50 MB adapters per language, set_adapter ≈ 50 ms.
- Qwen "adult-child" JSON contract: structured intent / reply / english / teaching_pair output, parsed out of optional markdown fences.
- JSONL + Hub push memory: no ORM, thread-safe MemoryManager, async push so the UI never blocks.
- ≤ 6 words per sentence in voice_responder.py for clean MMS-TTS.
- Adlam ↔ Latin dual-script handling in adlam.py + bam_normalize.py.
- Single-file app.py: intentional for now; do not split without a plan.
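To illustrate the "JSONL + Hub push memory" constraint, here is a minimal sketch of a lock-guarded JSONL store that hands each push to a background thread. JsonlMemory and its push_fn callback are hypothetical and are not the project's MemoryManager; in the real stack push_fn would upload the file to the feedback dataset, while here it only records the path.

```python
import json
import tempfile
import threading
from pathlib import Path


class JsonlMemory:
    """Append-only JSONL store; writes are serialized, pushes run off-thread."""

    def __init__(self, path, push_fn):
        self.path = Path(path)
        self._lock = threading.Lock()
        self._push_fn = push_fn  # stand-in for a Hub upload call

    def add(self, record: dict) -> threading.Thread:
        line = json.dumps(record, ensure_ascii=False)
        with self._lock:  # one writer at a time keeps lines intact
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(line + "\n")
        # Push in the background so the caller (the UI) never blocks on the network.
        worker = threading.Thread(target=self._push_fn, args=(self.path,), daemon=True)
        worker.start()
        return worker

    def load(self) -> list:
        with self._lock:
            if not self.path.exists():
                return []
            with self.path.open(encoding="utf-8") as fh:
                return [json.loads(line) for line in fh if line.strip()]


# Usage: a list's append method plays the role of the upload function.
pushed = []
memory = JsonlMemory(Path(tempfile.mkdtemp()) / "vocabulary.jsonl", pushed.append)
memory.add({"word": "w1", "meaning": "m1"}).join()
```

Returning the worker thread is a convenience for tests; a UI caller would simply ignore it.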
License
MIT.