
Last commit 9e99c2c by jefffffff9 (27 days ago): "Stage 4: split translate/reply UI + CPU-safe TTS + reply-not-translate prompt"


```yaml
title: Sahel-Voice-Lab — Minimal
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.25.0
app_file: app_minimal.py
hardware: cpu-basic
pinned: false
license: mit
tags:
  - bambara
  - fula
  - speech-recognition
  - text-to-speech
  - agriculture
  - iot
  - language-learning
  - west-africa
  - low-resource-nlp
  - memory
```

🌍 Sahel-Voice-Lab

A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).

Two intertwined jobs:

- Memory loop: users teach the assistant new words; it persists them to a Hugging Face dataset and treats them as the source of truth in future answers.
- Agricultural IoT voice interface: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and get short answers, capped at 6 words per sentence so the TTS output stays clean.

The core stack is explicitly non-Meta (Whisper / Aya Expanse / F5-TTS / VITS). Meta's MMS-TTS serves only as the baseline and fallback TTS (it is the TTS used by the minimal baseline below).

What this Space currently runs — the `ground-zero` minimal baseline

The deployed Space (app_file: app_minimal.py) is the Month 1–3 rebuild baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field testing and to build a real-user eval set. No LoRA adapters, no memory loop, no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in app.py still exists for the full production stack; it is just not what the Space serves today.

Four stacked changes improve dialect fidelity without any training:

- Stage 1: dialect-pinned system prompt (`src/llm/minimal_client.py`). Replaces the `GemmaClient` JSON/teacher flow with a plain-text client whose system prompt pins the target dialect explicitly (Bambara as spoken in Bamako, Mali, and Pular of Fuuta Jallon, as spoken in Guinea). The prompt also names the languages the model must not drift into (Wolof, Hausa, Pulaar of Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire) and injects a 30-pair bilingual gold list as few-shot anchoring (`configs/dialect_anchors/{bambara_mali,pular_guinea}.json`).
- Stage 2: curated phrasebook short-circuit (`src/llm/phrasebook.py`). Before any LLM call, the user's input is normalised and fuzzy-matched (threshold 0.88) against a curated English-keyed phrasebook (`configs/dialect_anchors/{bambara,pular}_phrasebook.json`). The phrasebook has 100 Bambara and 110 Pular entries covering greetings, family, food, farming, health, shopping, travel, clarity, time, and parting. A hit returns the gold translation directly: there is no LLM call, so there is no hallucination risk and latency is near zero.
- Stage 3: better multilingual base LLM. The default `LLM_MODEL_ID` is now `CohereLabs/aya-expanse-32b`, a 23-language multilingual model chosen over Qwen 2.5-7B for noticeably stronger West African output. Bambara and Pular are not among its 23 officially supported languages, so Stages 1 and 2 still do the dialect pinning. If Cohere's inference provider is not available on your HF account, override the model through the `LLM_MODEL_ID` env var (e.g. `Qwen/Qwen2.5-72B-Instruct`).
- Stage 4: split translate / reply UI + per-turn telemetry + RAG few-shot. Both the Voice and Text tabs use a 4-box layout. The phrasebook translation (text + audio) is produced automatically on submit with no LLM involved. A separate Generate reply button calls the dialect-anchored LLM for a conversational response.
  - On a phrasebook miss, the top-3 nearest curated pairs are injected into the LLM prompt as additional style anchoring.
  - Every turn is appended to `data/field_turns.jsonl` (`src/engine/turn_logger.py`) with its phase, latency breakdown, phrasebook hit, and reply. This log is the substrate for hit-rate measurement, A/B comparisons, and eventual Stage-5 LoRA training-data curation.
  - The system prompt now also tells the LLM explicitly to reply, not translate. The few-shot pairs are framed as style/orthography references only, which fixes the regression where the LLM simply echoed the phrasebook target.
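The Stage 2 and Stage 4 routing can be sketched in pure Python: the phrasebook is tried first, and on a miss the top-3 nearest pairs become LLM anchors. This is a minimal sketch under assumptions. The `Entry` shape, the normaliser and the function names are hypothetical. `difflib.SequenceMatcher` stands in for whatever scorer `src/llm/phrasebook.py` actually uses (the full stack's fast path uses RapidFuzz), so the 0.88 cut-off will not behave identically.

```python
from __future__ import annotations

import difflib
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One curated pair (hypothetical shape; the real JSON schema may differ)."""
    english: str  # English key the user's input is matched against
    target: str   # gold Bambara or Pular text


def normalise(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Score in [0, 1]; difflib is a stand-in for the project's real scorer."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def nearest(query: str, book: list[Entry], k: int = 3) -> list[Entry]:
    """The k most similar entries, best first."""
    return sorted(book, key=lambda e: similarity(query, e.english), reverse=True)[:k]


def route(query: str, book: list[Entry],
          threshold: float = 0.88) -> tuple[str, str | list[Entry]]:
    """Hit: gold text, no LLM call. Miss: top-3 pairs to anchor the LLM prompt."""
    ranked = nearest(query, book, k=3)
    if ranked and similarity(query, ranked[0].english) >= threshold:
        return "phrasebook", ranked[0].target
    return "llm", ranked
```

On a hit, the gold text can go straight to TTS. On a miss, the returned pairs would be formatted into the prompt as style/orthography references rather than as the answer, in line with the reply-not-translate framing above.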

See docs/baseline_rebuild.md for the broader minimal-track plan.

Status

| Phase | Feature | State |
|---|---|---|
| 1 | Memory loop (JSONL + HF Hub) | ✅ shipped |
| 2 | Waxal VITS TTS (Bambara) | ✅ shipped |
| 2 | Waxal VITS TTS (Fula) | ⏳ placeholder until `ous-sow/fula-tts` is trained |
| 3 | Voice-to-voice S2S (F5-TTS + CER) | 🚧 merged, stabilizing |
| n/a | Adlam ↔ Latin round-trip, per-language prompts | ✅ landed |

See docs/roadmap_2026-04.md for the full plan and docs/baseline_rebuild.md for the parallel minimal-track strategy.

Stack

| Layer | Tool |
|---|---|
| STT | `openai/whisper-large-v3-turbo` + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) |
| LLM | `CohereLabs/aya-expanse-32b` (minimal-baseline default) via the HF Serverless `InferenceClient`; overridable to `Qwen/Qwen2.5-72B-Instruct`, `Qwen/Qwen2.5-7B-Instruct`, Mistral, or Zephyr |
| Dialect anchoring (minimal) | `src/llm/minimal_client.py`: pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails |
| Phrasebook short-circuit (minimal) | `src/llm/phrasebook.py`: 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call |
| TTS (baseline) | `facebook/mms-tts-bam`, `facebook/mms-tts-ful` |
| TTS (Bambara) | `ynnov/ekodi-bambara-tts-female` (Waxal VITS) |
| TTS (Fula) | placeholder → `ous-sow/fula-tts` when published |
| Voice cloning | F5-TTS + OpenVoice V2 (Phase 3, GPU-only) |
| Speaker ID | SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine similarity ≥ 0.75 |
| Fast path | RapidFuzz over `data/phrases/{lang}.json` for greetings / thanks / farewells |
| Persistence | JSONL on disk + HF Hub datasets (no ORM) |
| Training | PEFT LoRA + `Seq2SeqTrainer` on FLEURS, Jeli-ASR, SLR 105/106 |
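The Speaker ID row reduces to a nearest-profile search with a cosine cut-off. The following is a minimal pure-Python sketch that assumes each enrolled speaker is stored as one 192-d embedding. In the real stack those embeddings would come from the ECAPA-TDNN model. `cosine`, `identify`, and the profile dict are hypothetical names, not the project's API.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding: list[float], profiles: dict[str, list[float]],
             threshold: float = 0.75) -> str | None:
    """Best-matching enrolled speaker, or None if no profile reaches the threshold."""
    best_name, best_score = None, -1.0
    for name, reference in profiles.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Returning `None` below the threshold lets the caller treat the voice as an unknown speaker rather than forcing a match to the closest enrolled profile.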

Four entry points (do not conflate)

| File | Purpose | Run with |
|---|---|---|
| `app_minimal.py` | Minimal baseline Gradio UI; what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts, a curated phrasebook short-circuit, RAG few-shot on a miss, and per-turn JSONL telemetry. Tabs: Voice / Text, each split into translation (phrasebook, automatic) and reply (LLM, on demand). | `python app_minimal.py` |
| `app.py` | Full production Gradio UI (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
| `app_lab.py` | Experimental Gradio UI for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
| `src/api/app.py` | FastAPI service: loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, and attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |

Repository layout

```text
app_minimal.py                 # Gradio (minimal baseline; what the HF Space serves)
app.py                         # Gradio (full production UI; not currently served)
app_lab.py                     # Gradio (experimental)
requirements.txt               # Spaces runtime; do NOT pin torch/torchaudio
packages.txt                   # apt deps (ffmpeg)
configs/
  base_config.yaml             # shared settings
  api_config.yaml              # FastAPI-specific
  lora_bambara.yaml            # Bambara LoRA hyperparams
  lora_fula.yaml               # Fula LoRA hyperparams
  dialect_anchors/             # few-shot gold lists + curated phrasebooks (minimal baseline)
data/
  phrases/                     # RapidFuzz shortcut phrase JSONs per language
  vocabulary.jsonl             # local mirror of the HF Hub memory dataset
  field_turns.jsonl            # per-turn telemetry (minimal baseline)
docs/
  roadmap_2026-04.md           # full architectural walkthrough + action plan
  baseline_rebuild.md          # parallel minimal-track plan (non-destructive)
  notebook_collaboration.md    # Kaggle push/pull workflow for contributors
  kaggle_mcp_setup.md          # optional Kaggle MCP for Claude Desktop
notebooks/
  kaggle_master_trainer/       # -> oussow/kaggle-master-trainer (LoRA fine-tune)
  train_fula_tts/              # -> oussow/sahel-voice-fula-tts-trainer (TBD)
  bootstrap_repos.ipynb
  train_colab.ipynb            # legacy Colab trainer
scripts/
  train_bambara.py             # LoRA fine-tune entrypoint (Kaggle/RunPod)
  train_fula.py                # LoRA fine-tune entrypoint (Kaggle/RunPod)
  export_onnx.py               # merge LoRA -> ONNX -> TFLite
  verify_baseline.py           # eval harness
  run_server.py                # FastAPI launcher
  run_data_pipeline.py         # dataset prep
  push_to_hf.sh                # deploy helpers
  push_to_kaggle.sh            # deploy helpers
  runpod_setup.sh
src/
  api/                         # FastAPI app, schemas, routes, middleware
  conversation/                # memory_manager, gemma_client, phrase_matcher, intent_parser
  data/                        # dataset loading + normalization (Adlam, Bambara)
  engine/                      # adapter_manager, transcriber, stt_processor, curiosity, turn_logger
  iot/                         # intent_parser, voice_responder, sensor_bridge
  llm/                         # LLM client wrappers (minimal_client, phrasebook)
  memory/                      # vocabulary persistence
  optimization/                # ONNX / quantization helpers
  training/                    # trainer, callbacks, augmenters
  tts/                         # mms_tts, waxal_tts, f5_tts, voice_cloner
  voice/                       # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
tests/                         # pytest: api, data pipeline, engine, iot
```

How the memory loop works (full `app.py` stack)

1. Press Push-to-Talk and speak in Bambara, Fula, French, or English.
2. Whisper transcribes. If that language has a LoRA adapter loaded, `AdapterManager` hot-swaps to it (~50 ms).
3. Qwen reads the vocabulary it has learned so far (`MemoryManager.get_vocabulary_context()`), then returns a structured JSON reply with `intent` ∈ {`teaching`, `question`, `conversation`, `error`}.
4. If `teaching`: the word pair is appended to `data/vocabulary.jsonl` and pushed asynchronously to `vocabulary.jsonl` in the `ous-sow/sahel-agri-feedback` dataset.
5. If `question`: Qwen answers using the remembered vocabulary as the source of truth.
6. If `conversation`: Qwen replies naturally.
7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).

The last 5 learned words are always visible in the UI.
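Below is a minimal sketch of the persistence half of the loop. It assumes a JSONL file and an injected push callback. `VocabularyStore`, its methods and its record fields are hypothetical stand-ins for `MemoryManager`, whose real API may differ. The point is the pattern: append under a lock, take a snapshot, and push it on a background thread so the UI never waits on the network.

```python
from __future__ import annotations

import json
import threading
from collections.abc import Callable
from pathlib import Path


class VocabularyStore:
    """Hypothetical stand-in for MemoryManager: JSONL on disk, guarded by a lock."""

    def __init__(self, path: Path, push: Callable[[str], None] | None = None) -> None:
        self.path = path
        self.push = push  # assumed hook, e.g. an upload to the Hub dataset
        self._lock = threading.Lock()

    def add_pair(self, lang: str, word: str, english: str) -> threading.Thread | None:
        """Append one taught pair, then push a snapshot in the background."""
        record = {"lang": lang, "word": word, "english": english}
        with self._lock:
            with self.path.open("a", encoding="utf-8") as f:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
            snapshot = self.path.read_text(encoding="utf-8")
        if self.push is None:
            return None
        # The caller (the UI) returns immediately; the push runs on its own thread.
        thread = threading.Thread(target=self.push, args=(snapshot,), daemon=True)
        thread.start()
        return thread

    def records(self) -> list[dict]:
        with self._lock:
            if not self.path.exists():
                return []
            lines = self.path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines if line.strip()]

    def last(self, n: int = 5) -> list[dict]:
        """The n most recently learned pairs (what the UI shows)."""
        return self.records()[-n:]

    def vocabulary_context(self) -> str:
        """Learned pairs rendered as prompt text for the LLM to treat as ground truth."""
        pairs = [f"{r['word']} ({r['lang']}) = {r['english']}" for r in self.records()]
        return "Known vocabulary:\n" + "\n".join(pairs) if pairs else "Known vocabulary: (none)"
```

Pushes started from separate threads can finish out of order. A production version would serialise them (for example through a single worker queue) so that an older snapshot never overwrites a newer one on the Hub.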

How the agricultural voice interface works

1. The user asks, e.g., "A bɛ di wa?" ("Is it OK?"), referring to their field.
2. `intent_parser.py` (keyword-based) classifies the request as `check_soil`, `check_weather`, `irrigation_status`, `pest_alert`, etc.
3. `SensorBridge` calls the configured `SENSOR_API_URL` (a mock sensor when it is unset) and returns a typed `SensorData`.
4. `voice_responder.py` maps (`Intent`, `SensorData`) to a short (≤ 6 words per sentence) Bambara or Fula reply plus an English translation. Alert thresholds are encoded here (`SOIL_MOISTURE_LOW=30`, `TEMP_HIGH=38`, pH bounds).
5. TTS speaks the reply.
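The parse-then-respond steps can be sketched as follows. For readability, the sketch assumes English keywords and English reply text. The keyword lists, the `SensorData` fields, the comparison directions (`<` for low moisture, `>` for high temperature) and the reply wording are all hypothetical. Only the two threshold values and the 6-word cap come from the text above. pH handling is omitted because its bounds are not stated here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

SOIL_MOISTURE_LOW = 30  # alert threshold named in voice_responder.py
TEMP_HIGH = 38          # alert threshold named in voice_responder.py
MAX_WORDS = 6           # per-sentence cap for clean TTS

# Hypothetical English keywords; the real per-language lists live in intent_parser.py.
KEYWORDS = {
    "check_soil": ("soil", "moisture", "dry"),
    "check_weather": ("weather", "rain", "hot"),
    "irrigation_status": ("water", "irrigation", "pump"),
    "pest_alert": ("pest", "insects", "locusts"),
}


@dataclass
class SensorData:
    """Hypothetical subset of the real typed SensorData."""
    soil_moisture: float
    temperature: float


def parse_intent(text: str) -> str:
    """First intent whose keyword appears as a whole word; 'unknown' otherwise."""
    words = set(re.findall(r"\w+", text.lower()))
    for intent, keys in KEYWORDS.items():
        if words.intersection(keys):
            return intent
    return "unknown"


def check_word_limit(reply: str) -> str:
    """Reject any sentence longer than MAX_WORDS before it reaches TTS."""
    for sentence in re.split(r"[.!?]", reply):
        if len(sentence.split()) > MAX_WORDS:
            raise ValueError(f"sentence too long for TTS: {sentence.strip()!r}")
    return reply


def respond(intent: str, data: SensorData) -> str:
    """English stand-in replies; the real responder emits Bambara or Fula plus English."""
    if intent in ("check_soil", "irrigation_status"):
        dry = data.soil_moisture < SOIL_MOISTURE_LOW
        reply = "Soil is dry. Water today." if dry else "Soil moisture is fine."
    elif intent == "check_weather":
        hot = data.temperature > TEMP_HIGH
        reply = "It is very hot. Shade young plants." if hot else "Weather is fine today."
    else:
        reply = "No data for that yet."
    return check_word_limit(reply)
```

Enforcing the word cap at the last step, rather than trusting each template, means a newly added reply that breaks the TTS constraint fails loudly instead of being spoken badly.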

Environment variables

Every variable is optional, so the Space boots without any of them. Without `HF_TOKEN`, however, the memory loop cannot push to the Hub.

Core

| Key | Default | Purpose |
|---|---|---|
| `HF_TOKEN` | (none) | HF write token. Required for Hub push and gated models. |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` | Memory-loop target dataset. |
| `ADAPTER_REPO_ID` | `ous-sow/sahel-agri-adapters` | Published LoRA adapters. |
| `WHISPER_MODEL_ID` | `openai/whisper-large-v3-turbo` | STT base model. |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` | LLM via HF Serverless. Override to any HF Serverless-supported model. |
| `LOG_LEVEL` | `INFO` | Standard Python logging level. |
| `DEVICE` | `cuda` (FastAPI) | Torch device for inference. |
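Below is a minimal sketch of loading the Core variables with the defaults from the table, assuming a single frozen settings object. `Settings`, `from_env` and `can_push` are hypothetical names, not the project's actual config module. Passing in a mapping instead of always reading `os.environ` keeps the loader easy to test.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Core settings with the README's defaults (hypothetical loader)."""
    hf_token: str | None
    feedback_repo_id: str
    adapter_repo_id: str
    whisper_model_id: str
    llm_model_id: str
    log_level: str
    device: str

    @property
    def can_push(self) -> bool:
        # The app still boots without HF_TOKEN, but Hub pushes are disabled.
        return self.hf_token is not None

    @classmethod
    def from_env(cls, env: Mapping[str, str] | None = None) -> Settings:
        env = os.environ if env is None else env
        return cls(
            hf_token=env.get("HF_TOKEN") or None,  # treat an empty string as unset
            feedback_repo_id=env.get("FEEDBACK_REPO_ID", "ous-sow/sahel-agri-feedback"),
            adapter_repo_id=env.get("ADAPTER_REPO_ID", "ous-sow/sahel-agri-adapters"),
            whisper_model_id=env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3-turbo"),
            llm_model_id=env.get("LLM_MODEL_ID", "CohereLabs/aya-expanse-32b"),
            log_level=env.get("LOG_LEVEL", "INFO"),
            device=env.get("DEVICE", "cuda"),  # the FastAPI default per the table
        )
```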

Adapters & TTS

| Key | Default |
|---|---|
| `BAMBARA_ADAPTER_PATH` | `./adapters/bambara` |
| `FULA_ADAPTER_PATH` | `./adapters/fula` |
| `BAMBARA_TTS_REPO` | `ynnov/ekodi-bambara-tts-female` |
| `FULA_TTS_REPO` | `ous-sow/fula-tts` |

IoT

| Key | Default |
|---|---|
| `SENSOR_API_URL` | (unset → mock sensor) |

Self-Teaching tab (triggers Kaggle training runs)

| Key | Default |
|---|---|
| `KAGGLE_USERNAME` | (none) |
| `KAGGLE_KEY` | (none) |
| `KAGGLE_KERNEL_SLUG` | `ous-sow/sahel-voice-master-trainer` (in prod, override to `oussow/kaggle-master-trainer`, the actual Kaggle owner slug) |
| `AUTO_TRAIN_THRESHOLD` | `50` |

Run locally

```bash
# Minimal baseline (what the Space runs)
pip install -r requirements.txt
python app_minimal.py

# Full production UI (not currently on the Space)
python app.py

# FastAPI service
python scripts/run_server.py

# Experimental lab UI
python app_lab.py
```

System-level dependency: ffmpeg (see packages.txt).

Training

LoRA fine-tuning runs on Kaggle T4 or RunPod — not locally. Pick one entrypoint:

| Target | Script | Notebook |
|---|---|---|
| Bambara LoRA | `scripts/train_bambara.py` | `notebooks/kaggle_master_trainer/` |
| Fula LoRA | `scripts/train_fula.py` | `notebooks/kaggle_master_trainer/` |
| Fula TTS | (none) | `notebooks/train_fula_tts/` (planned) |

Contributor workflow: edit notebooks locally in `notebooks/<slug>/` and commit them (nbstripout keeps the diffs clean). Then run `cd notebooks/<slug> && kaggle kernels push` to execute on a Kaggle GPU. The full walkthrough is in `docs/notebook_collaboration.md`.

docs/kaggle_mcp_setup.md documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.

Export for edge

```bash
python scripts/export_onnx.py   # merges LoRA into the backbone, exports ONNX
# then onnx-tf → TFLite for Android
```

The ONNX export merges the LoRA weights into the backbone, so the exported model cannot hot-swap adapters; export one file per language. bitsandbytes NF4 / 8-bit quantization is available for GPU-constrained deploys, but it is a training-only dependency (not in requirements.txt).

Tests

```bash
pytest tests/
```

Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).

Space secrets (HF UI → Settings → Secrets)

At minimum:

| Key | Value |
|---|---|
| `HF_TOKEN` | write-scope token |
| `FEEDBACK_REPO_ID` | `ous-sow/sahel-agri-feedback` |
| `LLM_MODEL_ID` | `CohereLabs/aya-expanse-32b` (or any HF Serverless-supported model) |

Design constraints (deliberate — do not change without discussion)

- Adapter hot-swap via PEFT's multi-adapter API: one backbone in VRAM, ~50 MB adapters per language, `set_adapter` ≈ 50 ms.
- Qwen "adult-child" JSON contract: structured `intent`/`reply`/`english`/`teaching_pair` output, parsed out of optional markdown fences.
- JSONL + Hub push memory: no ORM, thread-safe `MemoryManager`, async push so the UI never blocks.
- ≤ 6 words per sentence in `voice_responder.py` for clean MMS-TTS.
- Adlam ↔ Latin dual-script handling in `adlam.py` + `bam_normalize.py`.
- Single-file `app.py`: intentional for now; do not split it without a plan.
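The JSON contract's parsing rule can be sketched as follows: accept the object either bare or wrapped in a Markdown fence, and fall back to `intent = error` otherwise. The field names and the intent set come from this README. The function names and the exact shape of the fallback are assumptions.

```python
from __future__ import annotations

import json
import re

FENCE = "`" * 3  # built this way so the snippet can itself sit inside a Markdown fence
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)
INTENTS = {"teaching", "question", "conversation", "error"}


def _error(raw: str) -> dict:
    """Fallback: surface the raw text as an error-intent reply."""
    return {"intent": "error", "reply": raw.strip(), "english": "", "teaching_pair": None}


def parse_reply(raw: str) -> dict:
    """Parse the LLM's JSON reply, accepting it bare or inside a Markdown code fence."""
    match = _FENCED.search(raw)
    body = match.group(1) if match else raw.strip()
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return _error(raw)
    intent = data.get("intent") if isinstance(data, dict) else None
    if not isinstance(intent, str) or intent not in INTENTS:
        return _error(raw)
    return {
        "intent": intent,
        "reply": str(data.get("reply", "")),
        "english": str(data.get("english", "")),
        "teaching_pair": data.get("teaching_pair"),
    }
```

Because `search` is used rather than a full match, chatty preambles such as "Sure! Here is the JSON:" before the fence are tolerated.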

License

MIT.