	---
	title: Sahel-Voice-Lab — Minimal
	emoji: 🌍
	colorFrom: blue
	colorTo: green
	sdk: gradio
	sdk_version: "5.25.0"
	app_file: app_minimal.py
	hardware: cpu-basic
	pinned: false
	license: mit
	tags:
	- bambara
	- fula
	- speech-recognition
	- text-to-speech
	- agriculture
	- iot
	- language-learning
	- west-africa
	- low-resource-nlp
	- memory
	---

	# 🌍 Sahel-Voice-Lab

	A voice-first AI assistant for Bambara (Mali) and Fula/Pular (Guinea, Senegal).

	Two intertwined jobs:

	1. Memory loop: users teach the assistant new words; it persists them to a Hugging Face dataset and uses them as the source of truth in future answers.
	2. Agricultural IoT voice interface: Sahelian farmers query soil, weather, irrigation, and pest data in their own language and get short answers (≤ 6 words per sentence) for clean TTS.

	The core stack is explicitly 100% non-Meta (Whisper / Aya-Expanse / F5-TTS / VITS); MMS-TTS is only used as a baseline fallback.

	---

	## What this Space currently runs — the `ground-zero` minimal baseline

	The deployed Space (`app_file: app_minimal.py`) is the Month 1–3 rebuild
	baseline — a stripped-down Whisper → LLM → MMS-TTS pipeline used for field
	testing and to build a real-user eval set. No LoRA adapters, no memory loop,
	no speaker ID, no voice cloning, no IoT, no phrase matcher. Everything in
	`app.py` still exists for the full production stack; it is just not what the
	Space serves today.

	Four stacked changes improve dialect fidelity without any training:

	1. Stage 1 — dialect-pinned system prompt (`src/llm/minimal_client.py`).
	Replaces the `GemmaClient` JSON/teacher flow with a plain-text client whose
	system prompt pins the target dialect explicitly (*Bambara as spoken in
	Bamako, Mali* and *Pular of Fuuta Jallon, as spoken in Guinea*), names the
	languages the model must not drift into (Wolof, Hausa, Pulaar of
	Senegal, Fulfulde of Nigeria, Jula of Côte d'Ivoire), and injects a 30-pair
	bilingual gold list as few-shot anchoring
	(`configs/dialect_anchors/{bambara_mali,pular_guinea}.json`).

	2. Stage 2 — curated phrasebook short-circuit (`src/llm/phrasebook.py`).
	Before calling the LLM, the user's input is normalised and fuzzy-matched
	(threshold 0.88) against a curated English-keyed phrasebook
	(`configs/dialect_anchors/{bambara,pular}_phrasebook.json` — 100 Bambara /
	110 Pular entries across greetings, family, food, farming, health,
	shopping, travel, clarity, time, parting). A hit returns the gold
	translation directly, with no LLM call and so no generation risk or
	inference latency.

	3. Stage 3 — better multilingual base LLM.
	Default `LLM_MODEL_ID` is now `CohereLabs/aya-expanse-32b`, a 23-language
	multilingual model with much stronger West African coverage than Qwen
	2.5-7B. It can be overridden via the `LLM_MODEL_ID` env var (e.g. to
	`Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not
	available on your HF account.

	4. Stage 4 — split translate / reply UI + per-turn telemetry + RAG few-shot.
	Both Voice and Text tabs use a 4-box layout: phrasebook translation (text
	+ audio) is automatic on submit (no LLM), and a separate Generate reply
	button calls the dialect-anchored LLM for a conversational response. On a
	phrasebook miss the LLM is RAG-injected with the top-3 nearest curated
	pairs as additional style anchoring. Every turn is appended to
	`data/field_turns.jsonl` (`src/engine/turn_logger.py`) with phase, latency
	breakdown, phrasebook hit, and reply — the substrate for hit-rate
	measurement, A/B comparisons, and eventual Stage-5 LoRA training-data
	curation. The system prompt now also explicitly tells the LLM to **reply,
	not translate** — the few-shot pairs are framed as style/orthography
	references only, fixing the "the LLM just echoes the phrasebook target"
	regression.
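
The Stage 2 short-circuit and the Stage 4 top-3 retrieval can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `src/llm/phrasebook.py`: it uses `difflib.SequenceMatcher` as the similarity measure (the real module may use something else), the function names are hypothetical, and the sample entries are made up for the demo rather than copied from the phrasebook files.

```python
import difflib
import unicodedata

THRESHOLD = 0.88  # fuzzy-match threshold quoted in Stage 2


def normalise(text: str) -> str:
    """Case-fold, replace punctuation with spaces, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    return " ".join(kept.split())


def rank(query, phrasebook):
    """Score every English key against the query, best match first."""
    q = normalise(query)
    scored = [
        (difflib.SequenceMatcher(None, q, normalise(key)).ratio(), key)
        for key in phrasebook
    ]
    return sorted(scored, reverse=True)


def lookup(query, phrasebook, threshold=THRESHOLD):
    """Return the gold translation on a hit, else None (caller falls back to the LLM)."""
    ranked = rank(query, phrasebook)
    if ranked and ranked[0][0] >= threshold:
        return phrasebook[ranked[0][1]]
    return None


def nearest_pairs(query, phrasebook, k=3):
    """Top-k (english, target) pairs to inject as few-shot anchors on a miss."""
    return [(key, phrasebook[key]) for _, key in rank(query, phrasebook)[:k]]


# Hypothetical entries for illustration only.
sample = {
    "good morning": "I ni sɔgɔma",
    "thank you": "I ni ce",
    "how are you": "I ka kɛnɛ wa?",
    "goodbye": "K'an bɛn",
}
hit = lookup("Good morning!", sample)                 # phrasebook hit, no LLM call
miss = lookup("when should I plant millet", sample)   # None, so the LLM is used
anchors = nearest_pairs("when should I plant millet", sample)
```

Keeping `rank` separate from `lookup` lets the same scoring pass serve both the hit test and the RAG anchors on a miss.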

	See `docs/baseline_rebuild.md` for the broader minimal-track plan.
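
As a rough picture of the per-turn telemetry from Stage 4, the sketch below appends one JSON object per line and computes a phrasebook hit rate from the file. The field names, the `phase` values, and the latency numbers are assumptions chosen for the example; the real schema lives in `src/engine/turn_logger.py`.

```python
import json
import tempfile
import time
from pathlib import Path


def log_turn(path, *, phase, latency_ms, phrasebook_hit, reply):
    """Append one turn as a single JSON line and return the record written."""
    record = {
        "ts": time.time(),
        "phase": phase,
        "latency_ms": latency_ms,        # per-stage breakdown, e.g. {"stt": 410}
        "phrasebook_hit": phrasebook_hit,
        "reply": reply,
    }
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def hit_rate(path):
    """Share of logged turns that the phrasebook answered."""
    with Path(path).open(encoding="utf-8") as fh:
        turns = [json.loads(line) for line in fh if line.strip()]
    return sum(t["phrasebook_hit"] for t in turns) / len(turns) if turns else 0.0


# Demo in a temporary directory with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "data" / "field_turns.jsonl"
    log_turn(log_path, phase="translate", latency_ms={"stt": 410, "phrasebook": 2},
             phrasebook_hit=True, reply="I ni sɔgɔma")
    log_turn(log_path, phase="reply", latency_ms={"llm": 1800, "tts": 650},
             phrasebook_hit=False, reply="(LLM reply)")
    rate = hit_rate(log_path)
```

Append-only JSONL keeps each turn independent, so a partially written file still parses line by line.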

	---

	## Status

	\| Phase \| Feature \| State \|
	\|------:\|---------\|-------\|
	\| 1 \| Memory loop (JSONL + HF Hub) \| ✅ shipped \|
	\| 2 \| Waxal VITS TTS — Bambara \| ✅ shipped \|
	\| 2 \| Waxal VITS TTS — Fula \| ⏳ placeholder until `ous-sow/fula-tts` is trained \|
	\| 3 \| Voice-to-voice S2S (F5-TTS + CER) \| 🚧 merged, stabilizing \|
	\| — \| Adlam ↔ Latin round-trip, per-language prompts \| ✅ landed \|

	See `docs/roadmap_2026-04.md` for the full plan and `docs/baseline_rebuild.md` for the parallel minimal-track strategy.

	---

	## Stack

	\| Layer \| Tool \|
	\|-------\|------\|
	\| STT \| `openai/whisper-large-v3-turbo` + PEFT LoRA hot-swap (~50 MB adapter per language, ~50 ms switch) \|
	\| LLM \| `CohereLabs/aya-expanse-32b` (minimal-baseline default, strong African-language coverage) via HF Serverless InferenceClient — overridable to `Qwen/Qwen2.5-72B-Instruct`, `Qwen2.5-7B-Instruct`, Mistral, Zephyr \|
	\| Dialect anchoring (minimal) \| `src/llm/minimal_client.py` — pinned Bambara-Mali / Pular-Guinea system prompt with 30-pair bilingual few-shot + forbidden-drift guardrails \|
	\| Phrasebook short-circuit (minimal) \| `src/llm/phrasebook.py` — 100 Bambara + 110 Pular curated gold pairs, fuzzy-matched (0.88 threshold) before any LLM call \|
	\| TTS (baseline) \| `facebook/mms-tts-bam`, `facebook/mms-tts-ful` \|
	\| TTS (Bambara) \| `ynnov/ekodi-bambara-tts-female` (Waxal VITS) \|
	\| TTS (Fula) \| placeholder → `ous-sow/fula-tts` when published \|
	\| Voice cloning \| F5-TTS + OpenVoice V2 (Phase 3, GPU-only) \|
	\| Speaker ID \| SpeechBrain ECAPA-TDNN, 192-d embeddings, cosine ≥ 0.75 \|
	\| Fast path \| RapidFuzz over `data/phrases/{lang}.json` for greetings / thanks / farewells \|
	\| Persistence \| JSONL on disk + HF Hub datasets (no ORM) \|
	\| Training \| PEFT LoRA + `Seq2SeqTrainer` on FLEURS, Jeli-ASR, SLR 105/106 \|

	---

	## Four entry points (do not conflate)

	\| File \| Purpose \| Lifecycle \|
	\|------\|---------\|-----------\|
	\| `app_minimal.py` \| Minimal baseline Gradio UI — what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). \| `python app_minimal.py` \|
	\| `app.py` \| Full production Gradio UI (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. \| `python app.py` \|
	\| `app_lab.py` \| Experimental Gradio UI for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. \| `python app_lab.py` \|
	\| `src/api/app.py` \| FastAPI service — loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. \| `python scripts/run_server.py` \|

	---

	## Repository layout

	```
	app_minimal.py                  # Gradio (minimal baseline, currently served on the Space)
	app.py                          # Gradio (production, HF Spaces)
	app_lab.py                      # Gradio (experimental)
	requirements.txt                # Spaces runtime; do NOT pin torch/torchaudio
	packages.txt                    # apt deps (ffmpeg)
	configs/
	  base_config.yaml              # shared settings
	  api_config.yaml               # FastAPI-specific
	  lora_bambara.yaml             # Bambara LoRA hyperparams
	  lora_fula.yaml                # Fula LoRA hyperparams
	data/
	  phrases/                      # RapidFuzz shortcut phrase JSONs per language
	  vocabulary.jsonl              # local mirror of the HF Hub memory dataset
	docs/
	  roadmap_2026-04.md            # full architectural walkthrough + action plan
	  baseline_rebuild.md           # parallel minimal-track plan (non-destructive)
	  notebook_collaboration.md     # Kaggle push/pull workflow for contributors
	  kaggle_mcp_setup.md           # optional Kaggle MCP for Claude Desktop
	notebooks/
	  kaggle_master_trainer/        # -> oussow/kaggle-master-trainer (LoRA fine-tune)
	  train_fula_tts/               # -> oussow/sahel-voice-fula-tts-trainer (TBD)
	  bootstrap_repos.ipynb
	  train_colab.ipynb             # legacy Colab trainer
	scripts/
	  train_bambara.py              # LoRA fine-tune entrypoint (Kaggle/RunPod)
	  train_fula.py                 # LoRA fine-tune entrypoint (Kaggle/RunPod)
	  export_onnx.py                # merge LoRA -> ONNX -> TFLite
	  verify_baseline.py            # eval harness
	  run_server.py                 # FastAPI launcher
	  run_data_pipeline.py          # dataset prep
	  push_to_hf.sh                 # deploy helpers
	  push_to_kaggle.sh             # deploy helpers
	  runpod_setup.sh
	src/
	  api/                          # FastAPI app, schemas, routes, middleware
	  conversation/                 # memory_manager, gemma_client, phrase_matcher, intent_parser
	  data/                         # dataset loading + normalization (Adlam, Bambara)
	  engine/                       # adapter_manager, transcriber, stt_processor, curiosity
	  iot/                          # intent_parser, voice_responder, sensor_bridge
	  llm/                          # LLM client wrappers
	  memory/                       # vocabulary persistence
	  optimization/                 # ONNX / quantization helpers
	  training/                     # trainer, callbacks, augmenters
	  tts/                          # mms_tts, waxal_tts, f5_tts, voice_cloner
	  voice/                        # speaker_profiles (ECAPA-TDNN + OpenVoice SE)
	tests/                          # pytest: api, data pipeline, engine, iot
	```

	---

	## How the memory loop works

	1. Press Push-to-Talk → speak in Bambara, Fula, French, or English.
	2. Whisper transcribes. If the language has a LoRA adapter loaded, `AdapterManager` hot-swaps to it (~50 ms).
	3. Qwen reads the vocabulary it has learned so far (`MemoryManager.get_vocabulary_context()`), then returns a structured JSON reply with `intent ∈ {teaching, question, conversation, error}`.
	4. If `teaching`: the word pair is appended to `data/vocabulary.jsonl` and async-pushed to `ous-sow/sahel-agri-feedback → vocabulary.jsonl`.
	5. If `question`: Qwen answers using the remembered vocabulary as source of truth.
	6. If `conversation`: Qwen replies naturally.
	7. TTS speaks the reply (Waxal VITS for Bambara, MMS-TTS fallback elsewhere).

	The last 5 learned words are always visible in the UI.
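
Steps 3 and 4 above hinge on parsing a structured reply that may or may not arrive wrapped in a markdown fence. The sketch below shows one way that could look. The `intent`, `reply`, `english`, and `teaching_pair` keys come from this README; the shape of `teaching_pair`, the function names, and the fallback to `error` on malformed output are assumptions, and the Hub push is left out.

```python
import json

FENCE = "`" * 3  # markdown code fence, built this way to keep the example fence-safe
INTENTS = {"teaching", "question", "conversation", "error"}


def parse_reply(raw):
    """Pull the JSON object out of an LLM reply, with or without a markdown fence."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return {"intent": "error", "reply": raw.strip()}
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return {"intent": "error", "reply": raw.strip()}
    if data.get("intent") not in INTENTS:
        data["intent"] = "error"
    return data


def handle_turn(raw, vocabulary):
    """Store the word pair on a teaching turn and return the text to speak."""
    data = parse_reply(raw)
    if data["intent"] == "teaching" and data.get("teaching_pair"):
        vocabulary.append(data["teaching_pair"])  # the real loop also pushes to the Hub
    return data.get("reply", "")


vocabulary = []
raw = (
    FENCE + "json\n"
    '{"intent": "teaching", "reply": "Got it.", "english": "Got it.", '
    '"teaching_pair": {"word": "ji", "meaning": "water"}}\n' + FENCE
)
spoken = handle_turn(raw, vocabulary)
```

Slicing from the first `{` to the last `}` sidesteps fence handling entirely, at the cost of rejecting replies that contain stray braces outside the JSON object.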

	---

	## How the agricultural voice interface works

	1. User asks, e.g., "A bɛ di wa?" ("Is it OK?") referring to their field.
	2. `intent_parser.py` (keyword-based) classifies the request: `check_soil` / `check_weather` / `irrigation_status` / `pest_alert` / etc.
	3. `SensorBridge` calls the configured `SENSOR_API_URL` and returns a typed `SensorData`.
	4. `voice_responder.py` maps `(Intent, SensorData)` → a short (≤ 6 words/sentence) Bambara or Fula reply + English translation. Alert thresholds are encoded here (`SOIL_MOISTURE_LOW=30`, `TEMP_HIGH=38`, pH bounds).
	5. TTS speaks the reply.
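
The flow above can be sketched end to end without hardware. Everything here is a stand-in: the keyword sets are English placeholders (the real parser matches Bambara and Fula terms), `SensorData` is reduced to two hypothetical fields, the replies are in English, and the comparison directions and units for `SOIL_MOISTURE_LOW` and `TEMP_HIGH` are assumed. Only the two threshold values and the 6-word limit are taken from this README.

```python
import re
from dataclasses import dataclass

SOIL_MOISTURE_LOW = 30       # value from the README; percent is assumed
TEMP_HIGH = 38               # value from the README; degrees Celsius is assumed
MAX_WORDS_PER_SENTENCE = 6

# Hypothetical English keywords, for illustration only.
KEYWORDS = {
    "check_soil": {"soil", "moisture", "dry"},
    "check_weather": {"weather", "hot", "temperature"},
}


@dataclass
class SensorData:
    soil_moisture: float
    temperature: float


def classify(text):
    """Keyword-based intent classification: first intent with a matching word wins."""
    words = set(re.findall(r"\w+", text.casefold()))
    for intent, keys in KEYWORDS.items():
        if words & keys:
            return intent
    return "unknown"


def respond(intent, data):
    """Map (intent, reading) to a short spoken reply using the alert thresholds."""
    if intent == "check_soil":
        if data.soil_moisture < SOIL_MOISTURE_LOW:
            return "Soil is dry. Water the field today."
        return "Soil is fine. No water needed."
    if intent == "check_weather":
        if data.temperature > TEMP_HIGH:
            return "It is very hot. Protect young plants."
        return "Weather is fine today."
    return "Please ask again."


def is_speakable(reply):
    """True when every sentence stays within the per-sentence word limit."""
    sentences = [s for s in re.split(r"[.!?]+", reply) if s.strip()]
    return all(len(s.split()) <= MAX_WORDS_PER_SENTENCE for s in sentences)


reading = SensorData(soil_moisture=22.0, temperature=31.0)
intent = classify("How is the soil?")
reply = respond(intent, reading)
```

A check like `is_speakable` is cheap enough to run in tests over every reply template, which guards the word limit when new templates are added.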

	---

	## Environment variables

	Most variables have sensible defaults, so you can boot the Space without setting any of them. Without `HF_TOKEN`, however, the memory loop cannot push to the Hub.

	### Core
	\| Key \| Default \| Purpose \|
	\|-----\|---------\|---------\|
	\| `HF_TOKEN` \| — \| HF write token. Required for Hub push and gated models. \|
	\| `FEEDBACK_REPO_ID` \| `ous-sow/sahel-agri-feedback` \| Memory-loop target dataset. \|
	\| `ADAPTER_REPO_ID` \| `ous-sow/sahel-agri-adapters` \| Published LoRA adapters. \|
	\| `WHISPER_MODEL_ID` \| `openai/whisper-large-v3-turbo` \| STT base model. \|
	\| `LLM_MODEL_ID` \| `CohereLabs/aya-expanse-32b` \| LLM via HF Serverless. Override to any HF Serverless-supported model. \|
	\| `LOG_LEVEL` \| `INFO` \| Standard Python logging level. \|
	\| `DEVICE` \| `cuda` (FastAPI) \| Torch device for inference. \|

	### Adapters & TTS
	\| Key \| Default \|
	\|-----\|---------\|
	\| `BAMBARA_ADAPTER_PATH` \| `./adapters/bambara` \|
	\| `FULA_ADAPTER_PATH` \| `./adapters/fula` \|
	\| `BAMBARA_TTS_REPO` \| `ynnov/ekodi-bambara-tts-female` \|
	\| `FULA_TTS_REPO` \| `ous-sow/fula-tts` \|

	### IoT
	\| Key \| Default \|
	\|-----\|---------\|
	\| `SENSOR_API_URL` \| (unset → mock sensor) \|

	### Self-Teaching tab (triggers Kaggle training runs)
	\| Key \| Default \|
	\|-----\|---------\|
	\| `KAGGLE_USERNAME` \| — \|
	\| `KAGGLE_KEY` \| — \|
	\| `KAGGLE_KERNEL_SLUG` \| `ous-sow/sahel-voice-master-trainer` (override in prod to `oussow/kaggle-master-trainer` — the actual Kaggle owner slug) \|
	\| `AUTO_TRAIN_THRESHOLD` \| `50` \|
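
For readers wiring these variables into their own scripts, here is a minimal sketch of reading a subset of them with the defaults listed in the tables above. The `Settings` class, `load_settings`, and the `can_push` property are hypothetical names, not part of this repository; only the keys and default values are taken from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: Optional[str]
    feedback_repo_id: str
    whisper_model_id: str
    llm_model_id: str
    sensor_api_url: Optional[str]   # None means "use the mock sensor"
    auto_train_threshold: int

    @property
    def can_push(self) -> bool:
        """Hub pushes need a write token."""
        return self.hf_token is not None


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ), applying README defaults."""
    env = os.environ if env is None else env
    return Settings(
        hf_token=env.get("HF_TOKEN") or None,
        feedback_repo_id=env.get("FEEDBACK_REPO_ID", "ous-sow/sahel-agri-feedback"),
        whisper_model_id=env.get("WHISPER_MODEL_ID", "openai/whisper-large-v3-turbo"),
        llm_model_id=env.get("LLM_MODEL_ID", "CohereLabs/aya-expanse-32b"),
        sensor_api_url=env.get("SENSOR_API_URL") or None,
        auto_train_threshold=int(env.get("AUTO_TRAIN_THRESHOLD", "50")),
    )


defaults = load_settings({})
override = load_settings({
    "LLM_MODEL_ID": "Qwen/Qwen2.5-72B-Instruct",
    "HF_TOKEN": "hf_example",  # placeholder, not a real token
})
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.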

	---

	## Run locally

	```bash
	# Minimal baseline (what the Space runs)
	pip install -r requirements.txt
	python app_minimal.py

	# Full production UI (not currently on the Space)
	python app.py

	# FastAPI service
	python scripts/run_server.py

	# Experimental lab UI
	python app_lab.py
	```

	System-level dependency: ffmpeg (see `packages.txt`).

	---

	## Training

	LoRA fine-tuning runs on Kaggle T4 or RunPod — not locally. Pick one entrypoint:

	\| Target \| Script \| Notebook \|
	\|--------\|--------\|----------\|
	\| Bambara LoRA \| `scripts/train_bambara.py` \| `notebooks/kaggle_master_trainer/` \|
	\| Fula LoRA \| `scripts/train_fula.py` \| `notebooks/kaggle_master_trainer/` \|
	\| Fula TTS \| — \| `notebooks/train_fula_tts/` (planned) \|

	Contributor workflow: edit notebooks locally in `notebooks/<slug>/`, commit with `nbstripout` enabled so diffs stay clean, then run `cd notebooks/<slug> && kaggle kernels push` to execute on a Kaggle GPU. Full walkthrough in `docs/notebook_collaboration.md`.

	`docs/kaggle_mcp_setup.md` documents the optional Kaggle MCP for Claude Desktop if you'd rather drive Kaggle from an LLM.

	---

	## Export for edge

	```bash
	python scripts/export_onnx.py # merges LoRA into the backbone, exports ONNX
	# then onnx-tf → TFLite for Android
	```

	ONNX does not support LoRA hot-swap, so export one file per language. `bitsandbytes` NF4 / 8-bit quantization is available for GPU-constrained deploys, but it is a training-only dependency (not in `requirements.txt`).

	---

	## Tests

	```bash
	pytest tests/
	```

	Covers: FastAPI routes, data pipeline, engine (adapter manager + transcriber), IoT (intent parser + voice responder).

	---

	## Space secrets (HF UI → Settings → Secrets)

	At minimum:

	\| Key \| Value \|
	\|-----\|-------\|
	\| `HF_TOKEN` \| write-scope token \|
	\| `FEEDBACK_REPO_ID` \| `ous-sow/sahel-agri-feedback` \|
	\| `LLM_MODEL_ID` \| `CohereLabs/aya-expanse-32b` (or any HF Serverless-supported model) \|

	---

	## Design constraints (deliberate — do not change without discussion)

	- Adapter hot-swap via PEFT's multi-adapter API — one backbone in VRAM, ~50 MB adapters per language, `set_adapter` ≈ 50 ms.
	- Qwen "adult-child" JSON contract — structured `intent`/`reply`/`english`/`teaching_pair` output, parsed out of optional markdown fences.
	- JSONL + Hub push memory — no ORM, thread-safe `MemoryManager`, async push so UI never blocks.
	- ≤ 6 words per sentence in `voice_responder.py` for clean MMS-TTS.
	- Adlam ↔ Latin dual-script handling in `adlam.py` + `bam_normalize.py`.
	- Single-file `app.py` — intentional for now; do not split without a plan.
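
To make the "JSONL + Hub push" constraint concrete, the sketch below shows one way to combine a lock-guarded append with a background push so the caller never waits on the network. It is not the actual `MemoryManager`: the class name, its methods, and the `push` callable (a stand-in for the Hub upload) are assumptions made for the example.

```python
import json
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class JsonlMemory:
    """Append-only JSONL store with a lock for writers and a background push."""

    def __init__(self, path, push=None):
        self._path = Path(path)
        self._lock = threading.Lock()
        self._push = push                      # hypothetical callable(path) -> None
        self._pool = ThreadPoolExecutor(max_workers=1)

    def add(self, pair):
        line = json.dumps(pair, ensure_ascii=False) + "\n"
        with self._lock:                       # one writer at a time
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write(line)
        if self._push is not None:
            self._pool.submit(self._push, self._path)  # returns immediately

    def recent(self, n=5):
        """Last n entries, e.g. for the 'last 5 learned words' UI panel."""
        with self._lock:
            if not self._path.exists():
                return []
            lines = self._path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines[-n:]]

    def close(self):
        self._pool.shutdown(wait=True)         # wait for queued pushes to finish


pushed = []
with tempfile.TemporaryDirectory() as tmp:
    memory = JsonlMemory(Path(tmp) / "vocabulary.jsonl",
                         push=lambda p: pushed.append(p.name))
    workers = [
        threading.Thread(target=memory.add, args=({"word": f"w{i}", "meaning": f"m{i}"},))
        for i in range(8)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    memory.close()
    total = len(memory.recent(n=100))
    last_five = memory.recent()
```

A single-worker pool serialises pushes, which avoids overlapping uploads of the same file while still keeping `add` non-blocking.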

	---

	## License

	MIT.