Stage 4: split translate/reply UI + CPU-safe TTS + reply-not-translate prompt
- 4-box layout on both Voice and Text tabs: phrasebook translation (text +
audio) is automatic on submit; "Generate reply" runs the dialect-anchored
LLM only when clicked.
- Shared gr.State carries the canonical input (typed text or Whisper
transcript) into the reply button so we never re-transcribe.
- Robust device resolution: probe cuda.device_count(), and have _synthesize
retry on CPU when the CUDA path raises (fixes "Torch not compiled with CUDA"
on CPU-only laptops).
- System prompt now explicitly tells the LLM to REPLY conversationally and
reframes the curated few-shot pairs as style/orthography references only,
fixing the regression where the model would echo the phrasebook target
verbatim instead of replying.
- README: add Stage 4 entry + update entry-points table.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
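
The CPU-safe TTS retry described above can be illustrated without torch installed. This is a minimal sketch, not the code from `app_minimal.py`: `synthesize_with_cpu_fallback` and the `fake_synth` stub are hypothetical names, and the only behaviour taken from the commit is that an `AssertionError` on a non-CPU device triggers exactly one retry on `"cpu"`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signature: synth(text, device) -> audio payload.
Synth = Callable[[str, str], bytes]


def synthesize_with_cpu_fallback(
    synth: Synth, text: str, device: str
) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (audio, error). Retry once on CPU if the first device fails."""
    try:
        return synth(text, device), None
    except AssertionError as exc:
        # Typical message: "Torch not compiled with CUDA enabled".
        if device == "cpu":
            return None, f"tts: {exc}"
        try:
            return synth(text, "cpu"), None
        except Exception as exc2:
            return None, f"tts: {exc2}"
    except Exception as exc:
        return None, f"tts: {exc}"


def fake_synth(text: str, device: str) -> bytes:
    """Stand-in for a TTS engine running on a CPU-only machine."""
    if device != "cpu":
        raise AssertionError("Torch not compiled with CUDA enabled")
    return text.encode("utf-8")


audio, error = synthesize_with_cpu_fallback(fake_synth, "hello", "cuda")
```

Returning the error as a string instead of raising keeps the UI handler free of try/except blocks and lets the same value go straight into the turn log.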
- README.md +15 -1
- app_minimal.py +274 -165
- src/llm/minimal_client.py +15 -4
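
How the shared state avoids re-transcription can be shown without Gradio. In this sketch the translate handler returns the canonical input as an extra output and the reply handler receives it as its first argument, which mirrors listing a `gr.State` component in `outputs` and then in `inputs`. The `PHRASEBOOK` dict, the handler names, and `fake_llm` are hypothetical stand-ins, and the single phrasebook entry is illustrative only.

```python
from typing import Callable, Dict, Tuple

PHRASEBOOK: Dict[str, str] = {"good morning": "i ni sɔgɔma"}  # illustrative entry
NO_TRANSLATION = "(no curated translation)"


def translate_handler(raw_text: str) -> Tuple[str, str]:
    """Return (translation, state); `state` is the canonical input."""
    canonical = (raw_text or "").strip()
    if not canonical:
        return "(no text entered)", ""
    translation = PHRASEBOOK.get(canonical.lower(), NO_TRANSLATION)
    return translation, canonical


def reply_handler(state: str, llm: Callable[[str], str]) -> str:
    """Consume the stored state; the raw input is never read again."""
    if not (state or "").strip():
        return "(send a message first)"
    return llm(state)


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM client."""
    return f"reply to: {prompt}"


translation, state = translate_handler("  Good morning ")
reply = reply_handler(state, fake_llm)
```

Because the reply handler only sees the stored value, clicking the reply button before any submit produces the guard message rather than an LLM call on empty input.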
|
@@ -70,6 +70,20 @@ Three stacked changes land dialect fidelity without any training:
|
|
| 70 |
`Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not
|
| 71 |
available on your HF account.
|
| 72 |
|
| 73 |
See `docs/baseline_rebuild.md` for the broader minimal-track plan.
|
| 74 |
|
| 75 |
---
|
|
@@ -111,7 +125,7 @@ See `docs/roadmap_2026-04.md` for the full plan and `docs/baseline_rebuild.md` f
|
|
| 111 |
|
| 112 |
| File | Purpose | Lifecycle |
|
| 113 |
|------|---------|-----------|
|
| 114 |
-
| `app_minimal.py` | **Minimal baseline Gradio UI** — what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit. Tabs: Voice / Text. | `python app_minimal.py` |
|
| 115 |
| `app.py` | **Full production Gradio UI** (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
|
| 116 |
| `app_lab.py` | **Experimental Gradio UI** for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
|
| 117 |
| `src/api/app.py` | **FastAPI service** — loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |
|
|
|
|
| 70 |
`Qwen/Qwen2.5-72B-Instruct`) if Cohere's inference provider is not
|
| 71 |
available on your HF account.
|
| 72 |
|
| 73 |
+
4. **Stage 4 — split translate / reply UI + per-turn telemetry + RAG few-shot.**
|
| 74 |
+
Both Voice and Text tabs use a 4-box layout: phrasebook translation (text
|
| 75 |
+
+ audio) is automatic on submit (no LLM), and a separate **Generate reply**
|
| 76 |
+
button calls the dialect-anchored LLM for a conversational response. On a
|
| 77 |
+
phrasebook miss the LLM is RAG-injected with the top-3 nearest curated
|
| 78 |
+
pairs as additional style anchoring. Every turn is appended to
|
| 79 |
+
`data/field_turns.jsonl` (`src/engine/turn_logger.py`) with phase, latency
|
| 80 |
+
breakdown, phrasebook hit, and reply — the substrate for hit-rate
|
| 81 |
+
measurement, A/B comparisons, and eventual Stage-5 LoRA training-data
|
| 82 |
+
curation. The system prompt now also explicitly tells the LLM to **reply,
|
| 83 |
+
not translate** — the few-shot pairs are framed as style/orthography
|
| 84 |
+
references only, fixing the "the LLM just echoes the phrasebook target"
|
| 85 |
+
regression.
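
A per-turn JSONL log of this kind makes hit-rate measurement a few lines of code. The sketch below is an assumption about the record shape, not the contents of `src/engine/turn_logger.py`: it reuses the `phase` and `phrasebook` field names from the handlers in this commit, while `append_turn` and `phrasebook_hit_rate` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def append_turn(path: str, **fields: Any) -> None:
    """Append one turn as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(fields, ensure_ascii=False) + "\n")


def phrasebook_hit_rate(path: str) -> Optional[float]:
    """Share of translate-phase turns whose `phrasebook` field is set."""
    hits = total = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            turn: Dict[str, Any] = json.loads(line)
            if turn.get("phase") != "translate":
                continue  # reply-phase turns never consult the phrasebook
            total += 1
            if turn.get("phrasebook") is not None:
                hits += 1
    return hits / total if total else None


# Demo on a temporary file so the real data/field_turns.jsonl is untouched.
log_path = os.path.join(tempfile.mkdtemp(), "field_turns.jsonl")
append_turn(log_path, phase="translate", phrasebook={"score": 1.0}, tts_ms=120)
append_turn(log_path, phase="translate", phrasebook=None, tts_ms=None)
append_turn(log_path, phase="reply", phrasebook=None, llm_ms=900)
hit_rate = phrasebook_hit_rate(log_path)
```

One JSON object per line keeps appends cheap and lets a partially written file still be parsed up to the last complete line.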
|
| 86 |
+
|
| 87 |
See `docs/baseline_rebuild.md` for the broader minimal-track plan.
|
| 88 |
|
| 89 |
---
|
|
|
|
| 125 |
|
| 126 |
| File | Purpose | Lifecycle |
|
| 127 |
|------|---------|-----------|
|
| 128 |
+
| `app_minimal.py` | **Minimal baseline Gradio UI** — what the HF Space currently serves. Whisper → LLM → MMS-TTS with dialect-pinned prompts + curated phrasebook short-circuit + RAG few-shot on miss + per-turn JSONL telemetry. Tabs: Voice / Text, each with split translation (phrasebook, automatic) and reply (LLM, on demand). | `python app_minimal.py` |
|
| 129 |
| `app.py` | **Full production Gradio UI** (not currently served on the Space). Single-file (~99 KB) by design. Tabs: Conversation / Teaching / Knowledge Base / Self-Teaching. | `python app.py` |
|
| 130 |
| `app_lab.py` | **Experimental Gradio UI** for prototyping (e.g. `CuriosityEngine`) before folding into `app.py`. | `python app_lab.py` |
|
| 131 |
| `src/api/app.py` | **FastAPI service** — loads Whisper once, registers `bam`/`ful` adapters via `AdapterManager`, preloads `bam`, attaches `Transcriber` + `SensorBridge` to `app.state`. | `python scripts/run_server.py` |
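
The "RAG few-shot on miss" step listed for `app_minimal.py` amounts to a nearest-neighbour lookup over the curated pairs. How `phrasebook_top_k` scores entries is not shown in this diff, so the sketch below swaps in `difflib.SequenceMatcher` from the standard library as a stand-in similarity. `top_k_pairs`, the `source`/`target` keys, and the sample pairs with placeholder targets are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

# Placeholder targets: real entries would hold curated bam/ful text.
PAIRS: List[Dict[str, str]] = [
    {"source": "good morning", "target": "<target 1>"},
    {"source": "good evening", "target": "<target 2>"},
    {"source": "how are you", "target": "<target 3>"},
    {"source": "thank you", "target": "<target 4>"},
    {"source": "where is the market", "target": "<target 5>"},
]


def top_k_pairs(query: str, pairs: List[Dict[str, str]], k: int = 3) -> List[dict]:
    """Return the k pairs whose source text is most similar to `query`."""
    q = query.strip().lower()
    scored = [
        {**p, "score": SequenceMatcher(None, q, p["source"].lower()).ratio()}
        for p in pairs
    ]
    scored.sort(key=lambda p: p["score"], reverse=True)
    return scored[:k]


extras = top_k_pairs("good morning friend", PAIRS, k=3)
```

The returned dicts carry a `score` key so a caller can log the top score, as the `extras[0]["score"]` logging call in `_generate_reply` does.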
|
|
@@ -79,11 +79,21 @@ _turn_logger: TurnLogger = TurnLogger()
|
|
| 79 |
|
| 80 |
|
| 81 |
def _resolve_device() -> str:
|
| 82 |
-
"""Pick 'cuda' if torch sees a GPU, else 'cpu'. DEVICE env overrides.
|
| 83 |
import torch # lazy
|
| 84 |
if _REQUESTED_DEVICE:
|
| 85 |
return _REQUESTED_DEVICE
|
| 86 |
-
|
| 87 |
|
| 88 |
|
| 89 |
def get_backbone() -> WhisperBackbone:
|
|
@@ -167,202 +177,246 @@ def transcribe(audio_np: np.ndarray, sample_rate: int, input_lang: str) -> str:
|
|
| 167 |
return transcript
|
| 168 |
|
| 169 |
|
| 170 |
-
|
| 171 |
audio: Optional[Tuple[int, np.ndarray]],
|
| 172 |
input_lang: str,
|
| 173 |
output_lang: str,
|
| 174 |
-
) -> Tuple[str, str, Optional[Tuple[int, np.ndarray]]]:
|
| 175 |
-
"""
|
| 176 |
|
| 177 |
-
|
| 178 |
-
audio: (sample_rate, audio_np) from gr.Audio.
|
| 179 |
-
input_lang: language of the spoken input (drives Whisper hint + bam_normalize).
|
| 180 |
-
output_lang: language the LLM should reply in and the TTS should speak.
|
| 181 |
-
|
| 182 |
-
Returns (transcript, reply_text, reply_audio). Graceful degradation: any
|
| 183 |
-
stage failure yields a readable string and None audio instead of raising.
|
| 184 |
"""
|
| 185 |
import time
|
| 186 |
t0 = time.perf_counter()
|
| 187 |
if audio is None:
|
| 188 |
-
return "", "(no audio received)", None
|
| 189 |
-
|
| 190 |
sample_rate, audio_np = audio
|
| 191 |
if audio_np.size == 0:
|
| 192 |
-
return "", "(empty audio)", None
|
| 193 |
|
| 194 |
-
# ── 1. Transcribe ─────────────────────────────────────────────────────
|
| 195 |
t_stt = time.perf_counter()
|
| 196 |
try:
|
| 197 |
transcript = transcribe(audio_np, sample_rate, input_lang)
|
| 198 |
-
except Exception as exc: # pragma: no cover
|
| 199 |
logger.exception("Transcription failed")
|
| 200 |
_turn_logger.log(
|
| 201 |
-
|
|
|
|
| 202 |
user_text=None, transcript=None, transcribe_ms=None,
|
| 203 |
phrasebook=None, llm_model=None, llm_ms=None,
|
| 204 |
reply_text=None, tts_ms=None,
|
| 205 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 206 |
error=f"stt: {exc}",
|
| 207 |
)
|
| 208 |
-
return "", f"(STT error: {exc})", None
|
| 209 |
transcribe_ms = int((time.perf_counter() - t_stt) * 1000)
|
| 210 |
|
| 211 |
if not transcript:
|
| 212 |
_turn_logger.log(
|
| 213 |
-
|
|
|
|
| 214 |
user_text=None, transcript="", transcribe_ms=transcribe_ms,
|
| 215 |
phrasebook=None, llm_model=None, llm_ms=None,
|
| 216 |
reply_text=None, tts_ms=None,
|
| 217 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 218 |
error="no_speech",
|
| 219 |
)
|
| 220 |
-
return "", "(no speech detected)", None
|
| 221 |
-
|
| 222 |
-
# ── 2. Phrasebook → LLM (with RAG few-shot on miss) → reply ──────────
|
| 223 |
-
reply_text, hit, llm_ms = _resolve_reply(transcript, output_lang)
|
| 224 |
-
if reply_text is None:
|
| 225 |
-
_turn_logger.log(
|
| 226 |
-
tab="voice", input_lang=input_lang, output_lang=output_lang,
|
| 227 |
-
user_text=transcript, transcript=transcript,
|
| 228 |
-
transcribe_ms=transcribe_ms,
|
| 229 |
-
phrasebook=hit, llm_model=LLM_MODEL_ID, llm_ms=llm_ms,
|
| 230 |
-
reply_text=None, tts_ms=None,
|
| 231 |
-
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 232 |
-
error="llm_failed",
|
| 233 |
-
)
|
| 234 |
-
return transcript, "(LLM error)", None
|
| 235 |
-
|
| 236 |
-
# ── 3. TTS ────────────────────────────────────────────────────────────
|
| 237 |
-
t_tts = time.perf_counter()
|
| 238 |
-
tts_ms: Optional[int] = None
|
| 239 |
-
audio_out: Optional[Tuple[int, np.ndarray]] = None
|
| 240 |
-
tts_error: Optional[str] = None
|
| 241 |
-
try:
|
| 242 |
-
wav, sr = get_tts().synthesize(
|
| 243 |
-
reply_text, language=output_lang, device=_resolve_device()
|
| 244 |
-
)
|
| 245 |
-
audio_out = (sr, wav)
|
| 246 |
-
tts_ms = int((time.perf_counter() - t_tts) * 1000)
|
| 247 |
-
except Exception as exc:
|
| 248 |
-
logger.exception("TTS failed")
|
| 249 |
-
tts_error = f"tts: {exc}"
|
| 250 |
|
|
|
|
| 251 |
_turn_logger.log(
|
| 252 |
-
|
|
|
|
| 253 |
user_text=transcript, transcript=transcript,
|
| 254 |
transcribe_ms=transcribe_ms,
|
| 255 |
-
phrasebook=hit,
|
| 256 |
-
|
| 257 |
-
llm_ms=llm_ms,
|
| 258 |
-
reply_text=reply_text, tts_ms=tts_ms,
|
| 259 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 260 |
-
error=
|
| 261 |
)
|
| 262 |
-
return transcript,
|
| 263 |
|
| 264 |
|
| 265 |
-
def
|
| 266 |
-
|
| 267 |
output_lang: str,
|
| 268 |
) -> Tuple[str, Optional[Tuple[int, np.ndarray]]]:
|
| 269 |
-
"""
|
| 270 |
-
|
| 271 |
-
Args:
|
| 272 |
-
text: typed user input.
|
| 273 |
-
output_lang: language the LLM should reply in and the TTS should speak.
|
| 274 |
-
|
| 275 |
-
No input-language param — typed input is whatever the user types; the LLM
|
| 276 |
-
reads it as-is and replies in `output_lang`. Skips Whisper entirely; this
|
| 277 |
-
is the fast dev-loop path.
|
| 278 |
-
"""
|
| 279 |
import time
|
| 280 |
t0 = time.perf_counter()
|
| 281 |
-
|
| 282 |
-
|
| 283 |
-
return "(no text entered)", None
|
| 284 |
-
|
| 285 |
-
reply_text, hit, llm_ms = _resolve_reply(text, output_lang)
|
| 286 |
-
if reply_text is None:
|
| 287 |
-
_turn_logger.log(
|
| 288 |
-
tab="text", input_lang=None, output_lang=output_lang,
|
| 289 |
-
user_text=text, transcript=None, transcribe_ms=None,
|
| 290 |
-
phrasebook=hit, llm_model=LLM_MODEL_ID, llm_ms=llm_ms,
|
| 291 |
-
reply_text=None, tts_ms=None,
|
| 292 |
-
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 293 |
-
error="llm_failed",
|
| 294 |
-
)
|
| 295 |
-
return "(LLM error)", None
|
| 296 |
-
|
| 297 |
-
t_tts = time.perf_counter()
|
| 298 |
-
tts_ms: Optional[int] = None
|
| 299 |
-
audio_out: Optional[Tuple[int, np.ndarray]] = None
|
| 300 |
-
tts_error: Optional[str] = None
|
| 301 |
-
try:
|
| 302 |
-
wav, sr = get_tts().synthesize(
|
| 303 |
-
reply_text, language=output_lang, device=_resolve_device()
|
| 304 |
-
)
|
| 305 |
-
audio_out = (sr, wav)
|
| 306 |
-
tts_ms = int((time.perf_counter() - t_tts) * 1000)
|
| 307 |
-
except Exception as exc:
|
| 308 |
-
logger.exception("TTS failed")
|
| 309 |
-
tts_error = f"tts: {exc}"
|
| 310 |
|
| 311 |
_turn_logger.log(
|
| 312 |
-
|
| 313 |
-
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
llm_ms=llm_ms,
|
| 317 |
-
reply_text=
|
| 318 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 319 |
-
error=
|
| 320 |
)
|
| 321 |
-
return
|
| 322 |
-
|
| 323 |
-
|
| 324 |
-
def _resolve_reply(
|
| 325 |
-
user_text: str,
|
| 326 |
-
output_lang: str,
|
| 327 |
-
) -> Tuple[Optional[str], Optional[dict], Optional[int]]:
|
| 328 |
-
"""Shared phrasebook → LLM resolver for both voice and text tabs.
|
| 329 |
-
|
| 330 |
-
Returns (reply_text, phrasebook_hit_or_None, llm_ms_or_None).
|
| 331 |
-
`reply_text` is None only if the LLM itself failed; in every other case
|
| 332 |
-
the caller is given a usable string (possibly an "(empty reply)" sentinel).
|
| 333 |
-
|
| 334 |
-
On phrasebook miss for bam/ful targets, the top-3 nearest gold pairs are
|
| 335 |
-
injected into the LLM system prompt as additional dynamic few-shot
|
| 336 |
-
(RAG-style anchoring). Misses on en/fr targets call the LLM with no
|
| 337 |
-
extras since the curated phrasebooks only cover bam/ful.
|
| 338 |
-
"""
|
| 339 |
-
import time
|
| 340 |
-
hit = phrasebook_lookup(user_text, output_lang)
|
| 341 |
-
if hit:
|
| 342 |
-
logger.info(
|
| 343 |
-
"Phrasebook hit (%s, score=%.2f): %r → %r [cat=%s]",
|
| 344 |
-
hit["match"], hit["score"], user_text, hit["target"], hit["category"],
|
| 345 |
-
)
|
| 346 |
-
reply = hit["target"] or "(empty reply)"
|
| 347 |
-
return reply, hit, None
|
| 348 |
-
|
| 349 |
-
extras = phrasebook_top_k(user_text, output_lang, k=3) or None
|
| 350 |
-
if extras:
|
| 351 |
-
logger.info(
|
| 352 |
-
"Phrasebook miss; RAG-injecting top-%d nearest (top score=%.2f)",
|
| 353 |
-
len(extras), extras[0]["score"],
|
| 354 |
-
)
|
| 355 |
-
|
| 356 |
-
t_llm = time.perf_counter()
|
| 357 |
-
try:
|
| 358 |
-
reply = get_llm().chat(
|
| 359 |
-
user_text, target_lang=output_lang, extra_examples=extras,
|
| 360 |
-
)
|
| 361 |
-
except Exception as exc: # pragma: no cover
|
| 362 |
-
logger.exception("LLM call failed")
|
| 363 |
-
return None, None, int((time.perf_counter() - t_llm) * 1000)
|
| 364 |
-
llm_ms = int((time.perf_counter() - t_llm) * 1000)
|
| 365 |
-
return (reply or "(empty reply)"), None, llm_ms
|
| 366 |
|
| 367 |
|
| 368 |
# ── Gradio UI ────────────────────────────────────────────────────────────────
|
|
@@ -393,9 +447,14 @@ def build_ui():
|
|
| 393 |
info="Language the LLM should reply in. Also picks the TTS voice.",
|
| 394 |
)
|
| 395 |
|
| 396 |
with gr.Tabs():
|
| 397 |
# ── Voice tab — the actual baseline the field test measures ─────
|
| 398 |
-
with gr.Tab("🎤 Voice (full STT →
|
| 399 |
with gr.Row():
|
| 400 |
with gr.Column():
|
| 401 |
audio_in = gr.Audio(
|
|
@@ -404,55 +463,100 @@ def build_ui():
|
|
| 404 |
label="Speak (or upload a .wav)",
|
| 405 |
)
|
| 406 |
voice_submit = gr.Button(
|
| 407 |
-
"Transcribe +
|
| 408 |
)
|
| 409 |
-
|
| 410 |
-
transcript_out = gr.Textbox(
|
| 411 |
label="Transcript (zero-shot Whisper)",
|
| 412 |
lines=2, interactive=False,
|
| 413 |
)
|
| 414 |
voice_reply_out = gr.Textbox(
|
| 415 |
label="LLM reply", lines=4, interactive=False,
|
| 416 |
)
|
| 417 |
-
|
| 418 |
label="Reply audio", type="numpy", autoplay=False,
|
| 419 |
)
|
| 420 |
|
| 421 |
voice_submit.click(
|
| 422 |
-
fn=
|
| 423 |
inputs=[audio_in, input_lang, output_lang],
|
| 424 |
-
outputs=[
|
| 425 |
)
|
| 426 |
|
| 427 |
# ── Text tab — dev loop, skips Whisper ──────────────────────────
|
| 428 |
-
with gr.Tab("⌨️ Text (
|
| 429 |
with gr.Row():
|
| 430 |
with gr.Column():
|
| 431 |
text_in = gr.Textbox(
|
| 432 |
label="Type your message",
|
| 433 |
lines=3,
|
| 434 |
-
placeholder="e.g.
|
| 435 |
)
|
| 436 |
text_submit = gr.Button("Send", variant="primary")
|
| 437 |
with gr.Column():
|
| 438 |
text_reply_out = gr.Textbox(
|
| 439 |
label="LLM reply", lines=4, interactive=False,
|
| 440 |
)
|
| 441 |
-
|
| 442 |
label="Reply audio", type="numpy", autoplay=False,
|
| 443 |
)
|
| 444 |
|
| 445 |
# Text tab only uses output_lang — input_lang is a no-op here.
|
| 446 |
text_submit.click(
|
| 447 |
-
fn=
|
| 448 |
inputs=[text_in, output_lang],
|
| 449 |
-
outputs=[
|
| 450 |
)
|
| 451 |
# Pressing Enter in the textbox also submits.
|
| 452 |
text_in.submit(
|
| 453 |
-
fn=
|
| 454 |
inputs=[text_in, output_lang],
|
| 455 |
-
outputs=[
|
| 456 |
)
|
| 457 |
|
| 458 |
gr.Markdown(
|
|
@@ -463,7 +567,12 @@ def build_ui():
|
|
| 463 |
"stripped-down baseline used to measure what Whisper zero-shot does on "
|
| 464 |
"real Bambara/Fula recordings and to collect a real-user eval set.\n\n"
|
| 465 |
"The **Text** tab skips Whisper — it's for fast iteration on the "
|
| 466 |
-
"LLM + TTS path, not for field-test measurement."
|
| 467 |
)
|
| 468 |
|
| 469 |
return demo
|
|
|
|
| 79 |
|
| 80 |
|
| 81 |
def _resolve_device() -> str:
|
| 82 |
+
"""Pick 'cuda' if torch sees a GPU, else 'cpu'. DEVICE env overrides.
|
| 83 |
+
|
| 84 |
+
Some torch builds (CPU-only wheels) report `cuda.is_available() == True`
|
| 85 |
+
in error states; we additionally probe device_count and fall back to cpu
|
| 86 |
+
on any exception to keep the app usable on CPU-only laptops.
|
| 87 |
+
"""
|
| 88 |
import torch # lazy
|
| 89 |
if _REQUESTED_DEVICE:
|
| 90 |
return _REQUESTED_DEVICE
|
| 91 |
+
try:
|
| 92 |
+
if torch.cuda.is_available() and torch.cuda.device_count() > 0:
|
| 93 |
+
return "cuda"
|
| 94 |
+
except Exception:
|
| 95 |
+
pass
|
| 96 |
+
return "cpu"
|
| 97 |
|
| 98 |
|
| 99 |
def get_backbone() -> WhisperBackbone:
|
|
|
|
| 177 |
return transcript
|
| 178 |
|
| 179 |
|
| 180 |
+
NO_TRANSLATION = "(no curated translation — try Generate reply)"
|
| 181 |
+
|
| 182 |
+
|
| 183 |
+
def _synthesize(text: str, output_lang: str
|
| 184 |
+
) -> Tuple[Optional[Tuple[int, np.ndarray]], Optional[int], Optional[str]]:
|
| 185 |
+
"""Run TTS on `text` in `output_lang`. Returns (audio_or_None, tts_ms, error)."""
|
| 186 |
+
import time
|
| 187 |
+
if not text:
|
| 188 |
+
return None, None, None
|
| 189 |
+
t = time.perf_counter()
|
| 190 |
+
device = _resolve_device()
|
| 191 |
+
try:
|
| 192 |
+
wav, sr = get_tts().synthesize(text, language=output_lang, device=device)
|
| 193 |
+
return (sr, wav), int((time.perf_counter() - t) * 1000), None
|
| 194 |
+
except AssertionError as exc:
|
| 195 |
+
# Most common: "Torch not compiled with CUDA enabled" on CPU-only boxes
|
| 196 |
+
# where is_available() lied. Retry once on CPU.
|
| 197 |
+
if device != "cpu":
|
| 198 |
+
logger.warning("TTS failed on %s (%s) — retrying on cpu", device, exc)
|
| 199 |
+
try:
|
| 200 |
+
wav, sr = get_tts().synthesize(text, language=output_lang, device="cpu")
|
| 201 |
+
return (sr, wav), int((time.perf_counter() - t) * 1000), None
|
| 202 |
+
except Exception as exc2: # pragma: no cover
|
| 203 |
+
logger.exception("TTS failed on cpu fallback")
|
| 204 |
+
return None, None, f"tts: {exc2}"
|
| 205 |
+
logger.exception("TTS failed")
|
| 206 |
+
return None, None, f"tts: {exc}"
|
| 207 |
+
except Exception as exc: # pragma: no cover
|
| 208 |
+
logger.exception("TTS failed")
|
| 209 |
+
return None, None, f"tts: {exc}"
|
| 210 |
+
|
| 211 |
+
|
| 212 |
+
def _translate_only(user_text: str, output_lang: str
|
| 213 |
+
) -> Tuple[str, Optional[Tuple[int, np.ndarray]], Optional[dict], Optional[int]]:
|
| 214 |
+
"""Phrasebook-only translation — never calls the LLM.
|
| 215 |
+
|
| 216 |
+
Returns (translation_text, translation_audio, hit_or_None, tts_ms).
|
| 217 |
+
On miss for bam/ful, returns NO_TRANSLATION and no audio.
|
| 218 |
+
For en/fr targets (no curated phrasebook), echoes the input as the
|
| 219 |
+
translation since the user likely wants to hear it spoken — TTS in that
|
| 220 |
+
language is still the right thing to play.
|
| 221 |
+
"""
|
| 222 |
+
text = (user_text or "").strip()
|
| 223 |
+
if not text:
|
| 224 |
+
return "", None, None, None
|
| 225 |
+
|
| 226 |
+
hit = phrasebook_lookup(text, output_lang)
|
| 227 |
+
if hit:
|
| 228 |
+
logger.info(
|
| 229 |
+
"Phrasebook hit (%s, score=%.2f): %r → %r [cat=%s]",
|
| 230 |
+
hit["match"], hit["score"], text, hit["target"], hit["category"],
|
| 231 |
+
)
|
| 232 |
+
target = hit["target"] or ""
|
| 233 |
+
audio, tts_ms, _ = _synthesize(target, output_lang)
|
| 234 |
+
return target, audio, hit, tts_ms
|
| 235 |
+
|
| 236 |
+
# No curated translation. For en/fr we still synthesize the input itself
|
| 237 |
+
# (the user can use the app as a TTS box). For bam/ful we surface the
|
| 238 |
+
# honest "no curated translation" sentinel — the user can then click
|
| 239 |
+
# "Generate reply" if they want the LLM to handle it.
|
| 240 |
+
if output_lang in ("en", "fr"):
|
| 241 |
+
audio, tts_ms, _ = _synthesize(text, output_lang)
|
| 242 |
+
return text, audio, None, tts_ms
|
| 243 |
+
return NO_TRANSLATION, None, None, None
|
| 244 |
+
|
| 245 |
+
|
| 246 |
+
def _generate_reply(user_text: str, output_lang: str
|
| 247 |
+
) -> Tuple[str, Optional[Tuple[int, np.ndarray]], Optional[int], Optional[int], Optional[str]]:
|
| 248 |
+
"""Dialect-anchored LLM reply (with RAG top-3 few-shot) + TTS.
|
| 249 |
+
|
| 250 |
+
Returns (reply_text, reply_audio, llm_ms, tts_ms, error).
|
| 251 |
+
Always returns a usable text string — even on LLM failure it returns a
|
| 252 |
+
short parenthetical so the UI never goes blank.
|
| 253 |
+
"""
|
| 254 |
+
import time
|
| 255 |
+
text = (user_text or "").strip()
|
| 256 |
+
if not text:
|
| 257 |
+
return "(nothing to reply to)", None, None, None, None
|
| 258 |
+
|
| 259 |
+
extras = phrasebook_top_k(text, output_lang, k=3) or None
|
| 260 |
+
if extras:
|
| 261 |
+
logger.info(
|
| 262 |
+
"RAG-injecting top-%d nearest phrasebook entries (top score=%.2f)",
|
| 263 |
+
len(extras), extras[0]["score"],
|
| 264 |
+
)
|
| 265 |
+
|
| 266 |
+
t_llm = time.perf_counter()
|
| 267 |
+
try:
|
| 268 |
+
reply = get_llm().chat(
|
| 269 |
+
text, target_lang=output_lang, extra_examples=extras,
|
| 270 |
+
)
|
| 271 |
+
except Exception as exc: # pragma: no cover
|
| 272 |
+
logger.exception("LLM call failed")
|
| 273 |
+
llm_ms = int((time.perf_counter() - t_llm) * 1000)
|
| 274 |
+
return f"(LLM error: {exc})", None, llm_ms, None, f"llm: {exc}"
|
| 275 |
+
llm_ms = int((time.perf_counter() - t_llm) * 1000)
|
| 276 |
+
reply = (reply or "").strip() or "(empty reply)"
|
| 277 |
+
audio, tts_ms, tts_error = _synthesize(reply, output_lang)
|
| 278 |
+
return reply, audio, llm_ms, tts_ms, tts_error
|
| 279 |
+
|
| 280 |
+
|
| 281 |
+
# ── Tab handlers ─────────────────────────────────────────────────────────────
|
| 282 |
+
def run_text_translate(
|
| 283 |
+
text: str,
|
| 284 |
+
output_lang: str,
|
| 285 |
+
) -> Tuple[str, Optional[Tuple[int, np.ndarray]], str]:
|
| 286 |
+
"""Text tab → Send: phrasebook-only translation. Always-on, no LLM.
|
| 287 |
+
|
| 288 |
+
Returns (translation_text, translation_audio, transcript_state).
|
| 289 |
+
`transcript_state` is the canonicalised input passed to the Generate-reply
|
| 290 |
+
button so it doesn't need to re-read the textbox.
|
| 291 |
+
"""
|
| 292 |
+
import time
|
| 293 |
+
t0 = time.perf_counter()
|
| 294 |
+
text = (text or "").strip()
|
| 295 |
+
if not text:
|
| 296 |
+
return "(no text entered)", None, ""
|
| 297 |
+
|
| 298 |
+
translation, audio, hit, tts_ms = _translate_only(text, output_lang)
|
| 299 |
+
_turn_logger.log(
|
| 300 |
+
phase="translate", tab="text",
|
| 301 |
+
input_lang=None, output_lang=output_lang,
|
| 302 |
+
user_text=text, transcript=None, transcribe_ms=None,
|
| 303 |
+
phrasebook=hit, llm_model=None, llm_ms=None,
|
| 304 |
+
reply_text=translation, tts_ms=tts_ms,
|
| 305 |
+
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 306 |
+
error=None,
|
| 307 |
+
)
|
| 308 |
+
return translation, audio, text
|
| 309 |
+
|
| 310 |
+
|
| 311 |
+
def run_text_reply(
|
| 312 |
+
transcript_state: str,
|
| 313 |
+
output_lang: str,
|
| 314 |
+
) -> Tuple[str, Optional[Tuple[int, np.ndarray]]]:
|
| 315 |
+
"""Text tab → Generate reply: dialect-anchored LLM + TTS."""
|
| 316 |
+
import time
|
| 317 |
+
t0 = time.perf_counter()
|
| 318 |
+
if not (transcript_state or "").strip():
|
| 319 |
+
return "(send a message first)", None
|
| 320 |
+
|
| 321 |
+
reply, audio, llm_ms, tts_ms, error = _generate_reply(
|
| 322 |
+
transcript_state, output_lang
|
| 323 |
+
)
|
| 324 |
+
_turn_logger.log(
|
| 325 |
+
phase="reply", tab="text",
|
| 326 |
+
input_lang=None, output_lang=output_lang,
|
| 327 |
+
user_text=transcript_state, transcript=None, transcribe_ms=None,
|
| 328 |
+
phrasebook=None, llm_model=LLM_MODEL_ID, llm_ms=llm_ms,
|
| 329 |
+
reply_text=reply, tts_ms=tts_ms,
|
| 330 |
+
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 331 |
+
error=error,
|
| 332 |
+
)
|
| 333 |
+
return reply, audio
|
| 334 |
+
|
| 335 |
+
|
| 336 |
+
def run_voice_translate(
|
| 337 |
audio: Optional[Tuple[int, np.ndarray]],
|
| 338 |
input_lang: str,
|
| 339 |
output_lang: str,
|
| 340 |
+
) -> Tuple[str, str, Optional[Tuple[int, np.ndarray]], str]:
|
| 341 |
+
"""Voice tab → Submit: Whisper transcribe + phrasebook-only translation.
|
| 342 |
|
| 343 |
+
Returns (transcript, translation_text, translation_audio, transcript_state).
|
| 344 |
"""
|
| 345 |
import time
|
| 346 |
t0 = time.perf_counter()
|
| 347 |
if audio is None:
|
| 348 |
+
return "", "(no audio received)", None, ""
|
|
|
|
| 349 |
sample_rate, audio_np = audio
|
| 350 |
if audio_np.size == 0:
|
| 351 |
+
return "", "(empty audio)", None, ""
|
| 352 |
|
|
|
|
| 353 |
t_stt = time.perf_counter()
|
| 354 |
try:
|
| 355 |
transcript = transcribe(audio_np, sample_rate, input_lang)
|
| 356 |
+
except Exception as exc: # pragma: no cover
|
| 357 |
logger.exception("Transcription failed")
|
| 358 |
_turn_logger.log(
|
| 359 |
+
phase="translate", tab="voice",
|
| 360 |
+
input_lang=input_lang, output_lang=output_lang,
|
| 361 |
user_text=None, transcript=None, transcribe_ms=None,
|
| 362 |
phrasebook=None, llm_model=None, llm_ms=None,
|
| 363 |
reply_text=None, tts_ms=None,
|
| 364 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 365 |
error=f"stt: {exc}",
|
| 366 |
)
|
| 367 |
+
return "", f"(STT error: {exc})", None, ""
|
| 368 |
transcribe_ms = int((time.perf_counter() - t_stt) * 1000)
|
| 369 |
|
| 370 |
if not transcript:
|
| 371 |
_turn_logger.log(
|
| 372 |
+
phase="translate", tab="voice",
|
| 373 |
+
input_lang=input_lang, output_lang=output_lang,
|
| 374 |
user_text=None, transcript="", transcribe_ms=transcribe_ms,
|
| 375 |
phrasebook=None, llm_model=None, llm_ms=None,
|
| 376 |
reply_text=None, tts_ms=None,
|
| 377 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 378 |
error="no_speech",
|
| 379 |
)
|
| 380 |
+
return "", "(no speech detected)", None, ""
|
| 381 |
|
| 382 |
+
translation, t_audio, hit, tts_ms = _translate_only(transcript, output_lang)
|
| 383 |
_turn_logger.log(
|
| 384 |
+
phase="translate", tab="voice",
|
| 385 |
+
input_lang=input_lang, output_lang=output_lang,
|
| 386 |
user_text=transcript, transcript=transcript,
|
| 387 |
transcribe_ms=transcribe_ms,
|
| 388 |
+
phrasebook=hit, llm_model=None, llm_ms=None,
|
| 389 |
+
reply_text=translation, tts_ms=tts_ms,
|
|
|
|
|
|
|
| 390 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 391 |
+
error=None,
|
| 392 |
)
|
| 393 |
+
return transcript, translation, t_audio, transcript
|
| 394 |
|
| 395 |
|
| 396 |
+
def run_voice_reply(
|
| 397 |
+
transcript_state: str,
|
| 398 |
output_lang: str,
|
| 399 |
) -> Tuple[str, Optional[Tuple[int, np.ndarray]]]:
|
| 400 |
+
"""Voice tab → Generate reply: uses the stored transcript, no re-Whisper."""
|
| 401 |
import time
|
| 402 |
t0 = time.perf_counter()
|
| 403 |
+
if not (transcript_state or "").strip():
|
| 404 |
+
return "(record audio and submit first)", None
|
| 405 |
|
| 406 |
+
reply, audio, llm_ms, tts_ms, error = _generate_reply(
|
| 407 |
+
transcript_state, output_lang
|
| 408 |
+
)
|
| 409 |
_turn_logger.log(
|
| 410 |
+
phase="reply", tab="voice",
|
| 411 |
+
input_lang=None, output_lang=output_lang,
|
| 412 |
+
user_text=transcript_state, transcript=transcript_state,
|
| 413 |
+
transcribe_ms=None,
|
| 414 |
+
phrasebook=None, llm_model=LLM_MODEL_ID, llm_ms=llm_ms,
|
| 415 |
+
reply_text=reply, tts_ms=tts_ms,
|
| 416 |
total_ms=int((time.perf_counter() - t0) * 1000),
|
| 417 |
+
error=error,
|
| 418 |
)
|
| 419 |
+
return reply, audio
|
| 420 |
|
| 421 |
|
| 422 |
# ── Gradio UI ────────────────────────────────────────────────────────────────
|
|
|
|
| 447 |
info="Language the LLM should reply in. Also picks the TTS voice.",
|
| 448 |
)
|
| 449 |
|
| 450 |
+
# Carries the canonical input (typed text, or Whisper transcript) from
|
| 451 |
+
# Submit/Send into the Generate-reply button so we don't re-transcribe
|
| 452 |
+
# or re-read the textbox.
|
| 453 |
+
transcript_state = gr.State("")
|
| 454 |
+
|
| 455 |
with gr.Tabs():
|
| 456 |
# ── Voice tab — the actual baseline the field test measures ─────
|
| 457 |
+
with gr.Tab("🎤 Voice (full STT → translation + optional reply)"):
|
| 458 |
with gr.Row():
|
| 459 |
with gr.Column():
|
| 460 |
audio_in = gr.Audio(
|
|
|
|
| 463 |
label="Speak (or upload a .wav)",
|
| 464 |
)
|
| 465 |
voice_submit = gr.Button(
|
| 466 |
+
"Transcribe + translate", variant="primary"
|
| 467 |
)
|
| 468 |
+
voice_transcript_out = gr.Textbox(
|
|
|
|
| 469 |
label="Transcript (zero-shot Whisper)",
|
| 470 |
lines=2, interactive=False,
|
| 471 |
)
|
| 472 |
+
with gr.Column():
|
| 473 |
+
voice_translation_out = gr.Textbox(
|
| 474 |
+
label="Phrasebook translation",
|
| 475 |
+
lines=3, interactive=False,
|
| 476 |
+
)
|
| 477 |
+
voice_translation_audio = gr.Audio(
|
| 478 |
+
label="Translation audio",
|
| 479 |
+
type="numpy", autoplay=False,
|
| 480 |
+
)
|
| 481 |
+
voice_reply_btn = gr.Button(
|
| 482 |
+
"Generate reply (LLM)", variant="secondary"
|
| 483 |
+
)
|
| 484 |
voice_reply_out = gr.Textbox(
|
| 485 |
label="LLM reply", lines=4, interactive=False,
|
| 486 |
)
|
| 487 |
+
voice_reply_audio = gr.Audio(
|
| 488 |
label="Reply audio", type="numpy", autoplay=False,
|
| 489 |
)
|
| 490 |
|
| 491 |
voice_submit.click(
|
| 492 |
+
fn=run_voice_translate,
|
| 493 |
inputs=[audio_in, input_lang, output_lang],
|
| 494 |
+
outputs=[
|
| 495 |
+
voice_transcript_out,
|
| 496 |
+
voice_translation_out,
|
| 497 |
+
voice_translation_audio,
|
| 498 |
+
transcript_state,
|
| 499 |
+
],
|
| 500 |
+
)
|
| 501 |
+
voice_reply_btn.click(
|
| 502 |
+
fn=run_voice_reply,
|
| 503 |
+
inputs=[transcript_state, output_lang],
|
| 504 |
+
outputs=[voice_reply_out, voice_reply_audio],
|
| 505 |
)

         # ── Text tab — dev loop, skips Whisper ──────────────────────────
+        with gr.Tab("⌨️ Text (translation + optional reply, dev loop)"):
             with gr.Row():
                 with gr.Column():
                     text_in = gr.Textbox(
                         label="Type your message",
                         lines=3,
+                        placeholder="e.g. Good morning, how are you?",
                     )
                     text_submit = gr.Button("Send", variant="primary")
                 with gr.Column():
+                    text_translation_out = gr.Textbox(
+                        label="Phrasebook translation",
+                        lines=3, interactive=False,
+                    )
+                    text_translation_audio = gr.Audio(
+                        label="Translation audio",
+                        type="numpy", autoplay=False,
+                    )
+                    text_reply_btn = gr.Button(
+                        "Generate reply (LLM)", variant="secondary"
+                    )
                     text_reply_out = gr.Textbox(
                         label="LLM reply", lines=4, interactive=False,
                     )
+                    text_reply_audio = gr.Audio(
                         label="Reply audio", type="numpy", autoplay=False,
                     )

             # Text tab only uses output_lang — input_lang is a no-op here.
             text_submit.click(
+                fn=run_text_translate,
                 inputs=[text_in, output_lang],
+                outputs=[
+                    text_translation_out,
+                    text_translation_audio,
+                    transcript_state,
+                ],
             )
             # Pressing Enter in the textbox also submits.
             text_in.submit(
+                fn=run_text_translate,
                 inputs=[text_in, output_lang],
+                outputs=[
+                    text_translation_out,
+                    text_translation_audio,
+                    transcript_state,
+                ],
+            )
+            text_reply_btn.click(
+                fn=run_text_reply,
+                inputs=[transcript_state, output_lang],
+                outputs=[text_reply_out, text_reply_audio],
             )

         gr.Markdown(
             …
             "stripped-down baseline used to measure what Whisper zero-shot does on "
             "real Bambara/Fula recordings and to collect a real-user eval set.\n\n"
             "The **Text** tab skips Whisper — it's for fast iteration on the "
+            "LLM + TTS path, not for field-test measurement.\n\n"
+            "**How the two boxes differ:** the top pair is a phrasebook lookup "
+            "(no LLM, instant, gold-curated translation). If your input isn't "
+            "in the curated list you'll see *(no curated translation)* — click "
+            "**Generate reply** to get a dialect-anchored LLM response in the "
+            "bottom pair."
         )

     return demo
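
The handlers wired above (`run_text_translate`, `run_text_reply`, and their voice counterparts) are not shown in this hunk. As a rough sketch of the split they implement, assuming a plain dictionary phrasebook and a caller-supplied LLM callable, the two steps might look like the following. `PHRASEBOOK`, `translate_step`, `reply_step`, and the placeholder target string are hypothetical names for illustration, not code from the repository.

```python
from typing import Callable, Tuple

NO_MATCH = "(no curated translation)"

# Hypothetical stand-in for the curated phrasebook; the target is a placeholder.
PHRASEBOOK = {"good morning": "<curated target for 'good morning'>"}


def _normalize(text: str) -> str:
    """Lowercase, collapse whitespace, drop trailing punctuation."""
    return " ".join(text.lower().split()).rstrip("?!.,")


def translate_step(text: str) -> Tuple[str, str]:
    """Submit handler: phrasebook lookup only. Returns (translation, state)."""
    canonical = text.strip()
    translation = PHRASEBOOK.get(_normalize(canonical), NO_MATCH)
    return translation, canonical


def reply_step(state: str, llm: Callable[[str], str]) -> str:
    """Reply handler: calls the LLM on the stored input, never re-transcribes."""
    if not state:
        return ""
    return llm(state)
```

The second return value of `translate_step` plays the role of `transcript_state`: the submit step stores the canonical input once, and the reply step reads it back only when the button is clicked.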
@@ -94,6 +94,12 @@ def _build_system_prompt(
     lines: list[str] = [
         f"You are a warm, concise conversational assistant that replies ONLY in {full}.",
         "",
+        "Your task is to REPLY to the user's message as a person would in "
+        "conversation — NOT to translate it. If the user greets you, greet them "
+        "back and ask how they are. If they ask a question, answer it. If they "
+        "make a statement, respond appropriately. Never simply repeat or "
+        "translate what they said back to them.",
+        "",
         "Output format: plain natural text only. No JSON, no code fences, no "
         "markdown, no translations, no romanisation, no explanations. Reply in "
         "1–3 short sentences suitable to be read aloud by a text-to-speech voice.",
@@ -114,8 +120,12 @@ def _build_system_prompt(
     if anchors:
         lines += [
             "",
-            f"Reference phrases in {full} —
-            "
+            f"Reference phrases in {full} — these pairs are STYLE/ORTHOGRAPHY "
+            "examples ONLY (showing how English/French maps to the correct "
+            "dialect). Do NOT treat them as a translation task: when the user "
+            "writes one of these source phrases, do not just output its target "
+            "verbatim — instead REPLY conversationally in the same dialectal "
+            "style:",
         ]
         for item in anchors:
             src = item.get("source", "").strip()
@@ -127,8 +137,9 @@ def _build_system_prompt(
         lines += [
             "",
             "Additional reference phrases relevant to the current user input "
-            f"(curated gold {full} translations —
-            "
+            f"(curated gold {full} translations — STYLE references only, not a "
+            "translation task; reply conversationally, do not echo the target "
+            "verbatim):",
         ]
         for item in extra_examples:
             src = (item.get("source") or "").strip()
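
To make the intent of these prompt changes concrete, here is a minimal sketch of a builder that states the reply task first and then lists anchor pairs as style references. It is not the repository's `_build_system_prompt`, whose full signature and pair formatting are not visible in these hunks. The function name `build_reply_prompt`, the `"target"` key, and the `- source -> target` line format are assumptions.

```python
from typing import Iterable, Mapping


def build_reply_prompt(full: str, anchors: Iterable[Mapping[str, str]] = ()) -> str:
    """Assemble a reply-not-translate system prompt (illustrative only)."""
    lines = [
        f"You are a warm, concise conversational assistant that replies ONLY in {full}.",
        "",
        "Your task is to REPLY to the user's message, NOT to translate it.",
    ]
    # Keep only pairs where both sides are non-empty after stripping.
    pairs = []
    for item in anchors:
        src = (item.get("source") or "").strip()
        tgt = (item.get("target") or "").strip()  # "target" key is an assumption
        if src and tgt:
            pairs.append((src, tgt))
    if pairs:
        lines += ["", f"Reference phrases in {full} (STYLE/ORTHOGRAPHY examples ONLY):"]
        lines += [f"- {src} -> {tgt}" for src, tgt in pairs]
    return "\n".join(lines)
```

Placing the reply instruction ahead of the reference pairs mirrors the ordering in the diff, where the task statement is inserted before the output-format rules and the anchors follow later.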