Local S2S Shell Starter

A simple local speech-to-speech assistant that runs from a Windows terminal.

Stack

  • STT: faster-whisper medium
  • LLM: Qwen2.5 3B Instruct GGUF Q4_K_M
  • TTS: Windows SAPI voice
  • UI: terminal only

Pipeline

microphone -> faster-whisper -> Qwen2.5 3B GGUF -> Windows SAPI speech
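
The pipeline above can be read as three stages chained together. The following is a minimal sketch of one conversational turn, not the repository's actual code: run_turn and its three callable parameters are hypothetical names, and the real stages (faster-whisper, the GGUF model, SAPI) are replaced by stand-in functions so the example stays self-contained.

```python
from typing import Callable


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one turn: audio -> text -> reply -> spoken reply.

    All three stage callables are hypothetical placeholders for the
    STT, LLM, and TTS components named in the Stack section.
    """
    user_text = transcribe(audio).strip()
    if not user_text:
        # Nothing was recognized, so skip the LLM and TTS stages.
        return ""
    reply = generate(user_text)
    speak(reply)
    return reply


if __name__ == "__main__":
    # Stand-in stages for illustration only.
    reply = run_turn(
        b"",
        transcribe=lambda _audio: "hello there",
        generate=lambda text: f"You said: {text}",
        speak=print,
    )
```

Passing the stages in as callables keeps each one replaceable, for example swapping the TTS stage without touching the rest of the turn.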

Hardware Target

  • CPU fallback supported
  • NVIDIA GPU used automatically when available
  • 8 GB+ VRAM recommended for smoother local use
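
One way the GPU-or-CPU choice could be made is sketched below. This is an assumption about the approach, not the project's confirmed logic: the function names are hypothetical, and the compute types ("float16" on GPU, "int8" on CPU) are common faster-whisper choices picked for illustration.

```python
def cuda_is_available() -> bool:
    """Return True if PyTorch is installed and reports a CUDA device."""
    try:
        import torch  # installed by the Setup steps; optional here
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_whisper_settings(cuda_available: bool) -> dict:
    """Choose device and compute type for the STT model (hypothetical helper)."""
    if cuda_available:
        # Assumed GPU setting: half precision.
        return {"device": "cuda", "compute_type": "float16"}
    # Assumed CPU fallback: 8-bit quantized inference.
    return {"device": "cpu", "compute_type": "int8"}


if __name__ == "__main__":
    print(pick_whisper_settings(cuda_is_available()))
```

The resulting dictionary could be passed as keyword arguments to faster-whisper's WhisperModel, which accepts device and compute_type parameters.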

Setup

Run from PowerShell:

py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install --upgrade pip setuptools wheel
.\.venv\Scripts\python.exe -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
.\.venv\Scripts\python.exe -m pip install -r requirements.txt
.\.venv\Scripts\python.exe download_models.py

Run

.\run_shell_s2s.bat

Shell Commands

Enter = record mic and run speech-to-speech
t = type text and hear reply
d = list audio devices
q = quit
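
The key bindings above amount to a small lookup from input to action. The sketch below shows one way to express that; parse_command and the action names are hypothetical, and case-insensitive matching is an assumption rather than documented behavior.

```python
def parse_command(line: str) -> str:
    """Map a line typed at the shell prompt to an action name.

    An empty line (just Enter) means "record". Matching is
    case-insensitive here, which is an assumption.
    """
    actions = {
        "": "record",    # Enter: record mic and run speech-to-speech
        "t": "type",     # type text and hear reply
        "d": "devices",  # list audio devices
        "q": "quit",     # quit
    }
    return actions.get(line.strip().lower(), "unknown")


if __name__ == "__main__":
    for sample in ["", "t", "d", "q", "x"]:
        print(repr(sample), "->", parse_command(sample))
```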

Model Download

The downloader fetches:

Repo: bartowski/Qwen2.5-3B-Instruct-GGUF
File: Qwen2.5-3B-Instruct-Q4_K_M.gguf

The GGUF model file is not committed to this repository.
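
For readers who want to fetch the same file by hand, the following is a minimal sketch using huggingface_hub. It is not the contents of download_models.py: the download_gguf name and the "models" target directory are assumptions, and the package must be installed separately if requirements.txt does not already include it.

```python
REPO_ID = "bartowski/Qwen2.5-3B-Instruct-GGUF"
FILENAME = "Qwen2.5-3B-Instruct-Q4_K_M.gguf"


def download_gguf(local_dir: str = "models") -> str:
    """Download the GGUF file and return its local path.

    local_dir defaults to "models", an assumed location; the project's
    own downloader may store the file elsewhere.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)


if __name__ == "__main__":
    print(download_gguf())
```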

Scope

This is a local voice-chat starter. It does not control the computer, run tools, or perform system automation.
