hzeng412 Claude Fable 5 committed 11 days ago

Commit 2e4bd5d (1 parent: 21dbad4)

Bundle Qwen3-ASR weights in packaged app; slim PyInstaller assets

- qwen.py resolves model from env var > bundled assets dir > HF download
- PyInstaller hook: include assets/models/asr/qwen3-asr-1.7b, exclude
  legacy funasr/whisper models and superseded TTS .bin weights
  (weights are not committed; copy to assets/models/asr/qwen3-asr-1.7b
  before packaging)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Files changed (2)

build/pyinstaller/hooks/hook-voice_dialogue.py +24 -1
src/voice_dialogue/asr/models/qwen.py +14 -2

build/pyinstaller/hooks/hook-voice_dialogue.py

@@ -24,8 +24,29 @@ ASSETS_ROOT = PROJECT_ROOT / "assets"
 # Collect all submodules of the main module
 hiddenimports = collect_submodules('voice_dialogue')
 datas = collect_data_files('moyoyo_tts', include_py_files=True)
+# Assets left out of the bundle:
+# - Legacy FunASR/Whisper models (the default engine is the bundled Qwen3-ASR)
+# - TTS pretrained .bin weights (an equivalent model.safetensors is already bundled)
+EXCLUDED_ASSET_PATTERNS = [
+    "assets/models/asr/funasr/",
+    "assets/models/asr/whisper/",
+    "chinese-roberta-wwm-ext-large/pytorch_model.bin",
+    "chinese-hubert-base/pytorch_model.bin",
+]
+def _is_excluded(source_path: str) -> bool:
+    normalized = source_path.replace("\\", "/")
+    return any(pattern in normalized for pattern in EXCLUDED_ASSET_PATTERNS)
 # Collect system asset files
-datas += collect_system_data_files(ASSETS_ROOT.as_posix(), "assets")
+datas += [
+    (source, dest)
+    for source, dest in collect_system_data_files(ASSETS_ROOT.as_posix(), "assets")
+    if not _is_excluded(source)
+]
 # ============================================================================
 # Third-party dependency configuration
@@ -39,6 +60,7 @@ ML_DEPENDENCIES = [
     "pytorch_lightning",
     "huggingface_hub",
     "einops",
+    "qwen_asr",
 ]
 # Speech-processing dependencies
@@ -117,6 +139,7 @@ DATA_PACKAGES = [
     ("spacy", {"include_py_files": True}),
     ("misaki", {}),
     ("silero_vad", {}),
+    ("qwen_asr", {}),
 ]
 # Collect data files

src/voice_dialogue/asr/models/qwen.py

@@ -8,13 +8,25 @@ from qwen_asr import Qwen3ASRModel
 from voice_dialogue.asr.manager import asr_tables
 from voice_dialogue.asr.models.base import ASRInterface
 from voice_dialogue.asr.utils import ensure_minimum_audio_duration
+from voice_dialogue.config import paths
 from voice_dialogue.utils.logger import logger
-DEFAULT_MODEL = os.environ.get('QWEN_ASR_MODEL', 'Qwen/Qwen3-ASR-1.7B')
+# Bundled model directory (shipped with the packaged app; loaded offline when present)
+BUILTIN_QWEN_ASR_MODEL_PATH = paths.ASR_MODELS_PATH / 'qwen3-asr-1.7b'
 TARGET_SAMPLE_RATE = 16000
+def resolve_model_path() -> str:
+    """Model source priority: environment variable > bundled directory > automatic HuggingFace download."""
+    env_model = os.environ.get('QWEN_ASR_MODEL')
+    if env_model:
+        return env_model
+    if (BUILTIN_QWEN_ASR_MODEL_PATH / 'config.json').exists():
+        return BUILTIN_QWEN_ASR_MODEL_PATH.as_posix()
+    return 'Qwen/Qwen3-ASR-1.7B'
 @asr_tables.register('asr_classes', 'qwen')
 class QwenASRClient(ASRInterface):
     """Qwen3-ASR client (transformers backend; uses MPS acceleration on macOS)"""
@@ -25,7 +37,7 @@ class QwenASRClient(ASRInterface):
         self.model: typing.Optional[Qwen3ASRModel] = None
     def setup(self, **kwargs) -> None:
-        model_name = kwargs.get('model', DEFAULT_MODEL)
+        model_name = kwargs.get('model') or resolve_model_path()
         if torch.backends.mps.is_available():
             device_map, dtype = 'mps', torch.bfloat16
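
The resolution order in `resolve_model_path()` can be exercised in isolation. The sketch below mirrors the three-step priority from the diff, with two changes made only so it runs without the `voice_dialogue` package: the bundled directory and the environment mapping are passed in as parameters (`builtin_dir` and `env` are assumptions of this sketch, not part of the commit).

```python
import os
import tempfile
from pathlib import Path

HF_REPO_ID = "Qwen/Qwen3-ASR-1.7B"


def resolve_model_path(builtin_dir: Path, env=None) -> str:
    """Sketch of the priority: env var > bundled directory > Hub repo id."""
    env = os.environ if env is None else env
    env_model = env.get("QWEN_ASR_MODEL")
    if env_model:  # an empty string is treated the same as unset
        return env_model
    if (builtin_dir / "config.json").exists():
        return builtin_dir.as_posix()
    return HF_REPO_ID


with tempfile.TemporaryDirectory() as tmp:
    bundled = Path(tmp) / "qwen3-asr-1.7b"
    # 1. Nothing bundled and no override: fall back to the Hub repo id.
    fallback = resolve_model_path(bundled, env={})
    # 2. A bundled directory containing config.json: resolve to the local path.
    bundled.mkdir()
    (bundled / "config.json").write_text("{}")
    from_bundle = resolve_model_path(bundled, env={})
    # 3. The environment override wins even when the bundle exists.
    from_env = resolve_model_path(bundled, env={"QWEN_ASR_MODEL": "/custom/model"})
```

Two behavioral details follow from the diff itself. The environment variable is now read when `setup()` runs rather than once at import time, and `kwargs.get('model') or resolve_model_path()` falls through to the resolver when `model` is passed as `None` or an empty string, whereas the old `kwargs.get('model', DEFAULT_MODEL)` only used the default when the key was absent.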