mudler committed on
Commit d8c03d9 · verified · 1 Parent(s): e74030b

Add files using upload-large-folder tool

.gitattributes CHANGED
@@ -33,3 +33,8 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.gguf filter=lfs diff=lfs merge=lfs -text
+ voice-en-Emma.gguf filter=lfs diff=lfs merge=lfs -text
+ voice-en-Carter_man.gguf filter=lfs diff=lfs merge=lfs -text
+ vibevoice-realtime-0.5B-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ vibevoice-asr-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,82 @@
+ ---
+ license: mit
+ library_name: vibevoice.cpp
+ tags:
+ - tts
+ - asr
+ - speech
+ - vibevoice
+ - gguf
+ - ggml
+ base_model:
+ - microsoft/VibeVoice-Realtime-0.5B
+ - microsoft/VibeVoice-ASR
+ ---
+
+ # vibevoice.cpp — quantized model bundle
+
+ Quantized GGUF weights for [vibevoice.cpp](https://github.com/mudler/vibevoice.cpp),
+ a C/C++ port of Microsoft VibeVoice (TTS + ASR) on top of `ggml`.
+
+ | File | Source | Quant | Size |
+ | ---- | ------ | ----- | ---- |
+ | `vibevoice-realtime-0.5B-q8_0.gguf` | `microsoft/VibeVoice-Realtime-0.5B` | Q8_0 (matmul) + F16 | ~1.6 GB |
+ | `vibevoice-asr-q8_0.gguf` | `microsoft/VibeVoice-ASR` | Q8_0 (matmul) + F16 | ~13 GB |
+ | `voice-en-Carter_man.gguf` | upstream voice prompt cache | F16 | 8 MB |
+ | `voice-en-Emma.gguf` | upstream voice prompt cache | F16 | 6 MB |
+ | `tokenizer.gguf` | Qwen2.5 BPE + VibeVoice specials | — | 6 MB |
+
+ ## Quantization scheme
+
+ `scripts/quantize_gguf.py` in the source repo selectively quantizes only the
+ LM matmul weights — attention q/k/v/o, ffn gate/up/down, and lm_head — to
+ Q8_0. Everything else (1-D conv kernels, RMSNorm scales, biases,
+ layer-scale gammas, token embeddings, small scalars) passes through
+ unchanged. The conv1d implementation in vibevoice.cpp casts kernels to F16
+ inline rather than dequantizing on the fly, so quantizing those would
+ corrupt the convolution outputs.
+
+ Q8_0 was chosen because it's pure-Python implementable in `gguf-py` and
+ gives a ~60% size reduction on the 7B ASR model with no measurable
+ quality regression in the closed-loop TTS → ASR roundtrip test.
+
+ ## Quickstart
+
+ ```bash
+ git clone --recursive https://github.com/mudler/vibevoice.cpp
+ cd vibevoice.cpp && cmake -B build -DVIBEVOICE_BUILD_TESTS=ON && cmake --build build -j
+
+ # Pull this bundle
+ mkdir -p models && cd models
+ hf download mudler/vibevoice.cpp-models --local-dir .
+ cd ..
+
+ # TTS
+ build/bin/vibevoice-cli tts \
+ --model models/vibevoice-realtime-0.5B-q8_0.gguf \
+ --voice models/voice-en-Carter_man.gguf \
+ --tokenizer models/tokenizer.gguf \
+ --text "Hello world this is a test of the synthesis system." \
+ --out hello.wav
+
+ # ASR
+ build/bin/vibevoice-cli asr \
+ --model models/vibevoice-asr-q8_0.gguf \
+ --tokenizer models/tokenizer.gguf \
+ --audio hello.wav
+ # -> [{"Start":0,"End":2.8,"Speaker":0,"Content":"Hello world, this is a test of the synthesis system."}]
+ ```
+
+ ## Closed-loop verification
+
+ The `test_closed_loop` ctest in vibevoice.cpp runs TTS → ASR end-to-end
+ and asserts ≥80% source-word recall in the recovered transcript. With
+ this bundle (both Q8_0 models) it passes at 10/10 (100 %).
+
+ ## License
+
+ Weights are derived from Microsoft VibeVoice
+ ([VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B)
+ and [VibeVoice-ASR](https://huggingface.co/microsoft/VibeVoice-ASR));
+ follow the upstream model licenses for use. The conversion + quantization
+ tooling is released under MIT as part of vibevoice.cpp.
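
Note on the Q8_0 format named in the README's quantization section: ggml's Q8_0 stores weights in blocks of 32, each block holding one half-precision scale followed by 32 signed 8-bit values, for 34 bytes per block (8.5 bits per weight). The sketch below illustrates that block layout in plain Python. It is not the code in `scripts/quantize_gguf.py`, and the names `quantize_q8_0_block` and `dequantize_q8_0_block` are hypothetical.

```python
import math
import struct

QK8_0 = 32  # values per Q8_0 block


def _round_half_away(x):
    """Round to nearest integer, ties away from zero (like C's roundf)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_q8_0_block(values):
    """Pack 32 floats into one 34-byte block: fp16 scale, then 32 int8."""
    if len(values) != QK8_0:
        raise ValueError("a Q8_0 block holds exactly 32 values")
    amax = max(abs(v) for v in values)
    d = amax / 127.0                  # scale so the largest value maps to +/-127
    inv = 1.0 / d if d else 0.0       # an all-zero block keeps scale 0
    quants = [max(-127, min(127, _round_half_away(v * inv))) for v in values]
    return struct.pack("<e32b", d, *quants)


def dequantize_q8_0_block(block):
    """Recover approximate floats from a 34-byte Q8_0 block."""
    d, *quants = struct.unpack("<e32b", block)
    return [d * q for q in quants]
```

Calling `dequantize_q8_0_block(quantize_q8_0_block(xs))` on 32 floats returns values within roughly half a quantization step of the inputs, plus a small error from storing the scale in half precision.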
tokenizer.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37dc3b722d5677e37e29a57df55aa05c485116eeb5459e57ff8dde616b4986f6
+ size 5922368
vibevoice-asr-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39ddded77a094a1fad9031fbaaee04943d7906d314d51161976bf393cca343d6
+ size 13927206208
vibevoice-realtime-0.5B-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5251e3f0386d1056a90c61b6c7359a4775da44dd19402499bef1989c4b5c653a
+ size 1699832128
voice-en-Carter_man.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b15cd8b9cae6ee2c3d20b0ee6e7bfe93f13489f8b63b6834e9bbf0dfabf6505a
+ size 8472448
voice-en-Emma.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c96a15786835d73d0e3e7e37af668de6f93392e04de0ada33512ff83f6cc4ba
+ size 6647168
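
Each `.gguf` entry in this commit is a Git LFS pointer (a `version`, `oid` and `size` line), not the weights themselves. A downloaded file can be checked against its pointer by comparing the byte count and the SHA-256 digest. What follows is a minimal sketch, assuming the pointer text has been copied out of the diff without the leading `+`. The helpers `parse_lfs_pointer`, `file_chunks` and `matches_pointer` are hypothetical names, not part of Git LFS or any Hugging Face tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_chunks(path, chunk_size=1 << 20):
    """Yield a file's bytes in chunks so large models need not fit in RAM."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def matches_pointer(chunks, pointer):
    """Return True if the byte stream has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    for chunk in chunks:
        hasher.update(chunk)
        size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer for tokenizer.gguf, as recorded in this commit.
TOKENIZER_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:37dc3b722d5677e37e29a57df55aa05c485116eeb5459e57ff8dde616b4986f6
size 5922368
"""
pointer = parse_lfs_pointer(TOKENIZER_POINTER)
```

With the bundle downloaded as in the README quickstart, `matches_pointer(file_chunks("models/tokenizer.gguf"), pointer)` should return `True` for an intact file. The `models/` path is taken from that quickstart and may differ locally.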