Sync model card with upstream GitHub inference README

README.md CHANGED

@@ -15,7 +15,7 @@ license: apache-2.0
 acoustic echo cancellation (AEC), noise suppression, and dereverberation of
 16 kHz speech, designed to run on commodity CPUs in real time.

--
+- 1.3 M parameters (~5 MB F32)
 - ~1.66 ms per 16 ms frame on Zen4 (24 threads) — **≈9.6× realtime**
 - Causal, streaming: 256-sample hop, 16 ms algorithmic latency
 - F32 reference inference in C++ via [GGML](https://github.com/ggml-org/ggml);

@@ -90,8 +90,8 @@ that implementation to this one:

 | | DeepVQE (our re-implementation) | LocalVQE |
 |---|---|---|
-| Parameters | ~7.5 M |
-| Weights (F32) | ~30 MB | ~
+| Parameters | ~7.5 M | 1.3 M |
+| Weights (F32) | ~30 MB | ~5 MB |
 | Analysis | STFT (complex FFT) | DCT-II (real, in-graph) |
 | Bottleneck | GRU | S4D (diagonal state space) |
 | CCM arithmetic | Complex | Real-valued (GGML-friendly) |

@@ -105,8 +105,8 @@ parameter count vs GRU at similar quality.

 | File | Size | Description |
 |---|---|---|
-| `localvqe-v1.pt` | 11 MB | PyTorch checkpoint — DNS5 pre-training + ICASSP 2022/2023 AEC Challenge fine-tune. |
-| `localvqe-v1-f32.gguf` | 5 MB | GGML F32 export (BN-folded, DCT weights embedded). This is what the C++ inference engine loads. |
+| `localvqe-v1-1.3M.pt` | 11 MB | PyTorch checkpoint — DNS5 pre-training + ICASSP 2022/2023 AEC Challenge fine-tune. |
+| `localvqe-v1-1.3M-f32.gguf` | 5 MB | GGML F32 export (BN-folded, DCT weights embedded). This is what the C++ inference engine loads. |

 Only F32 GGUF is published today. A `quantize` tool is included in the C++
 build (see below) and the architecture is designed to be Q4_K / Q8_0

@@ -173,7 +173,7 @@ omit them rather than publish misleading figures.
 | Decoder | 5 sub-pixel conv + BN blocks, mirroring encoder |
 | CCM | 27-ch → 3×3 complex convolving mask (real-valued arithmetic) |
 | Kernel | (4, 4) time × freq, causal padding |
-| Parameters |
+| Parameters | 1.3 M |

 ## Building the C++ Inference Engine

@@ -237,14 +237,14 @@ for the queue, the "quiet" column is what you'll see.

 ## Running Inference

-Download `localvqe-v1-f32.gguf` from this repository (the file list above)
+Download `localvqe-v1-1.3M-f32.gguf` from this repository (the file list above)
 either via `huggingface-cli`, the Hub web UI, or `hf_hub_download` from
 `huggingface_hub`. Then:

 ### CLI

 ```bash
-./ggml/build/bin/localvqe localvqe-v1-f32.gguf \
+./ggml/build/bin/localvqe localvqe-v1-1.3M-f32.gguf \
   --in-wav mic.wav ref.wav \
   --out-wav enhanced.wav
 ```

@@ -254,7 +254,7 @@ Expects 16 kHz mono PCM for both mic and far-end reference.
 ### Benchmark

 ```bash
-./ggml/build/bin/bench localvqe-v1-f32.gguf \
+./ggml/build/bin/bench localvqe-v1-1.3M-f32.gguf \
   --in-wav mic.wav ref.wav --iters 10 --profile
 ```

@@ -278,7 +278,7 @@ in the C++ build can produce GGUF variants from the F32 reference for
 experimentation:

 ```bash
-./ggml/build/bin/quantize localvqe-v1-f32.gguf localvqe-v1-q8.gguf Q8_0
+./ggml/build/bin/quantize localvqe-v1-1.3M-f32.gguf localvqe-v1-1.3M-q8.gguf Q8_0
 ```

 Expect end-to-end quality loss until proper per-tensor selection and

@@ -286,7 +286,7 @@ calibration have been worked through.

 ## PyTorch Reference

-`localvqe-v1.pt` is the PyTorch checkpoint used to produce the GGUF export.
+`localvqe-v1-1.3M.pt` is the PyTorch checkpoint used to produce the GGUF export.
 It is provided for verification, ablation, and downstream research — not
 for end-user inference, which should go through the GGML build above. The
 model definition lives under `pytorch/` in the
|