richiejp committed on
Commit 0872e4e · verified · 1 Parent(s): 5d1a0ad

Sync model card with upstream GitHub inference README

Files changed (1):
  1. README.md +11 -11
README.md CHANGED
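The headline figures this commit updates are internally consistent; a quick sanity check (a sketch, assuming 4 bytes per F32 parameter and the 256-sample hop at 16 kHz stated in the card):

```python
BYTES_PER_F32 = 4

# Updated parameter count: 1.3 M parameters stored as F32
params = 1.3e6
weights_mb = params * BYTES_PER_F32 / 1e6
print(f"F32 weights: ~{weights_mb:.1f} MB")  # ~5.2 MB, listed as "~5 MB" in the card

# 256-sample hop at 16 kHz -> frame period in milliseconds
hop_ms = 256 / 16_000 * 1000
# ~1.66 ms of compute per frame -> realtime factor
rtf = hop_ms / 1.66
print(f"hop: {hop_ms:.0f} ms, realtime factor: ~{rtf:.1f}x")  # 16 ms, ~9.6x
```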
@@ -15,7 +15,7 @@ license: apache-2.0
 acoustic echo cancellation (AEC), noise suppression, and dereverberation of
 16 kHz speech, designed to run on commodity CPUs in real time.

-- ~0.9 M parameters (~3.5 MB F32)
+- 1.3 M parameters (~5 MB F32)
 - ~1.66 ms per 16 ms frame on Zen4 (24 threads) — **≈9.6× realtime**
 - Causal, streaming: 256-sample hop, 16 ms algorithmic latency
 - F32 reference inference in C++ via [GGML](https://github.com/ggml-org/ggml);
@@ -90,8 +90,8 @@ that implementation to this one:

 | | DeepVQE (our re-implementation) | LocalVQE |
 |---|---|---|
-| Parameters | ~7.5 M | ~0.9 M |
-| Weights (F32) | ~30 MB | ~3.5 MB |
+| Parameters | ~7.5 M | 1.3 M |
+| Weights (F32) | ~30 MB | ~5 MB |
 | Analysis | STFT (complex FFT) | DCT-II (real, in-graph) |
 | Bottleneck | GRU | S4D (diagonal state space) |
 | CCM arithmetic | Complex | Real-valued (GGML-friendly) |
@@ -105,8 +105,8 @@ parameter count vs GRU at similar quality.

 | File | Size | Description |
 |---|---|---|
-| `localvqe-v1.pt` | 11 MB | PyTorch checkpoint — DNS5 pre-training + ICASSP 2022/2023 AEC Challenge fine-tune. |
-| `localvqe-v1-f32.gguf` | 5 MB | GGML F32 export (BN-folded, DCT weights embedded). This is what the C++ inference engine loads. |
+| `localvqe-v1-1.3M.pt` | 11 MB | PyTorch checkpoint — DNS5 pre-training + ICASSP 2022/2023 AEC Challenge fine-tune. |
+| `localvqe-v1-1.3M-f32.gguf` | 5 MB | GGML F32 export (BN-folded, DCT weights embedded). This is what the C++ inference engine loads. |

 Only F32 GGUF is published today. A `quantize` tool is included in the C++
 build (see below) and the architecture is designed to be Q4_K / Q8_0
@@ -173,7 +173,7 @@ omit them rather than publish misleading figures.
 | Decoder | 5 sub-pixel conv + BN blocks, mirroring encoder |
 | CCM | 27-ch → 3×3 complex convolving mask (real-valued arithmetic) |
 | Kernel | (4, 4) time × freq, causal padding |
-| Parameters | ~0.9 M |
+| Parameters | 1.3 M |

 ## Building the C++ Inference Engine

@@ -237,14 +237,14 @@ for the queue, the "quiet" column is what you'll see.

 ## Running Inference

-Download `localvqe-v1-f32.gguf` from this repository (the file list above)
+Download `localvqe-v1-1.3M-f32.gguf` from this repository (the file list above)
 either via `huggingface-cli`, the Hub web UI, or `hf_hub_download` from
 `huggingface_hub`. Then:

 ### CLI

 ```bash
-./ggml/build/bin/localvqe localvqe-v1-f32.gguf \
+./ggml/build/bin/localvqe localvqe-v1-1.3M-f32.gguf \
   --in-wav mic.wav ref.wav \
   --out-wav enhanced.wav
 ```
@@ -254,7 +254,7 @@ Expects 16 kHz mono PCM for both mic and far-end reference.
 ### Benchmark

 ```bash
-./ggml/build/bin/bench localvqe-v1-f32.gguf \
+./ggml/build/bin/bench localvqe-v1-1.3M-f32.gguf \
   --in-wav mic.wav ref.wav --iters 10 --profile
 ```

@@ -278,7 +278,7 @@ in the C++ build can produce GGUF variants from the F32 reference for
 experimentation:

 ```bash
-./ggml/build/bin/quantize localvqe-v1-f32.gguf localvqe-v1-q8.gguf Q8_0
+./ggml/build/bin/quantize localvqe-v1-1.3M-f32.gguf localvqe-v1-1.3M-q8.gguf Q8_0
 ```

 Expect end-to-end quality loss until proper per-tensor selection and
@@ -286,7 +286,7 @@ calibration have been worked through.

 ## PyTorch Reference

-`localvqe-v1.pt` is the PyTorch checkpoint used to produce the GGUF export.
+`localvqe-v1-1.3M.pt` is the PyTorch checkpoint used to produce the GGUF export.
 It is provided for verification, ablation, and downstream research — not
 for end-user inference, which should go through the GGML build above. The
 model definition lives under `pytorch/` in the