---
license: mit
library_name: mlx
tags:
- mlx
- audio
- speech-enhancement
- noise-suppression
- deepfilternet
- apple-silicon
base_model: DeepFilterNet/DeepFilterNet2
pipeline_tag: audio-to-audio
---

# DeepFilterNet2-MLX

MLX-compatible weights for [DeepFilterNet2](https://github.com/Rikorose/DeepFilterNet), a real-time speech enhancement model that suppresses background noise in audio.

The weights are a direct conversion of the original PyTorch checkpoint to `safetensors` format for use with [MLX](https://github.com/ml-explore/mlx) on Apple Silicon.

## Origin

- **Original model:** [DeepFilterNet2](https://github.com/Rikorose/DeepFilterNet) by Hendrik Schroeter
- **Paper:** [DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio](https://arxiv.org/abs/2205.05474)
- **License:** MIT (same as the original)
- **Conversion:** PyTorch -> `safetensors` via the included `convert_deepfilternet.py` script

No fine-tuning or quantization was applied. Weights are converted directly from the original checkpoint.

## Files

| File | Description |
|---|---|
| `config.json` | Model architecture configuration |
| `model.safetensors` | Pre-converted weights (~8.9 MB, float32) |
| `convert_deepfilternet.py` | Conversion script (PyTorch -> MLX safetensors) |

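To check that a downloaded `model.safetensors` matches the description above (float32 only, roughly 8.9 MB of tensor data), the file header can be read without loading any weights. The sketch below uses only the Python standard library and relies on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header. The helper names `read_safetensors_header` and `summarize` are illustrative and not part of this repository.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict (tensors only)."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Count tensors, parameters, and payload bytes described by a header."""
    n_params = sum(prod(t["shape"]) for t in header.values())
    n_bytes = sum(
        t["data_offsets"][1] - t["data_offsets"][0] for t in header.values()
    )
    dtypes = sorted({t["dtype"] for t in header.values()})
    return {
        "tensors": len(header),
        "params": n_params,
        "bytes": n_bytes,
        "dtypes": dtypes,
    }
```

Calling `summarize(read_safetensors_header("model.safetensors"))` on a float32-only file should report `dtypes` as `["F32"]` and `bytes` equal to four times `params`.
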
## Model Details

| Parameter | Value |
|---|---|
| Sample rate | 48 kHz |
| FFT size | 960 |
| Hop size | 480 |
| ERB bands | 32 |
| DF bins | 96 |
| DF order | 5 |
| Embedding hidden dim | 256 |

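The table values imply a few derived quantities that are useful when preparing input audio. The following is a small sketch of that arithmetic, not code from the model itself. It assumes a one-sided real FFT (so `fft_size // 2 + 1` frequency bins) and that the DF bins are the lowest 96 bins of the spectrum, which is how the DeepFilterNet papers describe deep filtering; the function name `derived_stft_params` is hypothetical.

```python
SAMPLE_RATE = 48_000  # Hz, from the table above
FFT_SIZE = 960        # samples per analysis window
HOP_SIZE = 480        # samples between consecutive frames
DF_BINS = 96          # bins covered by deep filtering (assumed lowest bins)


def derived_stft_params(sample_rate, fft_size, hop_size, df_bins):
    """Derive timing and frequency-resolution figures from the STFT settings."""
    return {
        "window_ms": 1000 * fft_size / sample_rate,
        "hop_ms": 1000 * hop_size / sample_rate,
        "frames_per_second": sample_rate / hop_size,
        # One-sided spectrum of a real-valued signal (assumption).
        "freq_bins": fft_size // 2 + 1,
        "bin_width_hz": sample_rate / fft_size,
        # Upper edge of the deep-filtered range, assuming the lowest bins.
        "df_max_hz": df_bins * sample_rate / fft_size,
        "overlap": 1 - hop_size / fft_size,
    }


params = derived_stft_params(SAMPLE_RATE, FFT_SIZE, HOP_SIZE, DF_BINS)
```

With these settings the window is 20 ms, the hop is 10 ms (50% overlap, 100 frames per second), each bin is 50 Hz wide, and under the stated assumption the 96 DF bins cover frequencies up to 4.8 kHz.
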
## Usage

### Swift (mlx-audio-swift)

```swift
import MLXAudioSTS

let model = try await DeepFilterNetModel.fromPretrained("iky1e/DeepFilterNet2-MLX")
let enhanced = try model.enhance(audioArray)
```

### Python (mlx-audio)

```python
from mlx_audio.sts.models.deepfilternet import DeepFilterNetModel

model = DeepFilterNetModel.from_pretrained("iky1e/DeepFilterNet2-MLX")
enhanced = model.enhance("noisy.wav")
```

## Converting from PyTorch

```bash
python convert_deepfilternet.py \
    --input /path/to/DeepFilterNet2 \
    --output ./DeepFilterNet2-MLX \
    --name DeepFilterNet2
```

## Citation

```bibtex
@inproceedings{schroeter2022deepfilternet2,
  title = {{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
  author = {Schr{\"o}ter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
  booktitle = {17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022)},
  year = {2022},
}
```