File size: 2,183 Bytes
350820b
 
 
 
 
 
 
 
 
 
 
 
 
79eb6bf
350820b
 
 
 
 
 
 
 
 
f3e36f2
 
350820b
f3e36f2
350820b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
---
license: apache-2.0
pipeline_tag: audio-to-audio
---

# SCNet: Enhancing GAN-based Speech Generation with Subband Condition Network and Magnitude-aware Phase Loss

Recent speech generation has been predominantly driven by GAN-based networks aimed at high-quality waveform synthesis from mel-spectrograms. However, these methods often operate as black-box models, leading to the loss of inherent spectral information. In this work, we propose SCNet, a GAN-based vocoder augmented with a Subband Condition Network to address this issue. Specifically, SCNet leverages a subband signal predicted by a lightweight condition network as prior knowledge. This subband signal is then transformed via STFT to obtain Fourier coefficients, which are integrated into the backbone for the enhanced reconstruction. Additionally, to mitigate the phase wrapping, we introduce a magnitude-aware phase loss that computes instantaneous phase errors weighted by the corresponding magnitude, emphasizing regions with higher energy. Experimental results demonstrate that SCNet achieves superior performance in both objective and subjective evaluations for high-quality speech generation.

## Pre-requisites
1. Python >= 3.10
2. Clone this repository:
```bash
git clone https://github.com/vspeech/SCNet.git
cd SCNet
```
3. Install python requirements: 
```bash
pip install -r requirements.txt
```

## Pre-Trained Models
You can download the pre-trained LibriTTS model [here](https://drive.google.com/drive/folders/1Dn8f2PUodjME_SsfkJ8SGEtusXJNvkZI?usp=sharing) and copy to cp\_scnet directory.

Or download from huggingface:
```bash
huggingface-cli download vspeech/SCNet g_02005000 config.json --local-dir cp_scnet
```

## Inference
Please refer to the inference.py for details.
```bash
python inference.py 
--input_wavs_dir /path/to/your/input_wav \
--checkpoint_file /path/to/your/cp_scnet/model \
--output_dir /path/to/your/output_wav
```

## References
- [rishikksh20/iSTFTNet-pytorch](https://github.com/rishikksh20/iSTFTNet-pytorch)
- [yl4579/HiFTNet](https://github.com/yl4579/HiFTNet)
- [gemelo-ai/vocos](https://github.com/gemelo-ai/vocos)
- [NVIDIA/BigVGAN](https://github.com/NVIDIA/BigVGAN)