Mirror nv-modelcard++/explainability.md from nvidia/bigvgan_v2_44khz_128band_512x@95a9d1dc
encoders/nvidia/bigvgan_v2_44khz_128band_512x/nv-modelcard++/explainability.md
ADDED
@@ -0,0 +1,13 @@
| Field | Response |
| :--- | :--- |
| Intended Application & Domain: | Generating a waveform from a mel spectrogram. |
| Model Type: | Convolutional Neural Network (CNN) |
| Intended Users: | This model is intended for developers who synthesize waveforms from AI-generated mel spectrograms. |
| Output: | Audio Waveform |
| Describe how the model works: | The model generates an audio waveform corresponding to the input mel spectrogram. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations: | This model may not perform well on synthetically generated mel spectrograms that deviate significantly from the profile of the mel spectrograms on which it was trained. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Perceptual Evaluation of Speech Quality (PESQ), Virtual Speech Quality Objective Listener (ViSQOL), Multi-resolution STFT (MRSTFT), Mel cepstral distortion (MCD), Periodicity RMSE, Voiced/Unvoiced F1 Score (V/UV F1) |
| Potential Known Risks: | This model may generate low-quality or distorted audio waveforms. |
| Licensing: | https://github.com/NVIDIA/BigVGAN/blob/main/LICENSE |
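
Two of the performance metrics listed in the table, MCD and V/UV F1, reduce to short formulas once frame-level features are available. The sketch below is illustrative only: the card does not describe the evaluation protocol, so the function names `mel_cepstral_distortion` and `vuv_f1` are hypothetical, the inputs are assumed to be time-aligned per-frame mel-cepstral coefficients and voicing decisions produced by some external extractor, and excluding the 0th (energy) coefficient from MCD is a common convention rather than something the card confirms.

```python
import math


def mel_cepstral_distortion(ref_frames, gen_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: each frame is a sequence of coefficients whose first entry
    (the energy term) is skipped, as is commonly done.
    """
    if len(ref_frames) != len(gen_frames) or not ref_frames:
        raise ValueError("expected two non-empty, equally long frame lists")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, gen in zip(ref_frames, gen_frames):
        squared_error = sum((r - g) ** 2 for r, g in zip(ref[1:], gen[1:]))
        total += scale * math.sqrt(2.0 * squared_error)
    return total / len(ref_frames)


def vuv_f1(ref_voiced, gen_voiced):
    """F1 score of per-frame voiced decisions, treating 'voiced' as positive.

    Assumption: returns 0.0 when no frame is voiced in either sequence.
    """
    if len(ref_voiced) != len(gen_voiced):
        raise ValueError("voicing sequences must have the same length")
    pairs = list(zip(ref_voiced, gen_voiced))
    true_pos = sum(1 for r, g in pairs if r and g)
    false_pos = sum(1 for r, g in pairs if not r and g)
    false_neg = sum(1 for r, g in pairs if r and not g)
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator else 0.0
```

Under these assumptions, a lower MCD and a higher V/UV F1 both indicate that the generated waveform is closer to the reference recording; identical inputs give an MCD of 0 dB and an F1 of 1.0.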