create readme file
README.md
ADDED
@@ -0,0 +1,64 @@
---
license: apache-2.0
language:
- ro
pipeline_tag: text-to-speech
---

# Romanian TTS Model (Finetuned)

This is a **FastPitch** text-to-speech model for the Romanian language. The base model was trained **from scratch** on the SWARA dataset and then finetuned on samples from two specific speakers (BAS and SGS).

## Model Details
- **Architecture:** FastPitch
- **Language:** Romanian (ro)
- **Base Dataset:** The SWARA Speech Corpus (18k samples)
- **Base Model:** trained on 16 speakers (male and female voices, balanced data). The base model components are in the `swara` directory.
- **Finetuning:** finetuned on 2 speakers (BAS and SGS). Their checkpoints are in the `bas` and `sgs` directories.
- **Sample rate:** 22050 Hz

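
Because the model works at 22050 Hz, audio saved with a different sample rate in its header will play back at the wrong speed and pitch. The sketch below shows one way to write mono samples to a 16-bit WAV file at that rate using only the Python standard library. The helper names `write_wav` and `read_sample_rate` are illustrative, and the sine tone is a stand-in for synthesized speech; none of this is part of the FastPitch code.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # Hz, as stated in the model details above


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clip each sample, scale it to the int16 range, and pack it little-endian.
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)  # must match the rate the audio was generated at
        wav.writeframes(pcm)


def read_sample_rate(path):
    """Return the sample rate recorded in a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


if __name__ == "__main__":
    # Hypothetical placeholder waveform: one second of a 220 Hz tone.
    tone = [
        0.3 * math.sin(2 * math.pi * 220 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)
    ]
    write_wav("example.wav", tone)
    print(read_sample_rate("example.wav"))
```

The same check with `read_sample_rate` can be applied to your own recordings before finetuning, assuming they are stored as PCM WAV files.
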
## Usage instructions
- **Official FastPitch implementation (NVIDIA):** https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch
- **Our repository on finetuning various TTS models for the Romanian language:** https://gitlab.com/opentts_ragman/OpenTTS

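
To run inference with the code from the repositories above, you first need the checkpoint files locally. A minimal sketch of fetching only one of the `swara`, `bas`, or `sgs` directories with `huggingface_hub.snapshot_download` follows. The repository id is a placeholder because it is not stated in this README, the helper names are illustrative, and the file layout inside each directory is not assumed.

```python
VARIANTS = ("swara", "bas", "sgs")  # directory names listed in this README


def allow_patterns_for(variant):
    """Return the glob patterns that select one checkpoint directory."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return [f"{variant}/*"]


def download_variant(repo_id, variant, local_dir=None):
    """Download a single checkpoint directory and return the local snapshot path."""
    # Imported here so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=allow_patterns_for(variant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # "<user>/<repo>" is a placeholder: replace it with this model's actual id.
    path = download_variant("<user>/<repo>", "bas")
    print(path)
```

Restricting the download with `allow_patterns` avoids fetching the other speakers' checkpoints when only one voice is needed.
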
## Citation

If you use this model, please cite the original FastPitch paper and the SWARA dataset:

```bibtex
@INPROCEEDINGS{fastpitch,
  author={Łańcucki, Adrian},
  booktitle={Proc. of ICASSP},
  title={{Fastpitch: Parallel Text-to-Speech with Pitch Prediction}},
  year={2021},
  volume={},
  number={},
  pages={6588-6592},
  keywords={Frequency synthesizers;Frequency modulation;Conferences;Semantics;Predictive models;Real-time systems;Acoustics;text-to-speech;speech synthesis;fundamental frequency},
  doi={10.1109/ICASSP39728.2021.9413889}}

@inproceedings{stan_sped2017,
  author = {Stan, Adriana and Dinescu, Florina and Tiple, Cristina and Meza, Serban and Orza, Bogdan and Chirila, Magdalena and Giurgiu, Mircea},
  title = {{The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset}},
  year = 2017,
  address = {Bucharest, Romania},
  booktitle = {{Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD)}},
  month = {July, 6-9},
}
```

If you use this specific finetuned checkpoint in your work, please cite it as follows:

```bibtex
@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}}
```