TeodoraR committed on
Commit dca69ed · verified · 1 Parent(s): 1adc5c6

create readme file

  1. README.md +64 -0
README.md ADDED
@@ -0,0 +1,64 @@
+ ---
+ license: apache-2.0
+ language:
+ - ro
+ pipeline_tag: text-to-speech
+ ---
+
+ # Romanian TTS Model (Finetuned)
+
+ This is a **FastPitch** model for Romanian text-to-speech. The base model was trained **from scratch** on the SWARA dataset and then finetuned on samples from two specific speakers (BAS and SGS).
+
+ ## Model Details
+ - **Architecture:** FastPitch
+ - **Language:** Romanian (ro)
+ - **Base Dataset:** The SWARA Speech Corpus (18k samples)
+ - **Base Model:** trained on 16 speakers (male and female voices, balanced data). The base model components are in the `swara` directory.
+ - **Finetuning:** finetuned on 2 speakers (`bas` and `sgs`). Their checkpoints are in the `bas` and `sgs` directories.
+ - **Sample rate:** 22050 Hz
+
+ ## Usage instructions
+ - **Official FastPitch repository (NVIDIA DeepLearningExamples):** https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechSynthesis/FastPitch
+ - **Our repository on finetuning various TTS models for the Romanian language:** https://gitlab.com/opentts_ragman/OpenTTS
+
+ ## Citation
+
+ If you use this model, please cite the original FastPitch paper and the SWARA dataset:
+
+ ```bibtex
+ @INPROCEEDINGS{fastpitch,
+   author={Łańcucki, Adrian},
+   booktitle={Proc. of ICASSP},
+   title={{Fastpitch: Parallel Text-to-Speech with Pitch Prediction}},
+   year={2021},
+   volume={},
+   number={},
+   pages={6588-6592},
+   keywords={Frequency synthesizers;Frequency modulation;Conferences;Semantics;Predictive models;Real-time systems;Acoustics;text-to-speech;speech synthesis;fundamental frequency},
+   doi={10.1109/ICASSP39728.2021.9413889}}
+
+ @inproceedings{stan_sped2017,
+   author = {Stan, Adriana and Dinescu, Florina and Tiple, Cristina and Meza, Serban and Orza, Bogdan and Chirila, Magdalena and Giurgiu, Mircea},
+   title = {{The SWARA Speech Corpus: A Large Parallel Romanian Read Speech Dataset}},
+   year = 2017,
+   address = {Bucharest, Romania},
+   booktitle = {{Proceedings of the 9th Conference on Speech Technology and Human-Computer Dialogue (SpeD)}},
+   month = {July, 6-9},
+ }
+ ```
+
+ If you use this specific finetuned checkpoint in your work, please cite it as follows:
+
+ ```bibtex
+ @ARTICLE{11269795,
+   author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
+   journal={IEEE Access},
+   title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
+   year={2025},
+   volume={13},
+   number={},
+   pages={203415-203428},
+   keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
+   doi={10.1109/ACCESS.2025.3637322}}
+
+ ```