VITS2 - Claude (Luxembourgish Gender-Neutral Voice)
A VITS2-based text-to-speech model for Luxembourgish, featuring a synthetic gender-neutral voice.
Model Description
This model was trained using the VITS2 architecture on Luxembourgish speech data from the Lëtzebuerger Online Dictionnaire (LOD) example sentences.
"Claude" is a synthetic gender-neutral Luxembourgish voice created by modulating the original LOD recordings.
Model Details
- Architecture: VITS2 with duration discriminator and transformer flows
- Language: Luxembourgish (lb)
- Speaker: Single speaker (gender-neutral, synthetic)
- Sample Rate: 24000 Hz
- Checkpoint: G_57000 (57,000 steps)
- License: MIT
Usage
Note: Text should be lowercased before synthesis. Additional text normalization may be required.
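The lowercasing step in the note above can be sketched as a small helper. This is a minimal example, not the model's own preprocessing: the function name `prepare_text` is hypothetical, and the Unicode normalization and whitespace cleanup are assumptions added for illustration. Numbers, abbreviations, and similar cases would still need separate handling.

```python
import re
import unicodedata


def prepare_text(text: str) -> str:
    """Lowercase the input and tidy it before synthesis (illustrative only)."""
    # Assumption: compose accented characters (e.g. "e" + combining acute -> "é")
    # so that the same letter always has the same code point.
    text = unicodedata.normalize("NFC", text)
    # The model card asks for lowercased input.
    text = text.lower()
    # Assumption: collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text


cleaned = prepare_text("  Moien,  wéi geet et Dir? ")
```

The result can then be passed to the synthesis call in place of the raw string.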
This model requires the included Python source files for inference.
Basic Usage
import torch
import scipy.io.wavfile as wavfile
from vits2_engine import VITS2Engine
# Load the model
engine = VITS2Engine(model_dir="path/to/vits2-claude")
# Generate speech
wav = engine.tts("moien, wéi geet et dir?")
# Save to file
wavfile.write("output.wav", engine.sample_rate, wav)
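If `engine.tts` returns floating-point samples in the range [-1, 1] (an assumption; the return type is not documented here), `scipy.io.wavfile.write` will store them as a 32-bit float WAV, which some players do not handle. A possible workaround is to convert to 16-bit PCM first. The helper name `to_int16_pcm` is hypothetical.

```python
import numpy as np


def to_int16_pcm(wav):
    """Convert float samples in [-1, 1] to 16-bit PCM (illustrative sketch)."""
    # Assumption: the input is a float waveform nominally within [-1, 1].
    samples = np.asarray(wav, dtype=np.float32)
    # Clip out-of-range values so the integer conversion cannot wrap around.
    samples = np.clip(samples, -1.0, 1.0)
    # Scale to the int16 range and cast.
    return (samples * 32767.0).astype(np.int16)


pcm = to_int16_pcm([0.0, 1.0, -1.0, 2.0])
```

The converted array could then be written with `wavfile.write("output.wav", engine.sample_rate, pcm)`.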
Command Line
python inference.py "moien, wéi geet et dir?"
# With custom parameters
python inference.py "Text" --noise_scale 0.5 --length_scale 1.1 -o output.wav
Parameters
- noise_scale: Controls voice variation (default: 0.667, lower = more consistent)
- noise_scale_w: Controls duration variation (default: 0.8)
- length_scale: Controls speech speed (default: 1.0, higher = slower)
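To compare parameter settings, one option is to render the same sentence several times with different values. The sketch below only builds the command lines, using the flags shown in the Command Line section (`--noise_scale`, `--length_scale`, `-o`); the helper name `build_inference_command` and the output file names are hypothetical.

```python
from typing import List, Optional


def build_inference_command(
    text: str,
    noise_scale: Optional[float] = None,
    length_scale: Optional[float] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for inference.py (illustrative sketch)."""
    cmd = ["python", "inference.py", text]
    # Only add a flag when a value was given, so the script's defaults apply otherwise.
    if noise_scale is not None:
        cmd += ["--noise_scale", str(noise_scale)]
    if length_scale is not None:
        cmd += ["--length_scale", str(length_scale)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


# One command per speed setting: 0.9 (faster), 1.0 (default), 1.1 (slower).
commands = [
    build_inference_command(
        "moien, wéi geet et dir?", length_scale=scale, output=f"speed_{scale}.wav"
    )
    for scale in (0.9, 1.0, 1.1)
]
```

Each entry in `commands` can be executed with `subprocess.run(cmd, check=True)` from the directory that contains `inference.py`.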
Technical Specifications
| Parameter | Value |
|---|---|
| Hidden Channels | 192 |
| Filter Channels | 768 |
| Attention Heads | 2 |
| Encoder Layers | 6 |
| Mel Channels | 80 |
| FFT Size | 1024 |
| Hop Length | 256 |
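The values above, together with the 24000 Hz sample rate, determine the time resolution of the mel spectrogram. The arithmetic is sketched below; it assumes the analysis window is as long as the FFT size, which the table does not state.

```python
SAMPLE_RATE = 24000  # Hz, from Model Details
HOP_LENGTH = 256     # samples between consecutive frames
FFT_SIZE = 1024      # samples per FFT (assumed equal to the window length)

# Time covered by one hop: 256 / 24000 s, roughly 10.67 ms.
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE

# Number of mel frames per second of audio: 24000 / 256 = 93.75.
frames_per_second = SAMPLE_RATE / HOP_LENGTH

# Time span of one analysis window: 1024 / 24000 s, roughly 42.67 ms.
window_ms = 1000 * FFT_SIZE / SAMPLE_RATE


def frames_to_seconds(n_frames: int) -> float:
    """Approximate audio duration for a given number of mel frames."""
    return n_frames * HOP_LENGTH / SAMPLE_RATE
```

For example, 375 frames correspond to 375 × 256 = 96000 samples, or 4 seconds of audio.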
Requirements
- Python 3.8+
- PyTorch
- scipy
- numpy
- Cython (for monotonic_align)
Citation
If you use this model, please cite:
@misc{zls2025vits2claude,
title={VITS2 Claude - Luxembourgish Gender-Neutral Voice},
author={Zenter fir d'Lëtzebuerger Sprooch},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/ZLSCompLing/VITS2-Claude}
}
Acknowledgments
Developed by Zenter fir d'Lëtzebuerger Sprooch.
Voice data sourced from the Lëtzebuerger Online Dictionnaire (LOD). The original audio files are available via the LOD linguistic data on data.public.lu, which provides an XML file containing example sentence IDs. Audio files can be accessed at:
https://lod.lu/uploads/examples/AAC/{folder}/{id}.m4a
where {folder} is the first 2 characters of {id}.
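The URL pattern above can be expressed as a short helper. The function name `lod_audio_url` and the example ID are hypothetical; real IDs come from the LOD XML file.

```python
def lod_audio_url(example_id: str) -> str:
    """Build the audio URL for an LOD example sentence ID (illustrative sketch)."""
    if len(example_id) < 2:
        raise ValueError("example_id must have at least 2 characters")
    # The folder is the first 2 characters of the ID.
    folder = example_id[:2]
    return f"https://lod.lu/uploads/examples/AAC/{folder}/{example_id}.m4a"


# "ab123" is a made-up ID used only to show the pattern.
url = lod_audio_url("ab123")
```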
This model is used in Sproochmaschinn, a Luxembourgish speech processing platform.