| language: | |
| - hi | |
| - bn | |
| - mr | |
| - te | |
| - kn | |
| - mai | |
| - as | |
| - brx | |
| - doi | |
| - gu | |
| - ml | |
| - pa | |
| - ta | |
| - ne | |
| - sa | |
| - sat | |
| - sd | |
| - or | |
| - mni | |
| - ks | |
| - kok | |
| - ur | |
| - en | |
| base_model: Aratako/MioTTS-0.6B | |
| library_name: transformers | |
| model_name: Indic-Mio | |
| pipeline_tag: text-to-speech | |
| tags: | |
| - speech | |
| - tts | |
| - voice | |
| datasets: | |
| - ai4bharat/Rasa | |
| - mythicinfinity/libritts_r | |
| - ylacombe/expresso | |
| widget: | |
| - text: >- | |
| प्लान तो बढ़िया है, but wait... Have you checked the hotel bookings? Last | |
| minute पे रूम मिलना is next to impossible on weekends. | |
| output: | |
| url: samples/sample1.wav | |
| - text: >- | |
| The rain hammered against the cold glass as Detective Morgan slammed the | |
| folder onto the table. 'I know you were there that night,' she said, her | |
| voice barely above a whisper. 'The question is — what did you see?' | |
| output: | |
| url: samples/sample2.wav | |
| - text: >- | |
| જ્યારે પણ મને તેની સખત જરૂર હોય ત્યારે આ દુકાનમાં મદદ કરવા માટે ક્યારેય કોઈ | |
| હાજર નથી હોતું. <disgust> | |
| output: | |
| url: samples/sample3.wav | |
| - text: இந்த கோயில்லயா உங்க கல்யாணம் நடந்துச்சு. <surprise> | |
| output: | |
| url: samples/sample4.wav | |
| license: apache-2.0 | |
| # Model Card for Indic-Mio | |
| <b>Indic-Mio</b> is an open-source Text-to-Speech (TTS) model that supports all <b>22 scheduled Indian languages and English</b>. It produces high-quality, natural-sounding speech at <b>44kHz</b> with a real-time factor (RTF) below <b>0.1</b>. Zero-shot voice cloning is supported via speaker embeddings in the codec, and the model also works well for code-mixed sentences. | |
| This model is a fine-tuned version of [Aratako/MioTTS-0.6B](https://huggingface.co/Aratako/MioTTS-0.6B) which uses [MioCodec](https://huggingface.co/Aratako/MioCodec-25Hz-24kHz) for speech tokenization. | |
| <!-- It has been trained using Transformers, Unsloth and [TRL](https://github.com/huggingface/trl). --> | |
| ## Prompting | |
| For emotion and style control, place the tags <b>at the end</b> of the sentence. | |
| For example: `मुझे यह फिल्म बहुत पसंद आई! <happy>` or `I am not sure if I can do this. <confused>` | |
| Tags for Indian languages: `<happy>`, `<sad>`, `<angry>`, `<disgust>`, `<fear>`, `<surprise>` <br> | |
| Tags for English: `<happy>`, `<sad>`, `<enunciated>`, `<confused>`, `<angry>`, `<whisper>` | |
| A word can be stressed by wrapping it in asterisks (`*`). For example: `No! I could *never* do it!` | |
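The prompting rules above can be sketched as a small helper. This is only an illustration: the tag sets are copied from this card, while `build_prompt` and its parameters are hypothetical names that are not part of the model or of any library.

```python
# Tag sets copied from the model card; all names below are hypothetical helpers.
INDIC_TAGS = {"happy", "sad", "angry", "disgust", "fear", "surprise"}
ENGLISH_TAGS = {"happy", "sad", "enunciated", "confused", "angry", "whisper"}


def build_prompt(text, tag=None, english=False, stress=()):
    """Return `text` with optional *stress* markers and a trailing <tag>."""
    allowed = ENGLISH_TAGS if english else INDIC_TAGS
    if tag is not None and tag not in allowed:
        raise ValueError(f"unsupported tag {tag!r}; expected one of {sorted(allowed)}")
    for word in stress:
        # Plain substring match: wraps only the first occurrence in asterisks.
        text = text.replace(word, f"*{word}*", 1)
    # The card asks for tags to be placed at the end of the sentence.
    return f"{text} <{tag}>" if tag else text


print(build_prompt("I am not sure if I can do this.", tag="confused", english=True))
print(build_prompt("No! I could never do it!", english=True, stress=["never"]))
```

Validating the tag against the per-language list catches cases such as `<whisper>` on an Indian-language sentence, which the lists above do not include.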
| ## Inference | |
| <b>Approach 1: With MioTTS-Inference (recommended)</b> | |
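| Install [vllm](https://github.com/vllm-project/vllm) and set up [MioTTS-Inference](https://github.com/Aratako/MioTTS-Inference). Then run the following commands to start the vLLM server, the MioTTS-Inference server and the Gradio interface: | |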
| ```bash | |
| vllm serve SPRINGLab/Indic-Mio --gpu-memory-utilization 0.5 | |
| ``` | |
| ```bash | |
| cd MioTTS-Inference | |
| python run_server.py | |
| ``` | |
| ```bash | |
| python run_gradio.py | |
| ``` | |
| <b>Approach 2: Directly with Transformers</b> | |
| ```python | |
| import soundfile as sf | |
| import torch | |
| from miocodec import MioCodec | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_name = "SPRINGLab/Indic-Mio" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, torch_dtype=torch.bfloat16, device_map="cuda" | |
| ) | |
| # Wrap the input text in the model's chat template | |
| text = "नमस्ते, आप कैसे हैं?" | |
| messages = [{"role": "user", "content": text}] | |
| prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) | |
| output = model.generate( | |
|     **inputs, | |
|     max_new_tokens=1024, | |
|     do_sample=True,  # temperature and top_p only apply when sampling is enabled | |
|     temperature=0.9, | |
|     top_p=0.9, | |
| ) | |
| # Drop the prompt tokens and keep only the newly generated ones | |
| generated = output[0][inputs["input_ids"].shape[1]:].tolist() | |
| # Speech tokens occupy IDs [speech_offset, speech_offset + 12800); shift them to start at 0 | |
| speech_offset = 151669 | |
| audio_codes = [ | |
|     t - speech_offset for t in generated | |
|     if speech_offset <= t < speech_offset + 12800 | |
| ] | |
| # Decode the audio codes to a waveform with MioCodec | |
| codec = MioCodec.from_pretrained("Aratako/MioCodec-25Hz-24kHz") | |
| codes_tensor = torch.tensor([audio_codes], dtype=torch.long).unsqueeze(0)  # [1, 1, T] | |
| wav = codec.decode(codes_tensor)  # -> [1, 1, num_samples] | |
| sf.write("output.wav", wav.squeeze().cpu().numpy(), 44100) | |
| ``` | |
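The token-filtering step in the snippet above can be checked in isolation, without loading the model. The sketch below reuses the offset `151669` and range size `12800` from that snippet; the 25 frames-per-second rate is an assumption read off the "25Hz" in the MioCodec repository name, and all function names are hypothetical.

```python
SPEECH_OFFSET = 151669    # first speech-token ID, taken from the snippet above
NUM_SPEECH_CODES = 12800  # size of the speech-token range, from the snippet above
FRAME_RATE_HZ = 25        # assumption: inferred from "25Hz" in the codec's name


def extract_audio_codes(token_ids):
    """Keep IDs inside the speech range and shift them to start at zero."""
    upper = SPEECH_OFFSET + NUM_SPEECH_CODES
    return [t - SPEECH_OFFSET for t in token_ids if SPEECH_OFFSET <= t < upper]


def estimated_duration_s(num_codes, frame_rate_hz=FRAME_RATE_HZ):
    """Approximate audio length, assuming one code per codec frame."""
    return num_codes / frame_rate_hz


def real_time_factor(synthesis_s, audio_s):
    """RTF = time spent generating / duration of the audio produced."""
    return synthesis_s / audio_s


# Toy IDs: one text token, three speech tokens, one ID just past the range.
ids = [42, 151669, 151670, 164468, 164469]
codes = extract_audio_codes(ids)
print(codes)
print(estimated_duration_s(len(codes)))
```

Under the same 25 Hz assumption, `max_new_tokens=1024` would cap a single generation at roughly 41 seconds of audio, so longer texts may need to be split into sentences before synthesis.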
| ## Training | |
| This model was trained on a single NVIDIA A6000 ADA GPU in less than 6 hours. | |
| For Indian languages, the IndicTTS, Rasa and Syspin datasets were used. For American English, LibriTTS and Expresso were used, and for Indian English, the SPICOR dataset. | |
| ## Fine-tuning | |
| This model is robust yet flexible. You can fine-tune it on your own dataset for better performance on specific languages, accents, speakers, styles or emotions. Just a few steps of LoRA fine-tuning can significantly improve the performance for your target task. | |
| ## Citations | |
| If you use this model, please cite this Hugging Face repository as follows: | |
| ```bibtex | |
| @misc{indic-mio-tts, | |
| title={Indic-Mio TTS}, | |
| author={Advait Joglekar}, | |
| year={2026}, | |
| publisher={Hugging Face}, | |
| howpublished={\url{https://huggingface.co/SPRINGLab/Indic-Mio}}, | |
| } | |
| ``` |