---
language:
- hi
- bn
- mr
- te
- kn
- mai
- as
- brx
- doi
- gu
- ml
- pa
- ta
- ne
- sa
- sat
- sd
- or
- mni
- ks
- kok
- ur
- en
base_model: Aratako/MioTTS-0.6B
library_name: transformers
model_name: Indic-Mio
pipeline_tag: text-to-speech
tags:
- speech
- tts
- voice
datasets:
- ai4bharat/Rasa
- mythicinfinity/libritts_r
- ylacombe/expresso
widget:
- text: >-
    प्लान तो बढ़िया है, but wait... Have you checked the hotel bookings? Last minute पे रूम मिलना is next to impossible on weekends.
  output:
    url: samples/sample1.wav
- text: >-
    The rain hammered against the cold glass as Detective Morgan slammed the folder onto the table. 'I know you were there that night,' she said, her voice barely above a whisper. 'The question is — what did you see?'
  output:
    url: samples/sample2.wav
- text: >-
    જ્યારે પણ મને તેની સખત જરૂર હોય ત્યારે આ દુકાનમાં મદદ કરવા માટે ક્યારેય કોઈ હાજર નથી હોતું.
  output:
    url: samples/sample3.wav
- text: இந்த கோயில்லயா உங்க கல்யாணம் நடந்துச்சு.
  output:
    url: samples/sample4.wav
license: apache-2.0
---

# Model Card for Indic-Mio

Indic-Mio is an open-source Text-to-Speech (TTS) model that supports all 22 scheduled Indian languages and English. It produces high-quality, natural-sounding speech at 44kHz with a real-time factor (RTF) below 0.1. Zero-shot voice cloning is supported via speaker embeddings in the codec. The model also works well for code-mixed sentences.

This model is a fine-tuned version of [Aratako/MioTTS-0.6B](https://huggingface.co/Aratako/MioTTS-0.6B), which uses [MioCodec](https://huggingface.co/Aratako/MioCodec-25Hz-24kHz) for speech tokenization.

## Prompting

For emotion and style control, place the tags at the end of the sentence. For example: `मुझे यह फिल्म बहुत पसंद आई! ` or `I am not sure if I can do this. `

Tags for Indian languages: ``, ``, ``, ``, ``, ``
Tags for English: ``, ``, ``, ``, ``, ``

A word can be stressed by placing asterisks (`*`) around it. For example: `No! I could *never* do it!`

## Inference

Approach 1: With MioTTS-Inference (recommended)

Install [vllm](https://github.com/vllm-project/vllm) and set up [MioTTS-Inference](https://github.com/Aratako/MioTTS-Inference).

Serve the model with vLLM:

```bash
vllm serve SPRINGLab/Indic-Mio --gpu-memory-utilization 0.5
```

Then start the MioTTS-Inference server:

```bash
cd MioTTS-Inference
python run_server.py
```

Launch the Gradio demo:

```bash
python run_gradio.py
```

Approach 2: Directly with Transformers

```python
import soundfile as sf
import torch
from miocodec import MioCodec
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "SPRINGLab/Indic-Mio"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

text = "नमस्ते, आप कैसे हैं?"
messages = [{"role": "user", "content": text}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,  # sampling must be enabled for temperature/top_p to take effect
    temperature=0.9,
    top_p=0.9,
)

# Keep only the newly generated tokens (drop the prompt).
generated = output[0][inputs["input_ids"].shape[1]:]

# Speech tokens occupy IDs [speech_offset, speech_offset + 12800);
# shift them back to zero-based codec indices and discard everything else.
speech_offset = 151669
audio_codes = [
    t.item() - speech_offset
    for t in generated
    if speech_offset <= t.item() < speech_offset + 12800
]

# Decode the codec indices to a waveform with MioCodec.
codec = MioCodec.from_pretrained("Aratako/MioCodec-25Hz-24kHz")
codes_tensor = torch.tensor([audio_codes], dtype=torch.long).unsqueeze(0)  # [1, 1, T]
wav = codec.decode(codes_tensor)  # -> [1, 1, num_samples]

sf.write("output.wav", wav.squeeze().cpu().numpy(), 44100)
```

## Training

This model was trained on a single NVIDIA A6000 ADA GPU in less than 6 hours. For Indian languages, the IndicTTS, Rasa and Syspin datasets were used.
For American English, LibriTTS and Expresso were used, while for Indian English, the SPICOR dataset was used.

## Fine-tuning

This model is robust yet flexible. You can fine-tune it on your own dataset for better performance on specific languages, accents, speakers, styles or emotions. Just a few steps of LoRA fine-tuning can significantly improve performance on your target task.

## Citations

If you use this model, please cite this Hugging Face repository as follows:

```bibtex
@misc{indic-mio-tts,
  title={Indic-Mio TTS},
  author={Advait Joglekar},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/SPRINGLab/Indic-Mio}},
}
```
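
The token-filtering step from the Transformers example in the Inference section can be isolated into a small pure-Python helper, which also makes it easy to sanity-check the RTF figure quoted at the top of this card. The sketch below is illustrative only. The offset `151669` and codebook size `12800` are copied from that example. The 25 Hz frame rate is an assumption read off the codec's name (`MioCodec-25Hz-24kHz`), along with the assumption that each speech token corresponds to one codec frame. The function names and the sample token IDs are hypothetical.

```python
# Constants copied from the Transformers example in the Inference section.
SPEECH_OFFSET = 151669
CODEBOOK_SIZE = 12800
# Assumption: one speech token per codec frame at 25 frames per second,
# inferred from the codec name "MioCodec-25Hz-24kHz".
FRAME_RATE_HZ = 25


def extract_audio_codes(token_ids):
    """Keep only speech-token IDs and shift them to zero-based codec indices."""
    return [
        t - SPEECH_OFFSET
        for t in token_ids
        if SPEECH_OFFSET <= t < SPEECH_OFFSET + CODEBOOK_SIZE
    ]


def audio_duration_seconds(audio_codes):
    """Estimate the audio length implied by the number of codec frames."""
    return len(audio_codes) / FRAME_RATE_HZ


def real_time_factor(generation_seconds, audio_codes):
    """RTF = time spent generating / duration of the audio produced."""
    duration = audio_duration_seconds(audio_codes)
    if duration == 0:
        raise ValueError("no speech tokens were generated")
    return generation_seconds / duration


# Hypothetical token IDs: two below the speech range, three inside it,
# and one just past its upper bound.
example_ids = [151645, 151669, 151670, 164468, 164469, 42]
codes = extract_audio_codes(example_ids)
print(codes)  # [0, 1, 12799]
print(audio_duration_seconds(codes))
```

With real output, `token_ids` would be `generated.tolist()` from the Transformers example, and `generation_seconds` would be the wall-clock time measured around `model.generate`. An RTF below 1 means speech is produced faster than it plays back.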