# Hands-on exercise

In this unit, we have explored the text-to-speech task, talked about existing datasets, pretrained 
models, and the nuances of fine-tuning SpeechT5 for a new language. 

As you've seen, fine-tuning models for the text-to-speech task can be challenging in low-resource scenarios. At the same time, 
evaluating text-to-speech models isn't easy either. 

For these reasons, this hands-on exercise will focus on practicing the skills rather than achieving a certain metric value. 

Your objective for this task is to fine-tune SpeechT5 on a dataset of your choosing. You are free to select 
another language from the same `voxpopuli` dataset, or to pick any other dataset listed in this unit.

Be mindful of the training data size! For training on a free-tier GPU from Google Colab, we recommend limiting the training 
data to about 10-15 hours. 
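One way to stay under that budget is to compute each clip's duration and keep only as many examples as fit. Here is a minimal sketch: the helper name `select_up_to_hours` and the toy duration list are ours for illustration, but the pattern of passing the resulting indices to `Dataset.select` matches the 🤗 Datasets API.

```python
# Sketch: capping a dataset at a target number of audio hours.
# With a real dataset you would compute per-example durations, e.g.
#   len(example["audio"]["array"]) / example["audio"]["sampling_rate"]
# and then call dataset.select(indices) with the result below.

def select_up_to_hours(durations_seconds, max_hours):
    """Return indices of examples whose cumulative duration stays within the cap."""
    budget = max_hours * 3600.0
    total = 0.0
    selected = []
    for idx, dur in enumerate(durations_seconds):
        if total + dur > budget:
            break
        total += dur
        selected.append(idx)
    return selected

# Toy durations (in seconds) standing in for real per-clip lengths:
example_durations = [3600.0] * 20  # twenty one-hour "clips"
indices = select_up_to_hours(example_durations, max_hours=15)
print(len(indices))  # → 15
```

For a streamed or shuffled dataset you would apply the same idea while iterating, stopping once the running total of audio reaches the cap.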

Once you have completed fine-tuning, share your model by uploading it to the Hub. Make sure to tag it 
as a `text-to-speech` model, either with the appropriate kwargs when pushing, or in the Hub UI.
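The kwargs route can be sketched as below: `Trainer.push_to_hub` forwards keyword arguments such as `tasks` to the auto-generated model card, which is what sets the task tag on the Hub. The dataset, language, and model names here are placeholders — substitute your own choices.

```python
# Sketch: tagging the uploaded model as text-to-speech via push_to_hub kwargs.
# All names below except "tasks" are placeholders for your own run.

kwargs = {
    "dataset_tags": "facebook/voxpopuli",             # placeholder dataset id
    "dataset": "VoxPopuli",                           # placeholder pretty name
    "language": "nl",                                 # placeholder language code
    "model_name": "speecht5_finetuned_voxpopuli_nl",  # placeholder repo name
    "finetuned_from": "microsoft/speecht5_tts",
    "tasks": "text-to-speech",                        # sets the task tag on the Hub
}

# After training (requires being logged in to the Hub):
# trainer.push_to_hub(**kwargs)
print(kwargs["tasks"])  # → text-to-speech
```

Alternatively, you can edit the model card in the Hub UI after uploading and add the `text-to-speech` tag there.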

Remember, the primary aim of this exercise is to give you ample practice, allowing you to refine your skills and 
gain a deeper understanding of text-to-speech tasks. 

