| title: Audiobook Generator - English to 36 Languages | |
| emoji: π | |
| colorFrom: indigo | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: "5.25.0" | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| # Audiobook Generator - English to 36 Languages | |
| Paste or upload English text and generate a professionally narrated audiobook in any of **36 languages**, powered by Alibaba's Qwen3.5-Omni-Plus. | |
| ## Features | |
| - **Translation + Narration**: Translates English text and generates expressive speech in the target language | |
| - **Direct Narration**: Generate English audiobooks without translation | |
| - **29 narrator voices**: Male and female voices with different styles (cinematic, warm, dramatic, etc.) | |
| - **Smart text splitting**: Handles long texts by splitting at sentence/paragraph boundaries | |
| - **MP3 output**: Compressed for easy download and sharing | |
| - **Section pauses**: Optional natural pauses between text sections | |
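
As an illustration of the section-pause feature, the sketch below joins audio chunks with a stretch of silence between them. It is not taken from `app.py`: the function name `join_with_pauses`, the raw mono 16-bit PCM input, and the 24 kHz sample rate are all assumptions made for the example. MP3 encoding is left out because it needs an external encoder.

```python
def join_with_pauses(pcm_chunks, pause_seconds=0.0, sample_rate=24000, sample_width=2):
    """Concatenate raw mono PCM chunks, inserting silence between them.

    Assumes signed PCM, where silence is all-zero bytes. The sample rate and
    sample width are illustrative defaults, not values confirmed by the app.
    """
    # Number of silent frames, then bytes (mono: one sample per frame).
    silent_frames = int(pause_seconds * sample_rate)
    silence = b"\x00" * (silent_frames * sample_width)
    # bytes.join places the separator only between chunks, never at the ends.
    return silence.join(pcm_chunks)
```

With `pause_seconds=0.0` the separator is empty, so the chunks are simply concatenated, which matches the "optional" nature of the pauses.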
| ## Setup | |
| 1. Add your **DashScope API key** as a Space Secret: | |
| - Settings → Secrets → New Secret | |
| - Name: `DASHSCOPE_API_KEY` | |
| - Value: your key ([get one here](https://www.alibabacloud.com/help/en/model-studio/get-api-key)) | |
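
Space Secrets are exposed to the running app as environment variables, so the key can be read at startup. The helper below is a minimal sketch of that lookup. The name `load_api_key` is hypothetical and the actual `app.py` may read the key differently.

```python
import os


def load_api_key(env=None):
    """Return the DashScope key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set; add it as a Space Secret."
        )
    return key
```

Failing early with an explicit message tends to be easier to debug than an authentication error on the first API call.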
| ## Supported Languages (36) | |
| ### Core Languages (Best Quality) | |
| English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | |
| ### Extended Languages | |
| Arabic, Bengali, Cantonese, Czech, Danish, Dutch, Filipino, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Malay, Norwegian, Persian, Polish, Romanian, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese | |
| ## How It Works | |
| 1. Your English text is split into manageable chunks at sentence or paragraph boundaries | |
| 2. Each chunk is sent to `qwen3.5-omni-plus` with instructions to translate (if needed) and narrate | |
| 3. The model generates expressive speech with audiobook-quality narration | |
| 4. All audio chunks are concatenated and converted to MP3 | |
| 5. Download your audiobook! | |
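
Step 1 above can be sketched roughly as follows. This is an illustrative splitter, not the code from `app.py`: the function name `split_text`, the 1500-character default, and the simple punctuation-based sentence regex are assumptions.

```python
import re


def split_text(text, max_chars=1500):
    """Split text into chunks of at most max_chars, breaking at sentence ends."""
    chunks = []
    current = ""
    # Split on blank lines first (paragraphs), then on sentence-ending punctuation.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if not sentence:
                continue
            # Last resort: hard-cut a single sentence longer than max_chars.
            while len(sentence) > max_chars:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            candidate = f"{current} {sentence}".strip()
            if len(candidate) <= max_chars:
                current = candidate  # sentence still fits in the open chunk
            else:
                chunks.append(current)  # close the chunk, start a new one
                current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two! Three? Four.", max_chars=10))
# → ['One. Two!', 'Three?', 'Four.']
```

Sentences are packed greedily, so each chunk is as long as possible without exceeding the limit. Note that this sketch joins sentences with a single space, so paragraph breaks inside a chunk are not preserved.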
| ## Limitations | |
| - Processing time: ~30-60 seconds per ~1500 characters | |
| - Extended languages may have variable voice quality compared to the core 10 | |
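| - Very long texts (100k+ characters) are slow: at the rate above, 100k characters takes roughly half an hour to just over an hour if chunks are processed one after another | |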
| - The model generates speech at its own pace, so timing won't match a human narrator exactly | |
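
For a rough sense of how the processing-time figure scales, the arithmetic can be written out as below. It assumes chunks of about 1500 characters processed sequentially at 30-60 seconds each. The function name `estimate_seconds` and the sequential-processing assumption are illustrative and not confirmed by the app.

```python
import math


def estimate_seconds(num_chars, chunk_chars=1500, per_chunk=(30, 60)):
    """Return a (low, high) processing-time estimate in seconds."""
    # Partial chunks still cost a full request, so round up.
    chunks = math.ceil(num_chars / chunk_chars)
    return chunks * per_chunk[0], chunks * per_chunk[1]


low, high = estimate_seconds(100_000)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")
# → 33.5 to 67.0 minutes
```

Actual times depend on API latency and the target language, so treat the result as an order-of-magnitude guide.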
| ## Credits | |
| - Model: [Qwen3.5-Omni-Plus](https://qwen.ai) by Alibaba Cloud | |
| - API: [DashScope](https://www.alibabacloud.com/help/en/model-studio/) | |
| - UI: [Gradio](https://gradio.app) | |