---
title: Audiobook Generator - English to 36 Languages
emoji: 📖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.25.0"
app_file: app.py
pinned: false
license: mit
---
# 📖 Audiobook Generator - English to 36 Languages
Paste or upload English text and generate a professionally narrated audiobook in any of **36 languages**, powered by Alibaba's Qwen3.5-Omni-Plus.
## Features
- **Translation + Narration**: Translates English text and generates expressive speech in the target language
- **Direct Narration**: Generate English audiobooks without translation
- **29 narrator voices**: Male and female voices with different styles (cinematic, warm, dramatic, etc.)
- **Smart text splitting**: Handles long texts by splitting at sentence/paragraph boundaries
- **MP3 output**: Compressed for easy download and sharing
- **Section pauses**: Optional natural pauses between text sections
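The smart text splitting feature can be illustrated with a minimal sketch. This is not the code in `app.py`; `split_text` and its `max_chars` default of 1500 are hypothetical names and values chosen for illustration, and paragraph breaks are simply treated as sentence boundaries.

```python
import re


def split_text(text: str, max_chars: int = 1500) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Chunks break at sentence boundaries where possible. A single sentence
    longer than max_chars is hard-split as a last resort.
    """
    chunks: list[str] = []
    current = ""
    # Paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # A sentence ends with '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if not sentence:
                continue
            # Hard-split a sentence that cannot fit into any chunk.
            while len(sentence) > max_chars:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            candidate = f"{current} {sentence}" if current else sentence
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The sentence does not fit: close the chunk, start a new one.
                chunks.append(current)
                current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_text("One. Two! Three?", max_chars=10)` returns `["One. Two!", "Three?"]`: the first two sentences fit in one chunk, and the third starts a new one.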
## Setup
1. Add your **DashScope API key** as a Space Secret:
   - Settings → Secrets → New Secret
   - Name: `DASHSCOPE_API_KEY`
   - Value: your key ([get one here](https://www.alibabacloud.com/help/en/model-studio/get-api-key))
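Space Secrets are exposed to the running app as environment variables, so the key can be read with the standard library. The sketch below shows one way to do that and fail early with a clear message; `get_api_key` is a hypothetical helper name, not necessarily what `app.py` uses.

```python
import os
from collections.abc import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the DashScope API key, or raise if the secret is missing."""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set. "
            "Add it under Settings > Secrets > New Secret."
        )
    return key
```

Calling `get_api_key()` once at startup surfaces a missing secret immediately instead of on the first generation request.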
## Supported Languages (36)
### ⭐ Core Languages (Best Quality)
English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
### Extended Languages
Arabic, Bengali, Cantonese, Czech, Danish, Dutch, Filipino, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Malay, Norwegian, Persian, Polish, Romanian, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese
## How It Works
1. Your English text is split into manageable chunks at sentence or paragraph boundaries
2. Each chunk is sent to `qwen3.5-omni-plus` with instructions to translate (if needed) and narrate
3. The model generates expressive speech with audiobook-quality narration
4. All audio chunks are concatenated and converted to MP3
5. Download your audiobook!
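Step 4 could look roughly like the following sketch, which also shows how an optional pause between sections might be inserted. It assumes each chunk arrives as raw 16-bit mono PCM at 24 kHz; that format is an assumption, so check what the API response actually contains. `concat_pcm` is a hypothetical helper, and the MP3 conversion is left out because it needs an external encoder such as ffmpeg.

```python
import io
import wave


def concat_pcm(
    chunks: list[bytes],
    sample_rate: int = 24000,  # assumed output rate, not confirmed by the README
    pause_seconds: float = 0.0,
) -> bytes:
    """Join raw 16-bit mono PCM chunks into a single WAV file (as bytes).

    If pause_seconds > 0, that much silence is inserted between chunks.
    """
    bytes_per_sample = 2  # 16-bit samples
    # Silence is a run of zero-valued samples.
    silence = b"\x00" * (int(sample_rate * pause_seconds) * bytes_per_sample)
    pcm = silence.join(chunks)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The resulting WAV bytes can then be handed to whichever encoder produces the final MP3.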
## Limitations
- Processing time: ~30-60 seconds per ~1500 characters
- Extended languages may have variable voice quality compared to the core 10
- Very long texts (100k+ characters) take a long time: at the rate above, roughly half an hour to over an hour if chunks are processed one at a time
- The model generates speech at its own pace, so timing won't match a human narrator exactly
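The processing-time figure above can be turned into a rough estimate for a given text length. This is a back-of-the-envelope sketch that assumes chunks of about 1500 characters processed one after another at 30-60 seconds each; `estimate_minutes` is a hypothetical helper, and real timings depend on API load and the chosen language.

```python
import math


def estimate_minutes(
    n_chars: int,
    chunk_chars: int = 1500,
    seconds_per_chunk: tuple[float, float] = (30.0, 60.0),
) -> tuple[float, float]:
    """Return a (low, high) estimate of processing time in minutes.

    Assumes sequential processing, one request per chunk.
    """
    n_chunks = math.ceil(n_chars / chunk_chars)
    low, high = seconds_per_chunk
    return n_chunks * low / 60, n_chunks * high / 60


print(estimate_minutes(15_000))  # (5.0, 10.0): 10 chunks at 30-60 s each
```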
## Credits
- Model: [Qwen3.5-Omni-Plus](https://qwen.ai) by Alibaba Cloud
- API: [DashScope](https://www.alibabacloud.com/help/en/model-studio/)
- UI: [Gradio](https://gradio.app)