metadata
title: Audiobook Generator - English to 36 Languages
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: mit
Audiobook Generator - English to 36 Languages
Paste or upload English text and generate a professionally narrated audiobook in any of 36 languages, powered by Alibaba's Qwen3.5-Omni-Plus.
Features
- Translation + Narration: Translates English text and generates expressive speech in the target language
- Direct Narration: Generate English audiobooks without translation
- 29 narrator voices: Male and female voices with different styles (cinematic, warm, dramatic, etc.)
- Smart text splitting: Handles long texts by splitting at sentence/paragraph boundaries
- MP3 output: Compressed for easy download and sharing
- Section pauses: Optional natural pauses between text sections
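The smart text splitting feature can be illustrated with a minimal sketch. This is not the Space's actual code: the function name `split_text` and the 1500-character default are assumptions chosen for illustration, and the sentence detection is a simple regular expression that will miss edge cases such as abbreviations.

```python
import re


def split_text(text: str, max_chars: int = 1500) -> list[str]:
    """Split text into chunks of roughly max_chars that end on sentence boundaries.

    Hypothetical helper: paragraphs are separated on blank lines, then each
    paragraph is split into sentences. Sentences are packed greedily into
    chunks; a single sentence longer than max_chars is kept whole.
    """
    chunks: list[str] = []
    current = ""
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Split after ., ! or ? when followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if not sentence:
                continue
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                # Adding this sentence would overflow: close the current chunk.
                chunks.append(current)
                current = sentence
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences keeps each request self-contained, so narration is less likely to break off mid-sentence at a chunk boundary.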
Setup
- Add your DashScope API key as a Space Secret:
- Settings → Secrets → New Secret
- Name: DASHSCOPE_API_KEY
- Value: your DashScope API key
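Space Secrets are exposed to the running app as environment variables, so the key can be read at startup. The following is a minimal sketch under that assumption; the helper name `get_api_key` is hypothetical and not taken from the Space's `app.py`.

```python
import os
from collections.abc import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the DashScope API key, failing early with a clear message.

    Hypothetical helper: `env` is injectable so the check can be tested
    without touching the real environment.
    """
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set. Add it under Settings > Secrets."
        )
    return key
```

Failing at startup gives a clearer error than a rejected API request halfway through a long text.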
Supported Languages (36)
Core Languages (Best Quality)
English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
Extended Languages
Arabic, Bengali, Cantonese, Czech, Danish, Dutch, Filipino, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Malay, Norwegian, Persian, Polish, Romanian, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese
How It Works
- Your English text is split into manageable chunks at sentence boundaries
- Each chunk is sent to qwen3.5-omni-plus with instructions to translate (if needed) and narrate
- The model generates expressive speech with audiobook-quality narration
- All audio chunks are concatenated and converted to MP3
- Download your audiobook!
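The concatenation step, including optional section pauses, might look like the sketch below. It assumes each chunk arrives as raw 16-bit mono PCM at 24000 Hz; the actual format returned by the model is not stated here, and both function names are hypothetical. It writes WAV using only the standard library; converting to MP3 would need an external encoder and is left out.

```python
import io
import wave


def join_pcm_chunks(chunks: list[bytes], pause_seconds: float = 0.0,
                    sample_rate: int = 24000, sample_width: int = 2) -> bytes:
    """Concatenate mono PCM chunks with silence inserted between them."""
    # Silence is a run of zero bytes: one sample_width-sized frame per sample.
    silence = b"\x00" * (int(pause_seconds * sample_rate) * sample_width)
    return silence.join(chunks)


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               sample_width: int = 2) -> bytes:
    """Wrap raw mono PCM bytes in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Joining at the PCM level before encoding avoids re-encoding each chunk separately and keeps the pause length exact.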
Limitations
- Processing time: ~30-60 seconds per ~1500 characters
- Voice quality in the extended languages may be less consistent than in the 10 core languages
- Very long texts (100k+ characters) may take significant time
- The model generates speech at its own pace, so timing won't match a human narrator exactly
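The processing-time figure above can be turned into a rough estimate for a given text length. This sketch assumes chunks are processed one after another and that time scales linearly with the number of ~1500-character chunks; neither assumption is confirmed by the Space, and `estimate_seconds` is a hypothetical helper.

```python
import math


def estimate_seconds(num_chars: int, chunk_chars: int = 1500,
                     low: float = 30.0, high: float = 60.0) -> tuple[float, float]:
    """Return a (best case, worst case) processing time in seconds."""
    # Every started chunk costs a full request, so round up.
    chunks = math.ceil(num_chars / chunk_chars)
    return chunks * low, chunks * high


low, high = estimate_seconds(100_000)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")  # prints "33.5 to 67.0 minutes"
```

Under these assumptions a 100,000-character text splits into 67 chunks, which works out to roughly half an hour to a little over an hour.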
Credits
- Model: Qwen3.5-Omni-Plus by Alibaba Cloud
- API: DashScope
- UI: Gradio