PlotweaverModel / AudioBook

title: Audiobook Generator - English to 36 Languages
emoji: 📖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.25.0
app_file: app.py
pinned: false
license: mit

📖 Audiobook Generator — English to 36 Languages

Paste or upload English text and generate a professionally narrated audiobook in any of 36 languages, powered by Alibaba's Qwen3.5-Omni-Plus.

Features

- Translation + Narration: Translates English text and generates expressive speech in the target language
- Direct Narration: Generates English audiobooks without translation
- 29 narrator voices: Male and female voices in different styles (cinematic, warm, dramatic, etc.)
- Smart text splitting: Handles long texts by splitting at sentence/paragraph boundaries
- MP3 output: Compressed for easy download and sharing
- Section pauses: Optional natural pauses between text sections
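
The text splitting feature can be illustrated with a minimal sketch. This is not the code in app.py: the helper name `split_text`, the 1500-character default, and the sentence regex are assumptions for illustration, and paragraph handling is left out for brevity.

```python
import re

def split_text(text, max_chars=1500):
    """Split text into chunks of at most max_chars characters,
    breaking at sentence boundaries where possible.
    (Hypothetical helper; max_chars=1500 is an assumed default.)"""
    chunks, current = [], ""
    # A sentence boundary is whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        # Hard-wrap a single sentence that exceeds the limit on its own.
        pieces = [sentence[i:i + max_chars]
                  for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + 1 + len(piece) > max_chars:
                # Adding this piece would overflow: close the current chunk.
                chunks.append(current)
                current = piece
            else:
                current = f"{current} {piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks

print(split_text("One. Two! Three?", max_chars=10))
# ['One. Two!', 'Three?']
```

Packing several sentences into each chunk keeps the number of model calls low while never cutting a sentence in half unless it is longer than the limit by itself.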

Setup

Add your DashScope API key as a Space Secret:
- Settings → Secrets → New Secret
- Name: DASHSCOPE_API_KEY
- Value: your key (get one here)
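
Space Secrets are exposed to the running app as environment variables, so the key can be read at startup. A minimal sketch follows; the helper name `get_api_key` is hypothetical and not taken from app.py.

```python
import os

def get_api_key(env=os.environ):
    """Return the DashScope key from the environment, or fail with a
    clear message. (Hypothetical helper for illustration.)"""
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set. Add it under Settings > Secrets."
        )
    return key
```

Failing early with an explicit message makes a missing secret easier to diagnose than an authentication error on the first request.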

Supported Languages (36)

⭐ Core Languages (Best Quality)

English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian

Extended Languages

Arabic, Bengali, Cantonese, Czech, Danish, Dutch, Filipino, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Malay, Norwegian, Persian, Polish, Romanian, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese

How It Works

1. Your English text is split into manageable chunks at sentence boundaries
2. Each chunk is sent to qwen3.5-omni-plus with instructions to translate (if needed) and narrate
3. The model generates expressive speech with audiobook-quality narration
4. All audio chunks are concatenated and converted to MP3
5. Download your audiobook!
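
The concatenation step, including optional pauses between sections, could look roughly like the sketch below. It assumes each chunk arrives as raw 16-bit mono PCM at 24 kHz, which this README does not confirm, and the helper name `join_pcm_chunks` is hypothetical. It writes WAV using only the standard library; the MP3 conversion needs an external encoder and is not shown.

```python
import io
import wave

def join_pcm_chunks(chunks, sample_rate=24000, pause_seconds=0.0):
    """Concatenate 16-bit mono PCM byte chunks, inserting silence
    between them, and return the result as WAV bytes.
    (Audio format and sample rate are assumptions.)"""
    # One 16-bit sample of silence is two zero bytes.
    silence = b"\x00\x00" * round(sample_rate * pause_seconds)
    pcm = silence.join(chunks)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

Using `bytes.join` with the silence block as the separator places a pause between sections but not before the first or after the last one.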

Limitations

- Processing time: ~30-60 seconds per ~1500 characters
- Voice quality in the extended languages may be less consistent than in the 10 core languages
- Very long texts (100k+ characters) may take a long time to process
- The model generates speech at its own pace, so timing won't match a human narrator exactly
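
To get a feel for the processing-time figure above, here is a rough estimate as a small sketch. It assumes chunks are processed one after another and that time scales linearly with text length; neither is stated in this README, and `estimate_seconds` is a hypothetical helper.

```python
import math

def estimate_seconds(num_chars, chars_per_chunk=1500, seconds_per_chunk=(30, 60)):
    """Return a (low, high) estimate in seconds, assuming sequential
    processing at 30-60 seconds per 1500 characters."""
    chunks = math.ceil(num_chars / chars_per_chunk)
    low, high = seconds_per_chunk
    return chunks * low, chunks * high

print(estimate_seconds(100_000))
# (2010, 4020)
```

Under these assumptions, a 100k-character text needs 67 chunks, or roughly 34 to 67 minutes.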

Credits

- Model: Qwen3.5-Omni-Plus by Alibaba Cloud
- API: DashScope
- UI: Gradio