---
title: Audiobook Generator - English to 36 Languages
emoji: 📖
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.25.0"
app_file: app.py
pinned: false
license: mit
---

# 📖 Audiobook Generator — English to 36 Languages

Paste or upload English text and generate a professionally narrated audiobook in any of **36 languages** (English itself included), powered by Alibaba's Qwen3.5-Omni-Plus.

## Features

- **Translation + Narration**: Translates English text and generates expressive speech in the target language
- **Direct Narration**: Generate English audiobooks without translation
- **29 narrator voices**: Male and female voices with different styles (cinematic, warm, dramatic, etc.)
- **Smart text splitting**: Handles long texts by splitting at sentence/paragraph boundaries
- **MP3 output**: Compressed for easy download and sharing
- **Section pauses**: Optional natural pauses between text sections
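
The splitting behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function name `split_text` and the 1,500-character default are assumptions made for the example.

```python
import re

def split_text(text, max_chars=1500):
    """Greedily pack sentences into chunks of at most max_chars characters.

    Hypothetical helper: paragraph breaks and sentence ends both count as
    boundaries. A single sentence longer than max_chars is hard-split.
    """
    chunks = []
    current = ""
    # Blank lines separate paragraphs.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # Break after '.', '!' or '?' when followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if not sentence:
                continue
            # Last resort: cut an oversized sentence into fixed-size pieces.
            while len(sentence) > max_chars:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(sentence[:max_chars])
                sentence = sentence[max_chars:]
            candidate = f"{current} {sentence}" if current else sentence
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = sentence
    if current:
        chunks.append(current)
    return chunks

print(split_text("One. Two. Three.", max_chars=10))
```

Greedy packing keeps each chunk as long as the limit allows, which reduces the number of model calls for a given text.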

## Setup

1. Add your **DashScope API key** as a Space Secret:
   - Settings → Secrets → New Secret
   - Name: `DASHSCOPE_API_KEY`
   - Value: your key ([get one here](https://www.alibabacloud.com/help/en/model-studio/get-api-key))
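
Space Secrets are exposed to the running app as environment variables, so the key can be read with the standard library. A minimal sketch, assuming a hypothetical helper named `get_api_key` (the actual lookup in `app.py` may differ):

```python
import os

def get_api_key(env=os.environ):
    """Return the DashScope key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; `env` is injectable for testing.
    """
    key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DASHSCOPE_API_KEY is not set. Add it under Settings → Secrets."
        )
    return key
```

Failing early with an explicit message makes a missing secret easier to diagnose than an authentication error from the API later on.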

## Supported Languages (36)

### ⭐ Core Languages (Best Quality)
English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian

### Extended Languages
Arabic, Bengali, Cantonese, Czech, Danish, Dutch, Filipino, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Malay, Norwegian, Persian, Polish, Romanian, Swahili, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese

## How It Works

1. Your English text is split into manageable chunks at sentence or paragraph boundaries
2. Each chunk is sent to `qwen3.5-omni-plus` with instructions to translate (if needed) and narrate
3. The model generates expressive speech with audiobook-quality narration
4. All audio chunks are concatenated and converted to MP3
5. Download your audiobook!
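
The concatenation in step 4 can be sketched with Python's standard `wave` module. This is only an illustration under stated assumptions: it assumes each chunk arrives as 16-bit PCM WAV bytes in one shared format, the name `concat_wav` is hypothetical, and the MP3 conversion itself needs an external encoder and is not shown.

```python
import io
import wave

def concat_wav(chunks, pause_seconds=0.0):
    """Join WAV byte strings into one WAV, with optional silence between chunks.

    Assumes 16-bit PCM, where zero bytes encode silence.
    """
    if not chunks:
        raise ValueError("no audio chunks to join")
    out = io.BytesIO()
    writer = None
    expected = None
    for index, data in enumerate(chunks):
        with wave.open(io.BytesIO(data), "rb") as reader:
            fmt = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if writer is None:
                # The first chunk defines the output format.
                expected = fmt
                writer = wave.open(out, "wb")
                writer.setnchannels(fmt[0])
                writer.setsampwidth(fmt[1])
                writer.setframerate(fmt[2])
            elif fmt != expected:
                raise ValueError("all chunks must share one audio format")
            if index > 0 and pause_seconds > 0:
                # Insert the section pause as zero-valued frames.
                silent_frames = int(pause_seconds * fmt[2])
                writer.writeframes(b"\x00" * (silent_frames * fmt[0] * fmt[1]))
            writer.writeframes(reader.readframes(reader.getnframes()))
    writer.close()  # finalizes the WAV header; the BytesIO stays open
    return out.getvalue()
```

The `pause_seconds` argument corresponds to the optional section pauses listed under Features.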

## Limitations

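- Processing time: roughly 30-60 seconds per 1,500 characters of input
- Voice quality in the extended languages may be less consistent than in the 10 core languages
- Very long texts (100k+ characters) take a long time: 100,000 characters is about 67 chunks of 1,500 characters, so roughly 33 to 67 minutes at the rate above if chunks are processed one at a time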
- The model generates speech at its own pace, so timing won't match a human narrator exactly
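
The processing-time figure above can be turned into a rough estimate for a given text. The sketch below assumes chunks of about 1,500 characters handled one after another. The function name and its parameters are hypothetical, and real timings depend on API load.

```python
import math

def estimate_seconds(num_chars, chunk_chars=1500, per_chunk=(30, 60)):
    """Return a (low, high) estimate in seconds for narrating num_chars characters."""
    chunks = math.ceil(num_chars / chunk_chars)
    return chunks * per_chunk[0], chunks * per_chunk[1]

low, high = estimate_seconds(12_000)  # 8 chunks
print(f"about {low // 60} to {high // 60} minutes")  # about 4 to 8 minutes
```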

## Credits

- Model: [Qwen3.5-Omni-Plus](https://qwen.ai) by Alibaba Cloud
- API: [DashScope](https://www.alibabacloud.com/help/en/model-studio/)
- UI: [Gradio](https://gradio.app)