---
license: apache-2.0
base_model:
- coqui/XTTS-v2
---
# Auralis

## Model Details

**Model Name:** Auralis

**Model Architecture:** Based on [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2)

**License:**
- Auralis code: Apache 2.0
- XTTS-v2 components: [Coqui AI License](https://coqui.ai/cpml)

**Language Support:** English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese (Simplified), Hungarian, Korean, Japanese, Hindi

**Developed by:** [AstraMind.ai](https://www.astramind.ai)

**GitHub:** [AstraMind AI](https://github.com/astramind-ai/Auralis/tree/main)

**Primary Use Case:** Text-to-Speech (TTS) generation for real-world applications, including books, dialogues, and multilingual tasks.

---

## Model Description

Auralis transforms text into natural, high-quality speech with exceptional speed and scalability. It is powered by [Coqui XTTS-v2](https://huggingface.co/coqui/XTTS-v2) and optimized for both consumer-grade and high-performance GPUs. Auralis is designed to meet real-world needs like long-text processing, voice cloning, and concurrent request handling.

### Key Features:
- **Warp-Speed Processing:** Generate speech for an entire novel (e.g., Harry Potter) in ~10 minutes.
- **Hardware Friendly:** Requires <10GB VRAM on a single NVIDIA RTX 3090.
- **Scalable:** Handles multiple requests simultaneously.
- **Streaming:** Seamlessly processes long texts in a streaming format.
- **Custom Voices:** Enables voice cloning from short reference audio.

---

## Quick Start

```python
from auralis import TTS, TTSRequest

# Initialize the model
tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")

# Create a TTS request
request = TTSRequest(
    text="Hello Earth! This is Auralis speaking.",
    speaker_files=["reference.wav"]
)

# Generate speech
output = tts.generate_speech(request)
output.save("output.wav")
```

---

## Ebook Generation

Auralis converts ebooks into audio at lightning speed. For a complete Python script, check out [ebook_audio_generator.py](https://github.com/astramind-ai/Auralis/blob/main/examples/vocalize_a_ebook.py).

```python
from auralis import TTS, TTSRequest, AudioPreprocessingConfig

# Initialize the model once and reuse it across chapters
tts = TTS().from_pretrained("AstraMindAI/xtts2-gpt")

def process_book(chapter_file: str, speaker_file: str):
    # Read the chapter text
    with open(chapter_file, 'r') as f:
        chapter = f.read()

    # You can pass the whole book; Auralis will take care of splitting it
    request = TTSRequest(
        text=chapter,
        speaker_files=[speaker_file],
        audio_config=AudioPreprocessingConfig(
            enhance_speech=True,
            normalize=True
        )
    )

    output = tts.generate_speech(request)

    output.play()
    output.save("chapter_output.wav")

# Example usage
process_book("chapter1.txt", "reference_voice.wav")
```

---

## Intended Use

Auralis is designed for:
- **Content Creators:** Generate audiobooks, podcasts, or voiceovers.
- **Developers:** Integrate TTS into applications via a simple Python API.
- **Accessibility:** Provide audio versions of digital content for people with visual or reading difficulties.
- **Multilingual Scenarios:** Convert text to speech in multiple supported languages.

---

## Performance

**Benchmarks on NVIDIA RTX 3090:**
- Short phrases (<100 characters): ~1 second
- Medium texts (<1,000 characters): ~5-10 seconds
- Full books (~100,000 characters): ~10 minutes

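As a quick sanity check on these figures, the full-book number implies a throughput of roughly 170 characters per second:

```python
# Back-of-the-envelope throughput implied by the full-book benchmark above
chars = 100_000      # ~100,000 characters per book
seconds = 10 * 60    # ~10 minutes
chars_per_second = chars / seconds
print(f"{chars_per_second:.0f} characters/second")  # prints "167 characters/second"
```
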
**Memory Usage:**
- Base VRAM: ~4GB
- Peak VRAM: ~10GB

---

## Model Features

1. **Speed & Efficiency:**
   - Smart batching for rapid processing of long texts.
   - Memory-optimized for consumer GPUs.

2. **Easy Integration:**
   - Python API with support for synchronous and asynchronous workflows.
   - Streaming mode for continuous playback during generation.

3. **Audio Quality Enhancements:**
   - Background noise reduction.
   - Voice clarity and volume normalization.
   - Customizable audio preprocessing.

4. **Multilingual Support:**
   - Automatic language detection.
   - High-quality speech in 15+ languages.

5. **Customization:**
   - Voice cloning using short reference clips.
   - Adjustable parameters for tone, pacing, and language.

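The smart batching described above relies on splitting long inputs into manageable pieces. As a rough illustration of the idea (a minimal sketch, not Auralis's actual splitter; `chunk_text` and `max_chars` are hypothetical names), sentences can be greedily packed into fixed-size chunks:

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk when adding this sentence would exceed the budget
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

book = "First sentence. Second sentence! A third, longer sentence follows? " * 20
chunks = chunk_text(book, max_chars=120)
```

Each chunk could then be submitted as its own `TTSRequest`, so long texts can be synthesized piece by piece rather than in one pass.
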
---

## Limitations & Ethical Considerations

- **Voice Cloning Risks:** Auralis supports voice cloning, which raises ethical concerns about potential misuse. Use it responsibly and ensure proper consent from the owners of cloned voices.
- **Accent Limitations:** While robust across many languages, accent and intonation quality may vary with the input text and reference audio.

---

## Citation

If you use Auralis in your research or projects, please cite:

```bibtex
@misc{auralis2024,
  author = {AstraMind AI},
  title = {Auralis: High-Performance Text-to-Speech Engine},
  year = {2024},
  url = {https://huggingface.co/AstraMindAI/auralis}
}
```