Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer Paper • 2605.30940 • Published 4 days ago • 29
Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios Paper • 2605.28618 • Published 6 days ago • 25
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue Paper • 2605.30993 • Published 4 days ago • 38
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation Paper • 2605.19833 • Published 14 days ago • 131
WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training Paper • 2604.14932 • Published Apr 16 • 11
Qwen3-TTS Demo 🎙 • Generate custom speech from text, voice descriptions, or samples • 1.94k
ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting Paper • 2504.20630 • Published Apr 29, 2025 • 9
Versatile Framework for Song Generation with Prompt-based Control Paper • 2504.19062 • Published Apr 27, 2025 • 6