BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications Paper • 2602.09970 • Published 7 days ago • 1
RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness Paper • 2302.09437 • Published Feb 18, 2023
Wav2Gloss: Generating Interlinear Glossed Text from Speech Paper • 2403.13169 • Published Mar 19, 2024
On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models Paper • 2406.09282 • Published Jun 13, 2024
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration Paper • 2409.09506 • Published Sep 14, 2024 • 4
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks Paper • 2411.05361 • Published Nov 8, 2024 • 5
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder Paper • 2507.14129 • Published Jul 18, 2025 • 11
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model Paper • 2510.24992 • Published Oct 28, 2025 • 4
PRiSM: Benchmarking Phone Realization in Speech Models Paper • 2601.14046 • Published 28 days ago • 6
Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects Paper • 2601.07274 • Published Jan 12 • 1
On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation Paper • 2601.06329 • Published Jan 9 • 2
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate Paper • 2304.02541 • Published Apr 5, 2023 • 2
Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement Paper • 2510.23141 • Published Oct 27, 2025 • 5