| --- |
| license: cc-by-nc-4.0 |
| language: |
| - zh |
| - en |
| tags: |
| - speech |
| - asr |
| - sensevoice |
| - paralinguistic |
| - nonverbal-vocalization |
| datasets: |
| - NV-Bench |
| - amphion/Emilia-NV |
| - nonverbalspeech/nonverbalspeech38k |
| - deepvk/NonverbalTTS |
| - xunyi/SMIIP-NV |
| pipeline_tag: automatic-speech-recognition |
| metrics: |
| - cer |
| base_model: |
| - FunAudioLLM/SenseVoiceSmall |
| --- |
| |
| # Multi-lingual NVASR |
|
|
| **Multi-lingual Nonverbal Vocalization Automatic Speech Recognition** |
|
|
[Project Page](https://nvbench.github.io)
[Dataset](https://huggingface.co/datasets/AnonyData/NV-Bench)
[Model](https://huggingface.co/AnonyData/Multilingual-NVASR)
|
|
| Multi-lingual NVASR is a speech recognition model fine-tuned from [SenseVoice-Small](https://github.com/FunAudioLLM/SenseVoice) for transcribing both regular speech and **nonverbal vocalizations (NVVs)** with a unified paralinguistic label taxonomy. It is a core component of the [NV-Bench](https://nvbench.github.io) evaluation pipeline. |
|
|
| ## Highlights |
|
|
- 🗣️ **Multi-lingual Support** — Chinese (zh), English (en)
- 🎯 **NVV-Aware Transcription** — Accurately transcribes nonverbal vocalizations (laughter, coughs, sighs, etc.) as structured tags within text
- 📊 **High-Quality General ASR** — Maintains competitive CER on standard ASR benchmarks while significantly outperforming baselines on NVV-specific tasks
- 🏷️ **Unified Label Taxonomy** — Consistent paralinguistic labels across all supported languages
|
|
| ## NVV Taxonomy |
|
|
| NVVs are organized into three functional levels: |
|
|
| | Function | Categories | |
| |----------|------------| |
| | Vegetative | `[Cough]`, `[Sigh]`, `[Breathing]` | |
| | Affect Burst | `[Surprise-oh]`, `[Surprise-ah]`, `[Dissatisfaction-hnn]`, `[Laughter]` | |
| | Conversational Grunt | `[Uhm]`, `[Question-en/oh/ah/ei/huh]`, `[Confirmation-en]` | |
|
|
| > [!NOTE] |
| > Mandarin supports 13 NVV categories; English supports 7 categories. |
|
|
| ## Usage |
|
|
| ### Quick Start with FunASR |
|
|
| ```python |
| from funasr import AutoModel |
| |
| model = AutoModel(model="path/to/Multi-lingual-NVASR") |
| |
| # Single file inference |
| res = model.generate( |
| input="example/zh.mp3", |
| language="auto", |
| use_itn=True, |
| ) |
| print(res[0]["text"]) |
| ``` |
|
|
| ## Evaluation Metrics |
|
|
Multi-lingual NVASR is evaluated with the following metrics in the NV-Bench pipeline:
|
|
| | Metric | Description | |
| |--------|-------------| |
| | **OCER / OWER** | Overall Character/Word Error Rate (text + NVV tags) | |
| | **PCER / PWER** | Paralinguistic CER/WER (NVV tags only) | |
| | **CER / WER** | Text-only error rate (NVV tags removed) | |
|
|
| > Our NVASR model maintains high-quality general ASR while significantly outperforming baselines on NVV-specific tasks. β *NV-Bench* |
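The three metric variants differ only in which tokens are kept before computing edit distance: everything (OCER/OWER), tags only (PCER/PWER), or plain text only (CER/WER). A self-contained sketch of this idea; the tag regex, tokenization, and helper names are assumptions for illustration, not the official NV-Bench implementation:

```python
import re

# Hypothetical pattern for bracketed NVV tags such as [Laughter].
NVV_TAG = re.compile(r"\[[A-Za-z]+(?:-[a-z/]+)*\]")

def levenshtein(ref, hyp):
    """Plain edit distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def tokenize(text, keep="all"):
    """Split a transcript into character and tag tokens, filtered by `keep`."""
    tokens, pos = [], 0
    for m in NVV_TAG.finditer(text):
        tokens.extend(ch for ch in text[pos:m.start()] if not ch.isspace())
        tokens.append(m.group())
        pos = m.end()
    tokens.extend(ch for ch in text[pos:] if not ch.isspace())
    if keep == "text":
        return [t for t in tokens if not NVV_TAG.fullmatch(t)]
    if keep == "tags":
        return [t for t in tokens if NVV_TAG.fullmatch(t)]
    return tokens

def error_rate(ref, hyp, keep="all"):
    r, h = tokenize(ref, keep), tokenize(hyp, keep)
    return levenshtein(r, h) / max(len(r), 1)

ref, hyp = "好的[Laughter]没问题", "好的没问题"
print(error_rate(ref, hyp))           # OCER-style: text + tags
print(error_rate(ref, hyp, "tags"))   # PCER-style: tags only
print(error_rate(ref, hyp, "text"))   # CER-style: tags removed
```

Here the hypothesis misses one tag, so the tags-only rate is 1.0 while the text-only rate is 0.0; the overall rate sits in between, which is exactly the behavior the three-way split is designed to expose.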
|
|
| ## File Structure |
|
|
| ``` |
Multi-lingual NVASR/
├── model.pt                       # Model weights (~2.8 GB)
├── config.yaml                    # Model architecture configuration
├── configuration.json             # FunASR pipeline configuration
├── am.mvn                         # Acoustic model mean-variance normalization
├── paralingustic_tokenizer.model  # SentencePiece tokenizer with NVV vocabulary
└── example/                       # Example audio files
    ├── zh.mp3                     # Chinese example
    └── en.mp3                     # English example
| ``` |
|
|
| ## Related Resources |
|
|
| - **NV-Bench Project Page**: [https://nvbench.github.io](https://nvbench.github.io) |
| - **NV-Bench Dataset**: [Hugging Face](https://huggingface.co/datasets/AnonyData/NV-Bench) |
| - **SenseVoice**: [GitHub](https://github.com/FunAudioLLM/SenseVoice) |
|
|
| ## Citation |
|
|
| If you use this model, please cite: |
|
|
| ```bibtex |
| Coming soon |
| ``` |
|
|
| ## License |
|
|
| This project is licensed under the [CC BY-NC-4.0 License](LICENSE). |