CAM++
Speaker Verification & Diarization: identify and distinguish speakers in audio.
CAM++ is a speaker embedding model used for speaker verification (is this the same person?) and speaker diarization (who spoke when?).
Quick Start
from funasr import AutoModel

# Speaker diarization as part of the ASR pipeline
model = AutoModel(
    model="funasr/paraformer-zh",
    hub="hf",
    vad_model="funasr/fsmn-vad",
    punc_model="funasr/ct-punc",
    spk_model="funasr/campplus",
    device="cuda",
)
result = model.generate(input="meeting.wav")

# Each sentence has a speaker label
for sentence in result[0]["sentence_info"]:
    print(f"[Speaker {sentence['spk']}] {sentence['text']}")
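A common next step is to collect the labelled sentences per speaker. The following is a minimal sketch, assuming only the `spk` and `text` keys used in the Quick Start; `group_by_speaker` and the sample data are hypothetical helpers for illustration, not part of FunASR.

```python
from collections import defaultdict


def group_by_speaker(sentence_info):
    """Collect sentence texts per speaker label, keeping their original order."""
    grouped = defaultdict(list)
    for sentence in sentence_info:
        grouped[sentence["spk"]].append(sentence["text"])
    return dict(grouped)


# Hypothetical data shaped like result[0]["sentence_info"] from the Quick Start.
example = [
    {"spk": 0, "text": "Good morning."},
    {"spk": 1, "text": "Morning, shall we start?"},
    {"spk": 0, "text": "Yes, let's begin."},
]

for speaker, texts in group_by_speaker(example).items():
    print(f"Speaker {speaker}: {' '.join(texts)}")
```

With real output, pass `result[0]["sentence_info"]` in place of `example`.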
Features
- Speaker embedding extraction
- Speaker verification (same/different speaker)
- Speaker diarization (multi-speaker segmentation)
- Works with the FunASR pipeline via the `spk_model` parameter
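Speaker verification with embeddings is usually framed as a similarity test: two utterances are treated as the same speaker when their embeddings are close enough. The sketch below illustrates that idea with cosine similarity in plain Python. It is an assumption about how one might score embeddings, not the scoring used inside FunASR; `same_speaker` and the `0.5` threshold are hypothetical, and a real threshold would need tuning on held-out data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a, emb_b, threshold=0.5):
    """Hypothetical decision rule: accept when similarity reaches the threshold."""
    # The threshold is an illustrative placeholder, not a recommended value.
    return cosine_similarity(emb_a, emb_b) >= threshold
```

The same functions apply unchanged to 192-dimensional vectors; short vectors are enough to check the logic.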
Model Details
| Property | Value |
|---|---|
| Architecture | CAM++ (Context-Aware Masking) |
| Embedding Dim | 192 |
| Sample Rate | 16 kHz |
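Since the model expects 16 kHz audio, it can be useful to check a file's sample rate before running inference. The sketch below uses Python's standard `wave` module and therefore handles only uncompressed WAV files. The helper names are hypothetical, and the sketch makes no assumption about whether the FunASR pipeline resamples input on its own.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, from the Model Details table


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_expected_rate(path, expected=EXPECTED_SAMPLE_RATE):
    """True when the WAV file's sample rate matches the expected rate."""
    return wav_sample_rate(path) == expected
```

A file that fails this check could be resampled with an audio tool of choice before being passed to `model.generate`.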
Links
- GitHub: FunASR
- Docs: modelscope.github.io/FunASR