---
license: mit
language:
- ru
library_name: pyannote-audio
tags:
- code
---

# Segmentation model

This model was trained on AMI-MixHeadset and my own synthetic dataset of Russian speech.

Training time: 5 hours on an RTX 3060.

This model can be used as the segmentation model in the [pyannote/speaker-diarization](https://huggingface.co/pyannote/speaker-diarization) pipeline.

| Benchmark | DER (%) |
| --------- | ------- |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (*headset mix*, [*only_words*](https://github.com/BUTSpeechFIT/AMI-diarization-setup)) | 38.8 |

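For readers unfamiliar with the metric: DER (diarization error rate) is commonly defined as the sum of false alarm, missed detection and speaker confusion durations divided by the total duration of reference speech. Below is a minimal sketch of that arithmetic. The function name and the durations are hypothetical and chosen only for illustration; they are not measurements from this model.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return DER as a fraction, given error durations in seconds.

    `total` is the total duration of reference speech in seconds.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (false_alarm + missed_detection + confusion) / total


# Illustrative durations (seconds), not results of this model.
der = diarization_error_rate(
    false_alarm=12.0,
    missed_detection=30.0,
    confusion=18.0,
    total=600.0,
)
print(f"DER = {100 * der:.1f}%")  # prints "DER = 10.0%"
```
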
## Usage example

```python
import yaml
from yaml.loader import SafeLoader

import torch
from pyannote.audio import Model
from pyannote.audio.pipelines import SpeakerDiarization

# Load the pickled segmentation model on CPU.
# PyTorch 2.6+ defaults to weights_only=True; pass weights_only=False there if loading fails.
segm_model = torch.load('model/segm_model.pth', map_location=torch.device('cpu'))

# The speaker embedding model is gated and needs a Hugging Face access token.
embed_model = Model.from_pretrained("pyannote/embedding", use_auth_token='ACCESS_TOKEN_GOES_HERE')

diar_pipeline = SpeakerDiarization(
    segmentation=segm_model,
    segmentation_batch_size=16,
    clustering="AgglomerativeClustering",
    embedding=embed_model
)

# Read the pipeline hyperparameters and apply them.
with open('model/config.yaml', 'r') as f:
    diar_config = yaml.load(f, Loader=SafeLoader)
diar_pipeline.instantiate(diar_config)

# Run diarization on an audio file.
annotation = diar_pipeline('audio.wav')
```

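The returned `annotation` is a `pyannote.core.Annotation`, whose speaker turns can be iterated with `itertracks(yield_label=True)`. Here is a small sketch of one way to summarize the result as total speaking time per speaker. The helper `speaking_time` and the sample turns are hypothetical; the helper works on plain `(start, end, speaker)` tuples so it runs without pyannote installed.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration in seconds of each speaker's turns.

    `turns` is an iterable of (start, end, speaker) tuples.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# With a real pipeline result, the tuples could be built like this:
#   turns = [(segment.start, segment.end, speaker)
#            for segment, _, speaker in annotation.itertracks(yield_label=True)]
# Hypothetical turns used here for illustration only.
turns = [
    (0.0, 2.5, "SPEAKER_00"),
    (2.5, 4.0, "SPEAKER_01"),
    (4.0, 7.0, "SPEAKER_00"),
]

for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"{speaker}: {seconds:.1f} s")
```

Overlapping turns from different speakers are counted once per speaker, so the totals can add up to more than the audio length.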