Add pipeline_tag, library_name and link to paper

#1
by nielsr HF Staff - opened
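
The front-matter fields this PR adds (`pipeline_tag`, `library_name`) can be sanity-checked locally before merging. The following is a minimal sketch, not a Hub API: the helper names `front_matter_keys` and `missing_keys`, the `REQUIRED_KEYS` tuple, and the `SAMPLE` text are all hypothetical, and the parser deliberately reads only unindented `key: value` lines rather than full YAML.

```python
import re

# Hypothetical list of keys to check; adjust to taste.
REQUIRED_KEYS = ("pipeline_tag", "library_name", "license", "base_model")


def front_matter_keys(readme_text):
    """Return the top-level keys found in a README's YAML front matter.

    Simplified on purpose: only unindented `key: value` lines are read.
    Keys that open a list or mapping (e.g. `tags:`) map to an empty string.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    keys = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the front matter
            return keys
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", line)
        if match:
            keys[match.group(1)] = match.group(2).strip()
    return {}  # no closing delimiter: treat as missing front matter


def missing_keys(readme_text, required=REQUIRED_KEYS):
    """List the required keys that are absent from the front matter."""
    found = front_matter_keys(readme_text)
    return [key for key in required if key not in found]


# Abbreviated stand-in for the README after this PR (assumed, not the full file).
SAMPLE = """---
base_model: openai/whisper-medium
language:
- sk
license: mit
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
- speech
---

# Whisper Medium
"""

if __name__ == "__main__":
    print(missing_keys(SAMPLE))
```

For anything beyond a quick check, a real YAML parser is the safer choice, since this sketch ignores nested structures such as `model-index`.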
Files changed (1)
  1. README.md +15 -12
README.md CHANGED
@@ -1,6 +1,14 @@
 ---
+base_model: openai/whisper-medium
+datasets:
+- erikbozik/slovak-plenary-asr-corpus
 language:
 - sk
+license: mit
+metrics:
+- wer
+library_name: transformers
+pipeline_tag: automatic-speech-recognition
 tags:
 - speech
 - asr
@@ -9,11 +17,6 @@ tags:
 - parliament
 - legal
 - politics
-base_model: openai/whisper-medium
-datasets:
-- erikbozik/slovak-plenary-asr-corpus
-metrics:
-- wer
 model-index:
 - name: whisper-medium-sk
   results:
@@ -24,9 +27,9 @@ model-index:
       name: Common Voice 21 (Slovak test set)
       type: common_voice
     metrics:
-    - name: WER
-      type: wer
+    - type: wer
       value: 18
+      name: WER
   - task:
       type: automatic-speech-recognition
       name: Automatic Speech Recognition
@@ -34,15 +37,15 @@ model-index:
       name: FLEURS (Slovak test set)
       type: fleurs
     metrics:
-    - name: WER
-      type: wer
+    - type: wer
       value: 7.6
-license: mit
+      name: WER
 ---
 
 # Whisper Medium — Fine-tuned on SloPalSpeech
 
-This model is a fine-tuned version of [`openai/whisper-medium`](https://huggingface.co/openai/whisper-medium).
+This model is a fine-tuned version of [`openai/whisper-medium`](https://huggingface.co/openai/whisper-medium), presented in the paper [SloPal: A 60-Million-Word Slovak Parliamentary Corpus with Aligned Speech and Fine-Tuned ASR Models](https://huggingface.co/papers/2509.19270).
+
 It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/datasets/erikbozik/slovak-plenary-asr-corpus): **2,806 hours** of aligned, ≤30 s speech–text pairs from official plenary sessions of the **Slovak National Council**.
 
 - **Language:** Slovak
@@ -73,7 +76,7 @@ It is adapted for **Slovak ASR** using [SloPalSpeech](https://huggingface.co/dat
 - Multilingual performance is not guaranteed (full-parameter finetuning emphasized Slovak).
 
 ## 📝 Citation & Paper
-For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270). If you use this model in your work, please cite it as:
+For more details, please see our paper on [arXiv](https://arxiv.org/abs/2509.19270) or the [Hugging Face paper page](https://huggingface.co/papers/2509.19270). If you use this model in your work, please cite it as:
 ```bibtex
 @misc{božík2025slopalspeech2800hourslovakspeech,
   title={SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data},