trysem committed (verified) · Commit cc505bc · 1 Parent(s): 2f73923

Update README.md

Files changed (1): README.md (+62 -2)
---
license: mit
language:
- ml
pipeline_tag: automatic-speech-recognition
library_name: nemo
---
## IndicConformer

IndicConformer is a hybrid CTC-RNNT Conformer ASR (Automatic Speech Recognition) model.

### Language

Malayalam

### Input

This model accepts 16 kHz mono-channel audio (WAV files) as input.

### Output

This model produces the transcribed speech as a string for a given audio sample.

## Model Architecture

This model uses a Conformer-Large encoder of about 120M parameters with a hybrid CTC-RNNT decoder. The encoder has 17 conformer blocks with a model dimension of 512.
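As a rough sanity check, the encoder size can be estimated from the dimensions stated above. The sketch below is a back-of-the-envelope estimate assuming standard Conformer hyperparameters (feed-forward expansion 4, depthwise conv kernel 31 are assumptions, not taken from this model card); it ignores the subsampling front-end and the decoders, which account for the remaining parameters.

```python
d = 512       # model dimension (from the description above)
blocks = 17   # number of conformer blocks (from the description above)
ffn_mult = 4  # feed-forward expansion factor (assumed; standard for Conformer)
kernel = 31   # depthwise conv kernel size (assumed; standard for Conformer-Large)

# Two macaron feed-forward modules per block (weights + biases).
ffn = 2 * ((d * ffn_mult * d + ffn_mult * d) + (ffn_mult * d * d + d))
# Multi-head self-attention: Q, K, V and output projections.
mhsa = 4 * (d * d + d)
# Convolution module: pointwise (2x expansion), depthwise, norm, pointwise.
conv = (d * 2 * d + 2 * d) + (d * kernel + d) + 2 * d + (d * d + d)

total = blocks * (ffn + mhsa + conv)
print(f"~{total / 1e6:.0f}M encoder parameters")  # ~103M; subsampling and the
                                                  # decoders bring it near 120M
```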

## AI4Bharat NeMo

To load, train, fine-tune, or play with the model, you will need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the command below:
```
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
```

## Usage

Download and load the model from Hugging Face.
```
import torch
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("ai4bharat/indicconformer_stt_ml_hybrid_rnnt_large")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.freeze()  # inference mode
model = model.to(device)  # transfer model to device
```
Prepare an audio file by running the command below in your terminal. It converts the audio to 16 kHz, mono-channel.
```
ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav
```
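If you want to double-check a file before transcription, the expected format can be verified with Python's standard `wave` module. This is an illustrative helper, not part of NeMo; the demo writes one second of silence to a hypothetical `format_demo.wav` rather than assuming a converted file already exists.

```python
import struct
import wave

def check_asr_ready(path):
    """Return (channels, sample_rate) and assert the model's expected input format."""
    with wave.open(path, "rb") as w:
        channels, rate = w.getnchannels(), w.getframerate()
    assert channels == 1, f"expected mono audio, got {channels} channels"
    assert rate == 16000, f"expected a 16 kHz sample rate, got {rate} Hz"
    return channels, rate

# Demo: write one second of 16-bit silence in the expected format, then check it.
with wave.open("format_demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)        # 16-bit PCM
    w.setframerate(16000)
    w.writeframes(struct.pack("<h", 0) * 16000)

print(check_asr_ready("format_demo.wav"))  # (1, 16000)
```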

### Inference using CTC decoder
```
model.cur_decoder = "ctc"
ctc_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, logprobs=False, language_id='ml')[0]
print(ctc_text)
```
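For intuition, CTC greedy decoding turns a frame-level label sequence into text by merging consecutive repeated labels and then dropping blanks. A minimal pure-Python illustration (not NeMo's actual implementation, which operates on integer token IDs):

```python
BLANK = "_"  # stand-in for the CTC blank token

def ctc_collapse(frame_labels):
    """Greedy CTC decode: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

print(ctc_collapse("aa__a_bb_b"))  # aabb: repeats merge only within a run
```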

### Inference using RNNT decoder
```
model.cur_decoder = "rnnt"
rnnt_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, language_id='ml')[0]
print(rnnt_text)
```
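When trying both decoders on the same audio, it can be useful to quantify how much their transcripts differ. Below is a minimal word-error-rate helper; this is an illustrative sketch, not part of NeMo (libraries such as `jiwer` provide production implementations).

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words, normalised by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))          # one rolling row of the edit-distance table
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r != h))     # substitution (free on a match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)

# e.g. compare the two transcripts: wer(ctc_text, rnnt_text)
print(wer("a b c", "a x c"))  # 0.3333...
```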