trysem committed (verified) · Commit cc505bc · 1 Parent(s): 2f73923

Update README.md

Files changed (1): README.md (+62 -2)
---
license: mit
language:
- ml
pipeline_tag: automatic-speech-recognition
library_name: nemo
---
## IndicConformer

IndicConformer is a hybrid CTC-RNNT Conformer ASR (Automatic Speech Recognition) model.

### Language

Malayalam

### Input

This model accepts 16 kHz mono-channel audio (WAV files) as input.

### Output

This model produces the transcribed speech as a string for a given audio sample.

## Model Architecture

This model uses a Conformer-Large encoder of about 120M parameters with a hybrid CTC-RNNT decoder. The encoder has 17 conformer blocks with a model dimension of 512.
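As a rough sanity check, the encoder size can be estimated from the dimensions stated above. The sketch below is a back-of-the-envelope estimate assuming standard Conformer hyperparameters (feed-forward expansion 4, depthwise conv kernel 31 are assumptions, not taken from this model card); it ignores the subsampling front-end and the decoders, which account for the remaining parameters.

```python
d = 512       # model dimension (from the description above)
blocks = 17   # number of conformer blocks (from the description above)
ffn_mult = 4  # feed-forward expansion factor (assumed; standard for Conformer)
kernel = 31   # depthwise conv kernel size (assumed; standard for Conformer-Large)

# Two macaron feed-forward modules per block (weights + biases).
ffn = 2 * ((d * ffn_mult * d + ffn_mult * d) + (ffn_mult * d * d + d))
# Multi-head self-attention: Q, K, V and output projections.
mhsa = 4 * (d * d + d)
# Convolution module: pointwise (2x expansion), depthwise, norm, pointwise.
conv = (d * 2 * d + 2 * d) + (d * kernel + d) + 2 * d + (d * d + d)

total = blocks * (ffn + mhsa + conv)
print(f"~{total / 1e6:.0f}M encoder parameters")  # ~103M; subsampling and the
                                                  # decoders bring it near 120M
```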

## AI4Bharat NeMo

To load, train, fine-tune, or play with the model, you will need to install [AI4Bharat NeMo](https://github.com/AI4Bharat/NeMo). We recommend installing it with the command below:
```
git clone https://github.com/AI4Bharat/NeMo.git && cd NeMo && git checkout nemo-v2 && bash reinstall.sh
```

## Usage

Download and load the model from Hugging Face.
```
import torch
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained("ai4bharat/indicconformer_stt_ml_hybrid_rnnt_large")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.freeze()  # inference mode
model = model.to(device)  # transfer model to device
```
Prepare an audio file by running the command below in your terminal. It converts the audio to 16 kHz, mono-channel.
```
ffmpeg -i sample_audio.wav -ac 1 -ar 16000 sample_audio_infer_ready.wav
```
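If you want to double-check a file before transcription, the expected format can be verified with Python's standard `wave` module. This is an illustrative helper, not part of NeMo; the demo writes one second of silence to a hypothetical `format_demo.wav` rather than assuming a converted file already exists.

```python
import struct
import wave

def check_asr_ready(path):
    """Return (channels, sample_rate) and assert the model's expected input format."""
    with wave.open(path, "rb") as w:
        channels, rate = w.getnchannels(), w.getframerate()
    assert channels == 1, f"expected mono audio, got {channels} channels"
    assert rate == 16000, f"expected a 16 kHz sample rate, got {rate} Hz"
    return channels, rate

# Demo: write one second of 16-bit silence in the expected format, then check it.
with wave.open("format_demo.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)        # 16-bit PCM
    w.setframerate(16000)
    w.writeframes(struct.pack("<h", 0) * 16000)

print(check_asr_ready("format_demo.wav"))  # (1, 16000)
```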

### Inference using CTC decoder
```
model.cur_decoder = "ctc"
ctc_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, logprobs=False, language_id='ml')[0]
print(ctc_text)
```
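For intuition, CTC greedy decoding turns a frame-level label sequence into text by merging consecutive repeated labels and then dropping blanks. A minimal pure-Python illustration (not NeMo's actual implementation, which operates on integer token IDs):

```python
BLANK = "_"  # stand-in for the CTC blank token

def ctc_collapse(frame_labels):
    """Greedy CTC decode: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            out.append(label)
        prev = label
    return "".join(out)

print(ctc_collapse("aa__a_bb_b"))  # aabb: repeats merge only within a run
```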

### Inference using RNNT decoder
```
model.cur_decoder = "rnnt"
rnnt_text = model.transcribe(['sample_audio_infer_ready.wav'], batch_size=1, language_id='ml')[0]
print(rnnt_text)
```
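When trying both decoders on the same audio, it can be useful to quantify how much their transcripts differ. Below is a minimal word-error-rate helper; this is an illustrative sketch, not part of NeMo (libraries such as `jiwer` provide production implementations).

```python
def wer(reference, hypothesis):
    """Word error rate: Levenshtein distance over words, normalised by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    d = list(range(len(hyp) + 1))          # one rolling row of the edit-distance table
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (r != h))     # substitution (free on a match)
            prev, d[j] = d[j], cur
    return d[len(hyp)] / max(len(ref), 1)

# e.g. compare the two transcripts: wer(ctc_text, rnnt_text)
print(wer("a b c", "a x c"))  # 0.3333...
```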