	---
	license: cc-by-nc-4.0
	datasets:
	- mozilla-foundation/common_voice_11_0
	language:
	- fr
	- es
	- pt
	- da
	- de
	- nl
	- fy
	- zh
	- ja
	- ar
	- sw
	- gn
	library_name: fairseq
	---

	HUTTER-12: H(uBERT) UTTER model covering 12 languages.

	* Total training hours: 1,622, from Romance (French: 300h, Spanish: 300h, Portuguese: 102.3h), Germanic (Danish: 3.5h, German: 300h, Dutch: 72.1h, Frisian: 41.2h), and other languages (Chinese (zh-CN): 104.6h, Japanese: 37h, Arabic: 61h, Swahili: 300h, Guaraní: 0.4h)
	* Number of updates: 400K
	* Number of iterations: 3
	* Clustering approach: mini-batch K-means (100% of the data)
	* Dataset: CommonVoice v13
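
The per-language hours listed above can be tallied to check the stated total. The snippet below is a minimal sketch for that arithmetic only: the dictionary layout, the group keys, and the function names `group_totals` and `total_hours` are illustrative assumptions, not part of the released model or its training code.

```python
# Hours per language, copied from the training-hours bullet above.
# The group keys are illustrative labels for the card's three groups.
HOURS = {
    "romance": {"French": 300.0, "Spanish": 300.0, "Portuguese": 102.3},
    "germanic": {"Danish": 3.5, "German": 300.0, "Dutch": 72.1, "Frisian": 41.2},
    "other": {
        "Chinese (zh-CN)": 104.6,
        "Japanese": 37.0,
        "Arabic": 61.0,
        "Swahili": 300.0,
        "Guaraní": 0.4,
    },
}


def group_totals(hours):
    """Return the summed hours per group, rounded to one decimal place."""
    return {group: round(sum(langs.values()), 1) for group, langs in hours.items()}


def total_hours(hours):
    """Return the summed hours over all groups, rounded to one decimal place."""
    return round(sum(sum(langs.values()) for langs in hours.values()), 1)


if __name__ == "__main__":
    for group, hrs in group_totals(HOURS).items():
        print(f"{group}: {hrs}h")
    print(f"total: {total_hours(HOURS)}h")
```

The three groups sum to 702.3h, 416.8h, and 503.0h, giving 1,622.1h overall, which matches the rounded total of 1,622 hours stated in the list.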

	# Funding

	<img src="https://cdn-uploads.huggingface.co/production/uploads/62262e19d36494a6f743a28d/HbzC1C-uHe25ewTy2wyoK.png" width=7% height=7%>
	This model is an output of the European project UTTER (Unified Transcription and Translation for Extended Reality), funded under grant number 101070631. For more information, see https://he-utter.eu/