| | --- |
| | language: |
| | - he |
| | pipeline_tag: fill-mask |
| | datasets: |
| | - HeNLP/HeDC4 |
| | --- |
| | |
| |
|
| | ## Hebrew Language Model for Long Documents |
| |
|
| | State-of-the-art Longformer language model for Hebrew. |
| |
|
| | #### How to use |
| |
|
| | ```python |
| | from transformers import AutoModelForMaskedLM, AutoTokenizer |
| | |
| | tokenizer = AutoTokenizer.from_pretrained('HeNLP/LongHeRo') |
| | model = AutoModelForMaskedLM.from_pretrained('HeNLP/LongHeRo') |
| | |
| | # Tokenization Example: |
| | # Tokenizing |
| | tokenized_string = tokenizer('ืฉืืื ืืืืื') |
| | |
| | # Decoding |
| | decoded_string = tokenizer.decode(tokenized_string ['input_ids'], skip_special_tokens=True) |
| | ``` |
| |
|
| |
|
| | ### Citing |
| |
|
| | If you use LongHeRo in your research, please cite [HeRo: RoBERTa and Longformer Hebrew Language Models](http://arxiv.org/abs/2304.11077). |
| | ``` |
| | @article{shalumov2023hero, |
| | title={HeRo: RoBERTa and Longformer Hebrew Language Models}, |
| | author={Vitaly Shalumov and Harel Haskey}, |
| | year={2023}, |
| | journal={arXiv:2304.11077}, |
| | } |
| | ``` |