Model Card for Ancient Greek to Polish Interlinear Translation Model

This model performs interlinear translation from Ancient Greek to Polish, maintaining word-level alignment between source and target texts.

The source code used to train this model and the other models from this project is available in the GitHub repository listed under Model Sources.

Model Details

Model Description

Developed By: Maciej Rapacz, AGH University of Kraków
Model Type: MorphT5AutoForConditionalGeneration
Base Model: mT5-large
Tokenizer: mT5
Language(s): Ancient Greek (source) → Polish (target)
License: CC BY-NC-SA 4.0
Tag Set: BH (Bible Hub)
Text Preprocessing: Diacritics
Morphological Encoding: emb-auto

Model Performance

BLEU Score: 59.04
SemScore: 0.93

Model Sources

Repository: https://github.com/mrapacz/loreslm-interlinear-translation
Paper: https://aclanthology.org/2025.loreslm-1.11/

Usage Example

Note: This model uses a modification of T5-family models that includes dedicated embedding layers for encoding morphological information. To load these models, install the morpht5 package:
pip install morpht5

>>> from morpht5 import MorphT5AutoForConditionalGeneration, MorphT5Tokenizer
>>> text = ['Λέγει', 'αὐτῷ', 'ὁ', 'Ἰησοῦς', 'Ἔγειρε', 'ἆρον', 'τὸν', 'κράβαττόν', 'σου', 'καὶ', 'περιπάτει']
>>> tags = ['V-PIA-3S', 'PPro-DM3S', 'Art-NMS', 'N-NMS', 'V-PMA-2S', 'V-AMA-2S', 'Art-AMS', 'N-AMS', 'PPro-G2S', 'Conj', 'V-PMA-2S']
>>> tokenizer = MorphT5Tokenizer.from_pretrained("mrapacz/interlinear-pl-mt5-large-emb-auto-diacritics-bh")
>>> inputs = tokenizer(
...     text=text,
...     morph_tags=tags,
...     return_tensors="pt"
... )
>>> model = MorphT5AutoForConditionalGeneration.from_pretrained("mrapacz/interlinear-pl-mt5-large-emb-auto-diacritics-bh")
>>> outputs = model.generate(
...     **inputs,
...     max_new_tokens=100,
...     early_stopping=True,
... )
>>> decoded = tokenizer.decode(outputs[0], skip_special_tokens=True, keep_block_separator=True)
>>> decoded = decoded.replace(tokenizer.target_block_separator_token, " | ")
>>> decoded
'Mówi  |  mu  |  -  |  Jezus  |  wstawaj  |  weź  |  -  |  matę  |  swoją  |  i  |  chodź'
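Because the output keeps one block per source word, it can be lined up against the Greek tokens and their tags. The following is a minimal sketch of that alignment, not part of the morpht5 package: the helper name `align_interlinear` is hypothetical, and it assumes the decoded string uses "|" as the block separator (as in the example above) and that no translated block contains that character itself.

```python
def align_interlinear(tokens, tags, decoded, separator="|"):
    """Pair each source token and its tag with the matching target block.

    Hypothetical helper for illustration; assumes `separator` never
    occurs inside a translated block.
    """
    if len(tokens) != len(tags):
        raise ValueError(f"{len(tokens)} tokens but {len(tags)} tags")
    # Split on the separator and drop the padding spaces around each block.
    blocks = [block.strip() for block in decoded.split(separator)]
    if len(blocks) != len(tokens):
        raise ValueError(f"{len(tokens)} tokens but {len(blocks)} target blocks")
    return list(zip(tokens, tags, blocks))


# Inputs and output copied from the usage example above.
text = ['Λέγει', 'αὐτῷ', 'ὁ', 'Ἰησοῦς', 'Ἔγειρε', 'ἆρον', 'τὸν', 'κράβαττόν', 'σου', 'καὶ', 'περιπάτει']
tags = ['V-PIA-3S', 'PPro-DM3S', 'Art-NMS', 'N-NMS', 'V-PMA-2S', 'V-AMA-2S', 'Art-AMS', 'N-AMS', 'PPro-G2S', 'Conj', 'V-PMA-2S']
decoded = 'Mówi  |  mu  |  -  |  Jezus  |  wstawaj  |  weź  |  -  |  matę  |  swoją  |  i  |  chodź'

rows = align_interlinear(text, tags, decoded)
for token, tag, gloss in rows:
    print(f"{token}\t{tag}\t{gloss}")
```

The length checks are there so that a dropped or extra block raises an error instead of silently shifting every later word out of alignment.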

Citation

If you use this model, please cite the following paper:

@inproceedings{rapacz-smywinski-pohl-2025-low,
    title = "Low-Resource Interlinear Translation: Morphology-Enhanced Neural Models for {A}ncient {G}reek",
    author = "Rapacz, Maciej  and
      Smywi{\'n}ski-Pohl, Aleksander",
    editor = "Hettiarachchi, Hansi  and
      Ranasinghe, Tharindu  and
      Rayson, Paul  and
      Mitkov, Ruslan  and
      Gaber, Mohamed  and
      Premasiri, Damith  and
      Tan, Fiona Anting  and
      Uyangodage, Lasitha",
    booktitle = "Proceedings of the First Workshop on Language Models for Low-Resource Languages",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.loreslm-1.11/",
    pages = "145--165",
    abstract = "Contemporary machine translation systems prioritize fluent, natural-sounding output with flexible word ordering. In contrast, interlinear translation maintains the source text's syntactic structure by aligning target language words directly beneath their source counterparts. Despite its importance in classical scholarship, automated approaches to interlinear translation remain understudied. We evaluated neural interlinear translation from Ancient Greek to English and Polish using four transformer-based models: two Ancient Greek-specialized (GreTa and PhilTa) and two general-purpose multilingual models (mT5-base and mT5-large). Our approach introduces novel morphological embedding layers and evaluates text preprocessing and tag set selection across 144 experimental configurations using a word-aligned parallel corpus of the Greek New Testament. Results show that morphological features through dedicated embedding layers significantly enhance translation quality, improving BLEU scores by 35{\%} (44.67 {\textrightarrow} 60.40) for English and 38{\%} (42.92 {\textrightarrow} 59.33) for Polish compared to baseline models. PhilTa achieves state-of-the-art performance for English, while mT5-large does so for Polish. Notably, PhilTa maintains stable performance using only 10{\%} of training data. Our findings challenge the assumption that modern neural architectures cannot benefit from explicit morphological annotations. While preprocessing strategies and tag set selection show minimal impact, the substantial gains from morphological embeddings demonstrate their value in low-resource scenarios."
}
