WECHSEL-XLM-R-Dense - EViRAL v6
Cross-lingual dense retrieval model: Ede (Rhade) query → Vietnamese passage.
How to load for continued fine-tuning
import json
import numpy as np
import torch
from huggingface_hub import hf_hub_download

REPO_ID = 'NIRVLab/ede-xlm-roberta-base'

# Download the tokenizer files, the WECHSEL embeddings, and the align.pt checkpoint
with open(hf_hub_download(REPO_ID, 'vocab.json'), encoding='utf-8') as f:
    vocab = json.load(f)
with open(hf_hub_download(REPO_ID, 'tokenizer_config.json'), encoding='utf-8') as f:
    tok_cfg = json.load(f)
wechsel_np = np.load(hf_hub_download(REPO_ID, 'wechsel_embeddings.npy'))
state_dict = torch.load(hf_hub_download(REPO_ID, 'align.pt'), map_location='cpu')

# Rebuild the encoder with make_encoder from the training notebook
# (it relies on vocab, VOCAB_SIZE, etc. as defined there), then load the weights
encoder = make_encoder(wechsel_np)
encoder.load_state_dict(state_dict)
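Before rebuilding the encoder, it can help to check that the downloaded files agree with each other. The sketch below is not part of the original notebook. It assumes that vocab.json maps tokens to ids and that the embedding matrix has one row per vocabulary entry. The helper name check_artifacts is hypothetical, and 768 is the hidden size of xlm-roberta-base.

```python
import numpy as np


def check_artifacts(vocab, embeddings, hidden_size=768):
    """Return a list of problems found; an empty list means the files look consistent.

    Assumptions (not confirmed by the model card): `vocab` maps token -> id and
    `embeddings` holds one row per vocabulary entry.
    """
    problems = []
    if embeddings.ndim != 2:
        problems.append(f"expected a 2-D embedding matrix, got shape {embeddings.shape}")
        return problems
    n_rows, dim = embeddings.shape
    if dim != hidden_size:
        problems.append(f"embedding width {dim} does not match hidden size {hidden_size}")
    if n_rows != len(vocab):
        problems.append(f"{n_rows} embedding rows but {len(vocab)} vocabulary entries")
    if not np.isfinite(embeddings).all():
        problems.append("embedding matrix contains NaN or infinite values")
    return problems


# Toy stand-ins for the real vocab and wechsel_np, so the sketch runs on its own.
toy_vocab = {"<s>": 0, "</s>": 1, "ede": 2}
toy_embeddings = np.zeros((3, 768), dtype=np.float32)
print(check_artifacts(toy_vocab, toy_embeddings))
```

With the real files, the call would be check_artifacts(vocab, wechsel_np). Any reported mismatch is worth resolving before calling make_encoder.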
Training details
- Backbone: xlm-roberta-base
- WECHSEL k=10, τ=0.1
- Bilingual dict: NIRVLab/rhade-vietnamese-mt
- Pipeline: MLM (3 epochs) → cross-lingual alignment (2 epochs)
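The card lists the WECHSEL settings without showing how they are used. As a rough illustration, and assuming k=10 is the number of nearest source tokens and 0.1 is the softmax temperature, a WECHSEL-style initialization builds each target-token embedding as a similarity-weighted average of source-token embeddings. The function name wechsel_init and the precomputed similarity matrix sims are hypothetical. This is a minimal sketch, not the notebook's code, and it omits how the similarities are derived from aligned word vectors and the bilingual dictionary.

```python
import numpy as np


def wechsel_init(src_emb, sims, k=10, tau=0.1):
    """Sketch of a WECHSEL-style embedding initialization.

    src_emb: (n_src, d) source-model token embeddings.
    sims:    (n_tgt, n_src) similarities between target and source tokens
             (assumed precomputed in a shared word-vector space).
    Returns an (n_tgt, d) matrix of initial target-token embeddings.
    """
    k = min(k, sims.shape[1])
    # Indices of the k most similar source tokens for every target token.
    top_idx = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    top_sims = np.take_along_axis(sims, top_idx, axis=1)
    # Temperature-scaled softmax over the k neighbors (shifted for stability).
    logits = top_sims / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted average of the neighbors' source embeddings.
    return np.einsum("tk,tkd->td", weights, src_emb[top_idx])


# Toy example: 3 source tokens, 1 target token equally similar to the first two.
toy_src = np.eye(3)
toy_sims = np.array([[1.0, 1.0, -1.0]])
toy_out = wechsel_init(toy_src, toy_sims, k=2, tau=0.1)
```

A smaller temperature concentrates the weight on the single closest source token, while a larger k smooths the initialization across more neighbors.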