---
language:
- multilingual
- ko
- en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- onnx
- quantized
- xlm-roberta
- dense-encoder
- dense
- fastembed
base_model: telepix/PIXIE-Rune-v1.0
pipeline_tag: feature-extraction
---

# PIXIE-Rune-v1.0 – ONNX Quantized Variants
|
|
ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
an encoder-based multilingual embedding model developed by TelePIX Co., Ltd., optimized for semantic
retrieval across 74 languages with a specialization in Korean/English aerospace applications.

> **Original model:** [`telepix/PIXIE-Rune-v1.0`](https://huggingface.co/telepix/PIXIE-Rune-v1.0) –
> safetensors weights + FP32 ONNX (`onnx/model.onnx` + `onnx/model.onnx_data`).
> This repo adds INT8 and INT4 quantized ONNX variants for CPU-efficient deployment.

---
## Model Description

| Property | Value |
|---|---|
| Base model | `telepix/PIXIE-Rune-v1.0` (XLM-RoBERTa-large) |
| Architecture | Transformer encoder |
| Output dimensionality | 1024 |
| Pooling | Mean pooling + L2 normalize |
| Max sequence length | 6,000 tokens |
| Languages | 74 (XLM-RoBERTa vocabulary: 250,002 tokens) |
| Domain | General multilingual + aerospace specialization |
| License | Apache 2.0 |

---
| |
## ONNX Variants

| File | Quantization | Size | Avg cos vs FP32 | Pearson r | MRR | Notes |
|---|---|---|---|---|---|---|
| `onnx/model_quantized.onnx` | INT8 dynamic | 542 MB | 0.969 | 0.998 | 1.00 | `quantize_dynamic`, all weights |
| `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
|
|
**Metrics** were measured on 8 semantically diverse sentences against the FP32 reference.
Pearson r = correlation of the pairwise cosine-similarity matrices (structure preservation).
MRR = Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.
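For reference, the two fidelity metrics can be sketched in a few lines of NumPy. This is an illustrative reconstruction with synthetic embeddings and an assumed noise model, not the actual evaluation script:

```python
import numpy as np

def l2norm(x):
    """Row-wise L2 normalization."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def pearson_r(a, b):
    """Pearson correlation between two flattened similarity matrices."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def mrr(ref_scores, quant_scores):
    """Treat each query's FP32 top document as ground truth and average
    the reciprocal rank of that document under the quantized model."""
    recip = []
    for ref_row, q_row in zip(ref_scores, quant_scores):
        target = int(np.argmax(ref_row))
        rank = int(np.where(np.argsort(-q_row) == target)[0][0]) + 1
        recip.append(1.0 / rank)
    return float(np.mean(recip))

rng = np.random.default_rng(0)
docs = l2norm(rng.normal(size=(16, 1024)))
queries = l2norm(rng.normal(size=(8, 1024)))
# Simulate quantization as small additive noise on the embeddings.
docs_q = l2norm(docs + rng.normal(scale=0.01, size=docs.shape))
queries_q = l2norm(queries + rng.normal(scale=0.01, size=queries.shape))

ref = queries @ docs.T        # FP32 retrieval scores
quant = queries_q @ docs_q.T  # quantized retrieval scores
structure = pearson_r(ref, quant)
rank_quality = mrr(ref, quant)
```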
|
|
### Quantization methodology

The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding
table the dominant weight (~977 MB in FP32). Each variant handles it differently:

- **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)`
  quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
- **INT4 + INT8 emb** (`model_int4.onnx`): Two-pass.
  Pass 1: `MatMulNBitsQuantizer(block_size=32, symmetric=True)` packs transformer MatMul weights
  into 4-bit nibbles. Pass 2: `quantize_dynamic(op_types=["Gather"], weight_type=QInt8)` brings
  the embedding table from 977 MB FP32 down to 244 MB INT8.
- **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
  `DequantizeLinear(axis=0)` node insertion packs the embedding table as per-row symmetric
  INT4 nibbles (scale = max(|row|)/7). Requires an opset upgrade from 14 to 21. Embedding: 977 MB → 122 MB.
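To make the INT4-full scheme concrete, here is a NumPy sketch of the per-row symmetric quantization and nibble packing described above (toy data; the actual export also rewrites the ONNX graph, which is omitted here):

```python
import numpy as np

def quantize_rows_int4(w):
    """Per-row symmetric INT4: scale = max(|row|)/7, values clipped to [-8, 7]."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # guard all-zero rows
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float32)

def pack_nibbles(q):
    """Pack two signed 4-bit values into one byte (low nibble first)."""
    u = (q.astype(np.uint8) & 0x0F).reshape(q.shape[0], -1, 2)
    return u[:, :, 0] | (u[:, :, 1] << 4)

def dequantize(q, scales):
    """What a DequantizeLinear(axis=0) node computes at runtime."""
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
table = rng.normal(scale=0.02, size=(1000, 1024)).astype(np.float32)  # toy embedding table

q, scales = quantize_rows_int4(table)
packed = pack_nibbles(q)  # 1024 weights -> 512 bytes per row
recon = dequantize(q, scales)
max_err = float(np.abs(table - recon).max())  # rounding error is bounded by scale/2 per row
```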
|
|
---
|
|
## Usage

### fastembed (Rust)

This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// INT8 – most compatible, 542 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;

// INT4 + INT8 embedding – 434 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;

// INT4 full – smallest, 337 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;

let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
```
|
|
### ONNX Runtime (Python)

```python
import onnxruntime as ort
import numpy as np
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)

session = ort.InferenceSession("onnx/model_quantized.onnx",
                               providers=["CPUExecutionProvider"])

texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
         "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]

enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)

# Mean pooling + L2 normalize
pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)

# cosine similarity
scores = embeddings @ embeddings.T
print(scores)
```
|
|
### sentence-transformers (original FP32 weights)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
           "국방 분야에 어떤 위성 서비스가 제공되나요?"]
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
             "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]

q_emb = model.encode(queries, prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
print(scores)
```
|
|
---
|
|
## Quality Benchmarks (original model)

Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
evaluated using [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).

### 6 Datasets of MTEB (Korean)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |

Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.
|
|
### 7 Datasets of BEIR (English)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |

Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.
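As a reminder of what the tables report, NDCG@k rewards placing relevant documents near the top of the ranking with a logarithmic discount. The sketch below uses one common formulation (exponential gain); it is for illustration only and is not the exact evaluator used above:

```python
import numpy as np

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k ranked relevance labels."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rels.size + 2))
    return float(np.sum((2.0 ** rels - 1.0) / discounts))

def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance labels of retrieved docs, in the order the model ranked them.
score = ndcg_at_k([1, 0, 1, 0, 0], k=5)
```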
|
|
---

## License

Apache 2.0 – same as the original [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
|
|
## Citation

```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title  = {PIXIE-Rune-v1.0},
  author = {TelePIX AI Research Team and Bongmin Kim},
  year   = {2025},
  url    = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```
|
|
## Contact

Original model authors: bmkim@telepix.net
ONNX quantization: [cstr](https://huggingface.co/cstr) – open an issue on this repo for questions.
|
|