---
base_model: intfloat/multilingual-e5-large-instruct
base_model_relation: quantized
library_name: transformers.js
pipeline_tag: feature-extraction
tags:
- transformers.js
- sentence-transformers
- onnx
- feature-extraction
- sentence-similarity
- mteb
- xlm-roberta
- e5
- multilingual
language:
- multilingual
license: mit
---

# multilingual-e5-large-instruct (ONNX)

ONNX export of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) with fp16 and int8 quantized variants.

Compatible with both [`@huggingface/transformers`](https://huggingface.co/docs/transformers.js) (JavaScript) and [`sentence-transformers`](https://www.sbert.net/) (Python).

## Available Models

| File | Format | Size | Description |
|------|--------|------|-------------|
| `onnx/model.onnx` + `model.onnx_data` | fp32 | 2.1 GB | Full precision, external data format |
| `onnx/model_fp16.onnx` | fp16 | 1.0 GB | Half precision, negligible quality loss |
| `onnx/model_quantized.onnx` | int8 | 535 MB | Dynamic quantization, smallest size |

## Usage with Transformers.js

```javascript
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline(
  "feature-extraction",
  "lmo3/multilingual-e5-large-instruct",
  { dtype: "fp16" } // "q8" for int8, "fp32" for full precision
);

// Queries use the instruct format
const query = "Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?";
const queryEmbedding = await extractor(query, { pooling: "mean", normalize: true });

// Documents are embedded as-is (no prefix)
const docEmbedding = await extractor("It is sunny outside", { pooling: "mean", normalize: true });
```
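
Because the embeddings above are L2-normalized (`normalize: true`), cosine similarity reduces to a plain dot product. A minimal sketch, continuing from the variables in the previous block:

```javascript
// Tensor.tolist() converts the [1, 1024] pipeline output to a nested array.
const q = queryEmbedding.tolist()[0];
const d = docEmbedding.tolist()[0];

// Dot product of two normalized vectors equals their cosine similarity.
const score = q.reduce((sum, v, i) => sum + v * d[i], 0);
console.log(`similarity: ${score.toFixed(4)}`);
```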

## Usage with sentence-transformers (Python)

```python
from sentence_transformers import SentenceTransformer

# backend="onnx" loads the ONNX weights (requires optimum and onnxruntime)
model = SentenceTransformer("lmo3/multilingual-e5-large-instruct", backend="onnx")

# Queries use the instruct format
queries = ["Instruct: Retrieve semantically similar text.\nQuery: How is the weather today?"]
docs = ["It is sunny outside"]

query_embeddings = model.encode(queries)
doc_embeddings = model.encode(docs)
```
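
To score query–document pairs, `model.similarity(query_embeddings, doc_embeddings)` (available in sentence-transformers v3+) returns the full similarity matrix, using cosine similarity by default.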

## Key Differences from Base E5

This is the **instruct** variant of multilingual-e5-large. The key difference:

- **Queries** must be prefixed with `Instruct: <task description>\nQuery: `
- **Documents** are embedded as-is, with no prefix

The instruction tells the model what retrieval task you're performing, improving embedding quality. See the [original model card](https://huggingface.co/intfloat/multilingual-e5-large-instruct) for task-specific instructions and benchmark results.
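
A small helper keeps the prefix consistent; `buildQuery` below is a hypothetical name, not part of Transformers.js or this repository:

```javascript
// Hypothetical helper: formats a query in the instruct style this model expects.
function buildQuery(task, text) {
  return `Instruct: ${task}\nQuery: ${text}`;
}

const query = buildQuery("Retrieve semantically similar text.", "How is the weather today?");
// Documents need no prefix; embed them as-is.
```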

## Export Details

- Exported via [Optimum](https://huggingface.co/docs/optimum) with ONNX opset 18
- fp16 conversion via `onnxruntime.transformers.optimizer`
- int8 dynamic quantization via `onnxruntime.quantization.quantize_dynamic`
- `config.json` patched with `transformers.js_config` for automatic external data handling (sketched below)
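
For illustration only, the patch is along these lines; the exact fields should be checked against this repository's `config.json`:

```json
{
  "transformers.js_config": {
    "use_external_data_format": true
  }
}
```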

## Original Model

This is a conversion of [intfloat/multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct):

- **Architecture**: XLM-RoBERTa Large (24 layers, 1024 hidden size, 16 attention heads)
- **Embedding dimension**: 1024
- **Languages**: 100+
- **License**: MIT