---
library_name: setfit
license: mit
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
tags:
- setfit
- onnx
- attention-weights
- context-compression
- intent-classification
- multilingual
pipeline_tag: text-classification
---
# SetFit Multilingual OVR Router (ONNX with Attentions)

This is a **SetFit** model exported to **ONNX** format and trained to classify LLM requests into three intent categories: **Needle** (fact retrieval), **Reasoning** (logic/analysis) and **Summary** (general recap).

The model is based on [paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2), and its ONNX graph has been modified to also output the **raw attention weights of all 12 layers**.
## Key Features

- **3-Class Classification:** Assigns each request to one of three intents (Summary, Needle, Reasoning).
- **Multilingual:** Inherits the base model's coverage of 50+ languages, including Russian and English.
- **Attention Output:** Every inference returns, for each of the 12 layers, an attention tensor of shape `(batch, heads, seq_len, seq_len)`.
- **Dual Precision:** Both **FP32** (`model.onnx`) and **INT8-quantized** (`model_quantized.onnx`) versions are available.
- **Optimized for CPU:** Fast ONNX inference via `onnxruntime`.
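
Because every call returns the per-layer attention tensors, they can be reduced to a per-token score, for example to decide which tokens of a long context are worth keeping. The card does not say how WAMP-proxy consumes these tensors, so the following is only a hypothetical sketch: `token_importance` is an illustrative helper, and it assumes `attentions` is a list of 12 NumPy arrays shaped `(batch, heads, seq_len, seq_len)`, such as the extra outputs returned by `session.run` (their order is an assumption; check `session.get_outputs()`). Each token is scored by the average attention it receives from non-padding query positions, averaged over layers and heads.

```python
import numpy as np


def token_importance(attentions, attention_mask):
    """Hypothetical helper: average attention received by each token.

    attentions: sequence of arrays, each (batch, heads, seq_len, seq_len),
        where the last two axes are (query position, key position).
    attention_mask: (batch, seq_len) array, 1 for real tokens and 0 for padding.
    Returns a (batch, seq_len) array; padding positions get a score of 0.
    """
    stacked = np.stack(attentions, axis=0)           # (layers, batch, heads, q, k)
    mean_attn = stacked.mean(axis=(0, 2))            # average over layers and heads -> (batch, q, k)
    mask = np.asarray(attention_mask, dtype=mean_attn.dtype)
    weighted = mean_attn * mask[:, :, None]          # drop rows that come from padding queries
    n_real = np.clip(mask.sum(axis=1, keepdims=True), 1.0, None)  # real-token count, avoids division by 0
    received = weighted.sum(axis=1) / n_real         # mean attention each key receives -> (batch, k)
    return received * mask                           # zero out padding keys


# Synthetic example: 12 layers, batch of 1, 12 heads, 4 tokens (the last one is padding).
rng = np.random.default_rng(0)
raw = rng.random((12, 1, 12, 4, 4))
attn = [layer / layer.sum(axis=-1, keepdims=True) for layer in raw]  # rows sum to 1, like softmax output
mask = np.array([[1, 1, 1, 0]])
scores = token_importance(attn, mask)
print(scores.shape)                # (1, 4)
print(np.argsort(-scores[0])[:2])  # indices of the two most-attended tokens
```

Averaging over layers and heads is the simplest possible reduction; weighting later layers more heavily, or selecting specific heads, are equally plausible choices.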
## Classification Map

- **Label 0:** Summary (Chatter, Recaps, TL;DR)
- **Label 1:** Needle (Pinpoint facts, parameters, keys, IPs)
- **Label 2:** Reasoning (Comparison, analysis, code debugging, logical chains)
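
In code, the map above is a lookup from the head's output index to an intent name. A minimal sketch; the lowercase names, `ID2LABEL` and the `route` helper are illustrative and not part of the released files:

```python
import numpy as np

# Output index -> intent, following the classification map above.
ID2LABEL = {0: "summary", 1: "needle", 2: "reasoning"}


def route(probs):
    """Return (intent, probability) for each row of a (batch, 3) probability array."""
    probs = np.atleast_2d(np.asarray(probs, dtype=float))
    best = probs.argmax(axis=1)
    return [(ID2LABEL[int(i)], float(row[i])) for i, row in zip(best, probs)]


print(route([0.1, 0.7, 0.2]))  # [('needle', 0.7)]
```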
## Project Origin

This model is a core component of the **[WAMP-proxy](https://github.com/naranor/wamp-proxy)** project, an intelligent middleware for research into LLM context optimization.
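## Quick Inference (Python)

```python
import json

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# 1. Load the ONNX graph, tokenizer and classification-head weights.
#    Assumes the files from naranor/SetFit-Multilingual-ONNX-Router-V1 are in the current directory.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])  # or "model_quantized.onnx"
tokenizer = AutoTokenizer.from_pretrained(".")
with open("router_weights_setfit.json", "r") as f:
    weights = json.load(f)

# 2. Prepare input
text = "What is the database port?"
inputs = tokenizer(text, return_tensors="np")
onnx_inputs = {
    "input_ids": inputs["input_ids"].astype(np.int64),
    "attention_mask": inputs["attention_mask"].astype(np.int64),
}

# 3. Run. outputs[0] is taken to be the token embeddings (batch, seq_len, hidden);
#    the attention tensors come back as additional outputs (see session.get_outputs()).
outputs = session.run(None, onnx_inputs)
mask = onnx_inputs["attention_mask"][..., None].astype(np.float32)
embeddings = (outputs[0] * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)  # masked mean pooling

# 4. Predict probabilities with the logistic-regression head (row-wise, numerically stable softmax)
scores = embeddings @ np.asarray(weights["coef"]).T + np.asarray(weights["intercept"])
scores -= scores.max(axis=1, keepdims=True)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
labels = ["Summary", "Needle", "Reasoning"]  # label indices 0, 1, 2
print(f"Probabilities: {probs}")
print(f"Predicted: {labels[int(probs[0].argmax())]}")
```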
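
The title calls this an OVR (one-vs-rest) router, while the snippet above normalizes the logistic-regression scores with a softmax. If `router_weights_setfit.json` was exported from a scikit-learn `LogisticRegression` trained one-vs-rest (an assumption; the card does not say), scikit-learn's `predict_proba` instead applies a sigmoid to each class score and then renormalizes each row. Both variants are sketched below in plain NumPy with a synthetic head; the function names are illustrative. The probability values can differ, but the predicted label is always the same, because both transforms are monotonic in the raw scores.

```python
import numpy as np


def lr_scores(embeddings, coef, intercept):
    """Linear scores of a logistic-regression head: (batch, dim) @ (dim, classes) + (classes,)."""
    return np.asarray(embeddings) @ np.asarray(coef).T + np.asarray(intercept)


def softmax_proba(scores):
    """Multinomial probabilities, computed row-wise and shifted for numerical stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def ovr_proba(scores):
    """One-vs-rest probabilities in scikit-learn style: per-class sigmoid, then row-normalize."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return p / p.sum(axis=1, keepdims=True)


# Synthetic head: 4-dim embeddings and 3 classes (the real head sees 384-dim MiniLM embeddings).
rng = np.random.default_rng(42)
coef = rng.normal(size=(3, 4))
intercept = rng.normal(size=3)
emb = rng.normal(size=(2, 4))

s = lr_scores(emb, coef, intercept)
print(softmax_proba(s).round(3))
print(ovr_proba(s).round(3))
print(softmax_proba(s).argmax(axis=1), ovr_proba(s).argmax(axis=1))  # identical label indices
```

For routing decisions only the argmax matters, so either normalization works; the choice affects only calibrated thresholds, such as falling back to a default route when the top probability is low.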
## License

MIT License.