GLiNER2 Privacy-Filter PII Multi (ONNX Fragmented & IOBinding)
This repository contains the ONNX-exported weights of fastino/gliner2-privacy-filter-PII-multi,
the multilingual PII detection model built on GLiNER2 by Fastino AI.
The model is exported in a fragmented format (encoder, token_gather, span_rep, schema_gather, count_pred_argmax, count_lstm_fixed, scorer, classifier) for direct compatibility with gliner2-rs, the official Zero-Python Native Rust inference engine for GLiNER2.
It supports detection of 42 PII entity types across 7 languages (EN, FR, ES, DE, IT, PT, NL).
🆕 V2 Zero-Copy IOBinding Models
Like the gliner2-multi-v1-onnx base release, this repo ships the V2 fused IOBinding variant. Gather, ArgMax, and MatMul operations are fused directly into the ONNX graphs so that intermediate tensors stay in GPU/NPU VRAM between fragments instead of crossing the PCIe bus, cutting inference latency by ~30% on discrete GPUs.
📂 Available Variants
| Variant | Use case | Notes |
|---|---|---|
| fp16_v2 (recommended) | NVIDIA CUDA · AMD ROCm · Apple CoreML · Qualcomm QNN | Zero-Copy VRAM (IOBinding), full FP16 IO, fused ops |
| fp32_v2 | CPU (AVX2 / XNNPACK / ARM NEON) | High precision V2 fusions for CPU |
| fp16 (standard) | Legacy compatible, all EPs | FP32 IO (CoreML-compatible), slower on CUDA due to PCIe round-trips |
| fp32 (standard) | Universal fallback | Legacy Float32 |
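
The table's recommendation can be expressed as a small lookup. This is only a sketch: `pick_variant` and `V2_FP16_PROVIDERS` are hypothetical names, not part of this repo or of gliner2-rs, and the mapping assumes the onnxruntime execution-provider names for the backends listed in the fp16_v2 row. The legacy fp16 and fp32 variants are deliberately never selected here.

```python
# Hypothetical helper mirroring the variant table; not part of any library.
# Provider strings are the onnxruntime names for the backends in the fp16_v2 row.
V2_FP16_PROVIDERS = {
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "CoreMLExecutionProvider",
    "QNNExecutionProvider",
}


def pick_variant(available_providers: list[str]) -> str:
    """Return the variant the table suggests for the given providers."""
    if any(p in V2_FP16_PROVIDERS for p in available_providers):
        return "fp16_v2"
    # CPU-only setups fall back to the FP32 V2 fusions.
    return "fp32_v2"


print(pick_variant(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(pick_variant(["CPUExecutionProvider"]))
```

In practice the list would typically come from `onnxruntime.get_available_providers()`.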
Each variant ships 8 fragments:
encoder_{precision}.onnx            ~530–1060 MB
token_gather_{precision}.onnx       <1 MB
span_rep_{precision}.onnx           ~32–63 MB
schema_gather_{precision}.onnx      <1 MB
count_pred_argmax_{precision}.onnx  ~2–5 MB
count_lstm_fixed_{precision}.onnx   ~20–41 MB
scorer_{precision}.onnx             <1 MB
classifier_{precision}.onnx         ~2–5 MB
Total: ~590 MB (FP16) or ~1.17 GB (FP32) per variant.
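
As a quick sanity check of these totals, the per-fragment figures can be summed. The sketch assumes the low end of each range is the FP16 size and the high end the FP32 size, and it counts the "<1 MB" fragments as 0 MB, so both sums are slight underestimates.

```python
# Approximate sizes in MB as (FP16, FP32), copied from the fragment listing.
# Assumption: low end of each range = FP16, high end = FP32.
# Fragments listed as "<1 MB" are counted as 0, so the sums are lower bounds.
FRAGMENT_SIZES_MB = {
    "encoder": (530, 1060),
    "token_gather": (0, 0),
    "span_rep": (32, 63),
    "schema_gather": (0, 0),
    "count_pred_argmax": (2, 5),
    "count_lstm_fixed": (20, 41),
    "scorer": (0, 0),
    "classifier": (2, 5),
}

fp16_total = sum(fp16 for fp16, _ in FRAGMENT_SIZES_MB.values())
fp32_total = sum(fp32 for _, fp32 in FRAGMENT_SIZES_MB.values())

print(f"FP16: ~{fp16_total} MB")          # FP16: ~586 MB
print(f"FP32: ~{fp32_total / 1000:.2f} GB")  # FP32: ~1.17 GB
```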
🎯 Supported PII Labels (42 types)
Person / Names (6 labels)
person, full_name, first_name, middle_name, last_name, date_of_birth
Contact / Address (8 labels)
email, phone_number, address, street_address, city, state_or_region, postal_code, country
Government / Tax IDs (7 labels)
government_id, national_id_number, passport_number, drivers_license_number, license_number, tax_id, tax_number
Banking / Payment (8 labels)
bank_account, account_number, routing_number, iban, payment_card, card_number, card_expiry, card_cvv
Digital Identity (4 labels)
username, ip_address, account_id, sensitive_account_id
Secrets / Credentials (5 labels)
password, secret, api_key, access_token, recovery_code
Sensitive Dates (4 labels)
sensitive_date, document_date, expiration_date, transaction_date
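
For callers who want to pass only part of the label set, the groups above can be kept in a plain mapping. This is a sketch: the group keys and the `labels_for` helper are hypothetical names chosen for illustration, while the label strings are copied from the lists above.

```python
# Label strings copied from the lists above; group keys are illustrative only.
PII_LABELS = {
    "person_names": ["person", "full_name", "first_name", "middle_name",
                     "last_name", "date_of_birth"],
    "contact_address": ["email", "phone_number", "address", "street_address",
                        "city", "state_or_region", "postal_code", "country"],
    "government_tax_ids": ["government_id", "national_id_number",
                           "passport_number", "drivers_license_number",
                           "license_number", "tax_id", "tax_number"],
    "banking_payment": ["bank_account", "account_number", "routing_number",
                        "iban", "payment_card", "card_number", "card_expiry",
                        "card_cvv"],
    "digital_identity": ["username", "ip_address", "account_id",
                         "sensitive_account_id"],
    "secrets_credentials": ["password", "secret", "api_key", "access_token",
                            "recovery_code"],
    "sensitive_dates": ["sensitive_date", "document_date", "expiration_date",
                        "transaction_date"],
}

ALL_LABELS = [label for group in PII_LABELS.values() for label in group]


def labels_for(*groups: str) -> list[str]:
    """Flatten the selected groups into one label list, preserving order."""
    return [label for g in groups for label in PII_LABELS[g]]


print(len(ALL_LABELS))  # 42
print(labels_for("digital_identity", "secrets_credentials"))
```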
🚀 Usage in Rust (gliner2-rs)
use gliner2_inference::{Gliner2Engine, ModelType, SchemaTask};

// Auto-downloads the V2 FP16 fragments from this HuggingFace repo
// and switches to the high-performance IOBinding engine.
let engine = Gliner2Engine::from_pretrained(
    "SemplificaAI/gliner2-privacy-filter-PII-multi",
    Some("fp16_v2"),
    ModelType::HuggingFace,
)?;

let text = "Please contact Maria Jensen at maria.jensen@example.dk or +45 20 12 34 56.";
let tasks = vec![
    SchemaTask::Entities(vec![
        "person".into(), "email".into(), "phone_number".into(),
    ])
];
let (entities, _, _) = engine.extract(text, &tasks)?;
Requires gliner2-rs >= 0.4.1 for automatic V2 detection / IOBinding routing.
🐍 Usage in Python (onnxruntime)
Run the 8-fragment pipeline manually (no Python gliner2 dependency needed):
import onnxruntime as ort

# Per fragment (example for the encoder, CUDA backend)
encoder = ort.InferenceSession(
    "encoder_fp16_iobinding.onnx",
    providers=["CUDAExecutionProvider"],
)
# ...load the other 7 fragments analogously...
# Chain them via IOBinding (see validate_onnx_v2.py for a full reference impl)
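
The chaining step that the last comment delegates to validate_onnx_v2.py might look roughly like the following. This is a minimal sketch under stated assumptions, not the repository's reference implementation: `chain_two_fragments`, `cpu_feeds`, and `links` are hypothetical names, and the fragments' real tensor names are not listed here, so the caller has to supply them. It relies only on onnxruntime's IOBinding methods (`io_binding`, `bind_cpu_input`, `bind_output`, `run_with_iobinding`, `get_outputs`, `bind_ortvalue_input`, `copy_outputs_to_cpu`) and handles only inputs of the second fragment that come from the first.

```python
from typing import Any, Mapping


def chain_two_fragments(
    first: Any,                    # onnxruntime.InferenceSession (upstream fragment)
    second: Any,                   # onnxruntime.InferenceSession (downstream fragment)
    cpu_feeds: Mapping[str, Any],  # input name -> NumPy array for `first`
    links: Mapping[str, str],      # output name of `first` -> input name of `second`
    device: str = "cuda",
    device_id: int = 0,
) -> list:
    """Run two fragments back to back, keeping the intermediate on `device`."""
    # Stage 1: host inputs in, outputs allocated on the device by onnxruntime.
    b1 = first.io_binding()
    for name, array in cpu_feeds.items():
        b1.bind_cpu_input(name, array)
    out_names = [o.name for o in first.get_outputs()]
    for name in out_names:
        b1.bind_output(name, device, device_id)
    first.run_with_iobinding(b1)
    on_device = dict(zip(out_names, b1.get_outputs()))  # OrtValues, not copied to host

    # Stage 2: feed the device-resident OrtValues straight into the next fragment.
    b2 = second.io_binding()
    for src, dst in links.items():
        b2.bind_ortvalue_input(dst, on_device[src])
    for o in second.get_outputs():
        b2.bind_output(o.name, device, device_id)
    second.run_with_iobinding(b2)

    # Only the final result is copied back to host memory.
    return b2.copy_outputs_to_cpu()
```

Passing `device="cpu"` exercises the same code path on a machine without a GPU.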
For a simpler entry point you can keep using the original PyTorch model via the gliner2 Python package on fastino/gliner2-privacy-filter-PII-multi; this ONNX repo is optimised for production deployment without Python.
🛠 Pipeline Wiring (IOBinding chain)
encoder_fp16_iobinding.onnx
│
├─ token_gather_fp16_iobinding.onnx
│   └─ span_rep_fp16_iobinding.onnx
│
└─ schema_gather_fp16_iobinding.onnx
    ├─ count_pred_argmax_fp16_iobinding.onnx → pred_count (int64)
    └─ count_lstm_fixed_fp16_iobinding.onnx
        └─ scorer_fp16_iobinding.onnx → entity_scores

classifier_fp16_iobinding.onnx (only for classification tasks)
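
One way to derive a valid execution order from this wiring is a topological sort. The sketch below transcribes the tree into a dependency map (file suffixes dropped) and assumes that scorer hangs off count_lstm_fixed, which is how the tree reads. The classifier is left out because it is drawn separately and only runs for classification tasks, and any additional inputs a fragment may take from other branches are not modelled.

```python
from graphlib import TopologicalSorter

# Fragment -> fragments whose outputs it consumes, as read from the tree.
# Assumption: scorer is a child of count_lstm_fixed.
DEPENDS_ON = {
    "encoder": set(),
    "token_gather": {"encoder"},
    "span_rep": {"token_gather"},
    "schema_gather": {"encoder"},
    "count_pred_argmax": {"schema_gather"},
    "count_lstm_fixed": {"schema_gather"},
    "scorer": {"count_lstm_fixed"},
}

# static_order() yields every fragment after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order)
```

Any order produced this way starts with the encoder; the two branches below it are independent of each other in this map and could be interleaved.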
⚙️ Technical Notes
- opset 17 (ONNX 1.14+) for maximum execution-provider compatibility.
- count_lstm_fixed exports the GRU unrolled to 20 fixed steps at tracing time → compatible with execution providers that don't support dynamic loops (Apple CoreML, Qualcomm QNN).
- scorer uses fused Reshape + MatMul + Transpose instead of Einsum for compatibility with QNN/CoreML FP16.
- INT8 not supported: the DeBERTa-v3 disentangled-attention activations contain extreme outliers that saturate 8-bit ranges (the same limitation called out by the GLiNER2 maintainers). FP16 remains the optimal compression target.
- Encoder size: ~1.06 GB FP32 → ~530 MB FP16. Larger than the multi-v1 base because of the wider classification head (42 PII labels) and per-language fine-tuning.
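
The INT8 note can be illustrated with a toy example of how a single outlier consumes an 8-bit range. This is a generic sketch of symmetric per-tensor quantization, not the scheme any particular exporter uses, and the activation values are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8: one scale derived from the largest magnitude."""
    scale = max(abs(v) for v in values) / 127
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


# Made-up activations: four typical values, then the same values plus one outlier.
typical = [0.02, -0.05, 0.11, -0.08]
with_outlier = typical + [60.0]

q_typical, _ = quantize_int8(typical)
q_outlier, _ = quantize_int8(with_outlier)

print(q_typical)  # [23, -58, 127, -92]
print(q_outlier)  # [0, 0, 0, 0, 127]
```

With the outlier present, the scale grows so large that every typical value rounds to zero, which is the kind of information loss that FP16 avoids.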
🪪 License
Apache 2.0 — same as the upstream model.
🙏 Acknowledgements
- Upstream model: fastino/gliner2-privacy-filter-PII-multi by Fastino AI.
- GLiNER2 paper: Zaratiana et al., GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction, EMNLP 2025.
- ONNX fragmentation + IOBinding strategy: Semplifica s.r.l., as used in gliner2-multi-v1-onnx.
📚 Citation
@misc{fastino2026gliner2pii,
  title  = {GLiNER2-PII: Multilingual PII Extraction via Synthetic Fine-Tuning},
  author = {{Fastino AI Team}},
  year   = {2026},
  url    = {https://huggingface.co/fastino/gliner2-privacy-filter-PII-multi}
}

@inproceedings{zaratiana-etal-2025-gliner2,
  title     = {GLiNER2: Schema-Driven Multi-Task Learning for Structured Information Extraction},
  author    = {Zaratiana, Urchade and Pasternak, Gil and Boyd, Oliver and Hurn-Maloney, George and Lewis, Ash},
  booktitle = {Proceedings of EMNLP 2025: System Demonstrations},
  year      = {2025}
}