OpenAI Privacy Filter MLX 8-bit

This repository contains an 8-bit OpenMed MLX artifact for openai/privacy-filter, packaged for local PII detection on Apple Silicon with OpenMed.

OpenAI Privacy Filter is a bidirectional token-classification model for detecting personally identifiable information in text. This OpenMed MLX build keeps the original BIOES token-label head, uses the o200k_base tokenizer assets, and runs with OpenMed's Python and Swift MLX runtimes.

After the model is downloaded once, inference runs locally. No document text is sent to a server.

Model Details

  • Source checkpoint: openai/privacy-filter
  • OpenMed MLX family: openai-privacy-filter
  • Task: token classification for privacy span detection
  • Weight format: weights.safetensors
  • Quantization: 8-bit affine quantization, group size 64
  • Runtime: OpenMed + MLX on Apple Silicon
  • Tokenizer: o200k_base / tiktoken-style BPE
  • Labels: account_number, private_address, private_date, private_email, private_person, private_phone, private_url, secret

This artifact uses expert-aware MLX quantization: embeddings, attention projections, MoE gates, sparse-MoE expert tensors, and the token-classification head are all stored in 8-bit packed form. The resulting weights.safetensors file is about 1.39 GiB, compared with about 2.61 GiB for the BF16 OpenMed MLX artifact.
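Group-wise affine quantization can be sketched in plain NumPy. This is an illustrative model of the scheme (8 bits, group size 64), not OpenMed's actual packing code: each group of 64 weights is mapped to unsigned 8-bit integers plus a per-group scale and offset, which is why the Q8 artifact is a bit larger than exactly half the BF16 size.

```python
import numpy as np

def quantize_affine(w, group_size=64, bits=8):
    # Split the flattened weights into groups of `group_size` values.
    groups = w.reshape(-1, group_size)
    w_min = groups.min(axis=1, keepdims=True)
    w_max = groups.max(axis=1, keepdims=True)
    # One affine (scale, offset) pair per group.
    scale = (w_max - w_min) / (2**bits - 1)
    q = np.round((groups - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize_affine(q, scale, w_min):
    # Invert the affine map; reconstruction error is at most scale / 2.
    return q.astype(np.float32) * scale + w_min

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, offset = quantize_affine(w)
w_hat = dequantize_affine(q, scale, offset).reshape(-1)
print(np.abs(w - w_hat).max())  # small per-element reconstruction error
```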

Quick Start: Python

pip install -U "openmed[mlx]"

from huggingface_hub import snapshot_download
from openmed.mlx.inference import create_mlx_pipeline

model_path = snapshot_download("OpenMed/privacy-filter-mlx-8bit")
pipe = create_mlx_pipeline(model_path)

text = "My name is Alice Smith and my email is alice.smith@example.com."
entities = pipe(text)

for entity in entities:
    print(entity)

Example output:

{
    "entity_group": "private_person",
    "word": "Alice Smith",
    "start": 11,
    "end": 22,
    "score": 0.9999,
}
{
    "entity_group": "private_email",
    "word": "alice.smith@example.com",
    "start": 39,
    "end": 62,
    "score": 0.9998,
}
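Because each entity carries character offsets into the input string, redaction is a straightforward span replacement. A minimal sketch built on the entity dicts shown above (the `redact` helper is illustrative, not part of the OpenMed API):

```python
def redact(text, entities, mask="[{label}]"):
    # Replace spans from right to left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + mask.format(label=ent["entity_group"]) + text[ent["end"]:]
    return text

text = "My name is Alice Smith and my email is alice.smith@example.com."
entities = [
    {"entity_group": "private_person", "start": 11, "end": 22},
    {"entity_group": "private_email", "start": 39, "end": 62},
]
print(redact(text, entities))
# My name is [private_person] and my email is [private_email].
```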

Quick Start: Swift and Apple Apps

Add OpenMedKit to your Xcode project:

  1. Open Xcode and choose File > Add Package Dependencies.
  2. Paste https://github.com/maziyarpanahi/openmed.
  3. Select the OpenMedKit package product.
  4. Download and cache the MLX model once, then run inference locally.

import OpenMedKit

let modelURL = try await OpenMedModelStore.downloadMLXModel(
    repoID: "OpenMed/privacy-filter-mlx-8bit"
)

let openmed = try OpenMed(backend: .mlx(modelDirectoryURL: modelURL))
let entities = try openmed.extractPII(
    "My name is Alice Smith and my email is alice.smith@example.com."
)

for entity in entities {
    print(entity.text, entity.label, entity.score)
}

For iOS, test on physical Apple Silicon hardware; the iOS Simulator is not recommended for MLX inference.

Validation

The 8-bit artifact was validated against the unquantized OpenMed MLX artifact with fixed text samples. BF16 and Q8 returned identical grouped spans for person, date, phone, email, address, and account-number examples.
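A parity check of this kind can be sketched as a small helper that compares grouped spans while ignoring scores, which may differ slightly between precisions. This is illustrative only; `spans_match` is not an OpenMed function:

```python
def spans_match(a, b, keys=("entity_group", "start", "end")):
    # Reduce each entity dict to the fields that define a grouped span,
    # ignoring floating-point scores that can drift under quantization.
    strip = lambda ents: [tuple(e[k] for k in keys) for e in ents]
    return strip(a) == strip(b)

bf16 = [{"entity_group": "private_person", "start": 11, "end": 22, "score": 0.9999}]
q8 = [{"entity_group": "private_person", "start": 11, "end": 22, "score": 0.9871}]
print(spans_match(bf16, q8))
# True
```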

OpenMed also includes unit tests for:

  • q8 artifact loading
  • quantization metadata decoding
  • expert tensor packing and .scales coverage
  • finite logits from the q8 runtime
  • bf16/q8 shape and argmax-label coherence
  • BIOES/Viterbi span decoding

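Greedy BIOES decoding, the last item above, can be sketched as follows. This is a simplified illustration of turning per-token BIOES tags into spans; the actual runtime also applies Viterbi constraints over tag transitions:

```python
def decode_bioes(tags):
    # Collect (label, start_token, end_token) spans from BIOES tags.
    # S = single-token span; B...E = multi-token span; O resets state.
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "S":
            spans.append((name, i, i + 1))
        elif prefix == "B":
            start, label = i, name
        elif prefix == "E" and label == name:
            spans.append((name, start, i + 1))
            start = label = None
        elif prefix == "I" and label == name:
            continue  # inside a span already opened by B
        else:
            start = label = None  # "O" or a malformed sequence
    return spans

tags = ["O", "O", "O", "B-private_person", "E-private_person", "O"]
print(decode_bioes(tags))
# [('private_person', 3, 5)]
```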
Intended Use

Use this model for local privacy filtering, PII detection, redaction workflows, and evaluation on Apple devices. For high-risk domains such as healthcare, legal, finance, education, and government, evaluate against your own data and policy requirements before production use.
