ONNX Imputer imputed_value_floats Preprocessing Authority PoC

Summary

An ONNX model file contains a runtime-consumed ML operator attribute (ai.onnx.ml.Imputer.imputed_value_floats) that controls the replacement value applied to sentinel numeric inputs before downstream scoring. A crafted ONNX model can mutate imputed_value_floats while keeping the downstream LinearClassifier coefficients and intercepts byte-identical, causing the same numeric input to produce a different post-imputer tensor, different scores, and a flipped prediction class.

This is not output label substitution. This is not classlabels_strings. This is not OneHotEncoder.cats_strings. This is pre-score sentinel numeric value replacement authority: the substitution happens before any scoring, so the downstream classifier scores a different value against unchanged weights.

Affected Product

Format: ONNX model file (.onnx)
Root operator: ai.onnx.ml.Imputer
Root attribute: imputed_value_floats
Supporting attribute: replaced_value_float (identical in clean and mutant — only imputed_value_floats is mutated)
Downstream consumer: ai.onnx.ml.LinearClassifier (coefficients and intercepts unchanged)
Runtime: onnxruntime Python package

Vulnerability Details

ai.onnx.ml.Imputer.imputed_value_floats is a runtime-consumed attribute that determines the replacement float value for each feature dimension when the input matches replaced_value_float. When a victim loads and runs an ONNX model:

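```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# [[0.0]] equals replaced_value_float=0.0, so the Imputer substitutes it
label, scores = sess.run(None, {"input": np.array([[0.0]], dtype=np.float32)})
```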

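Imputer reads imputed_value_floats at runtime and replaces every feature value that equals replaced_value_float. imputed_value_floats holds either one value per feature or a single value that is broadcast to all features (in which case imputed_value_floats[0] is used for every i):

```
post_imputer[i] = imputed_value_floats[i]  if  input[i] == replaced_value_float  else  input[i]
```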
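
The replacement rule can be written as a small pure-Python reference. This is a sketch for reasoning about the attribute, not onnxruntime's implementation. The function name `impute_row` is hypothetical. The sketch also handles a NaN `replaced_value_float` with `math.isnan`, which is its own assumption: an equality test never matches NaN.

```python
import math
from typing import List, Sequence


def impute_row(row: Sequence[float],
               imputed_value_floats: Sequence[float],
               replaced_value_float: float = 0.0) -> List[float]:
    """Hypothetical reference for ai.onnx.ml.Imputer on one row of F floats.

    imputed_value_floats holds 1 value (broadcast to every feature)
    or exactly F values (one per feature).
    """
    n = len(row)
    if len(imputed_value_floats) == 1:
        fills = list(imputed_value_floats) * n
    elif len(imputed_value_floats) == n:
        fills = list(imputed_value_floats)
    else:
        raise ValueError("imputed_value_floats must have length 1 or F")

    def is_sentinel(x: float) -> bool:
        # NaN never compares equal to itself, so a NaN sentinel needs isnan.
        if math.isnan(replaced_value_float):
            return math.isnan(x)
        return x == replaced_value_float

    return [fill if is_sentinel(x) else x for x, fill in zip(row, fills)]


# Same input, same replaced_value_float; only imputed_value_floats differs.
print(impute_row([0.0], [-5.0]))  # clean:  [-5.0]
print(impute_row([0.0], [5.0]))   # mutant: [5.0]
print(impute_row([3.0], [5.0]))   # non-sentinel input passes through: [3.0]
```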

An attacker who mutates only imputed_value_floats from [-5.0] to [+5.0], leaving replaced_value_float=0.0 unchanged, makes the same input [[0.0]] produce the post-imputer tensor [[-5.0]] (clean) or [[+5.0]] (mutant). With one feature x and two classes, the unchanged LinearClassifier computes scores [-1.0 * x + 0.0, 1.0 * x + 0.0]. For x = -5.0 that gives [5.0, -5.0] (label 0), and for x = +5.0 it gives [-5.0, 5.0] (label 1). The prediction flips while coefficients and intercepts remain byte-identical.
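
The score arithmetic can be checked the same way. This sketch makes three assumptions: `post_transform` is `NONE`, the class labels are the indices 0 and 1, and `coefficients` is laid out class-major (one row of F weights per class). `linear_classifier` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def linear_classifier(x: Sequence[float],
                      coefficients: Sequence[float],
                      intercepts: Sequence[float]) -> Tuple[int, List[float]]:
    """Score one row: scores[c] = dot(W[c], x) + intercepts[c]; label = argmax."""
    n_features, n_classes = len(x), len(intercepts)
    if len(coefficients) != n_classes * n_features:
        raise ValueError("coefficients must hold n_classes * n_features values")
    scores = []
    for c in range(n_classes):
        # Assumed class-major layout: row c holds the weights for class c.
        w = coefficients[c * n_features:(c + 1) * n_features]
        scores.append(sum(wi * xi for wi, xi in zip(w, x)) + intercepts[c])
    # max() returns the first maximum, so ties go to the lowest class index.
    label = max(range(n_classes), key=lambda c: scores[c])
    return label, scores


COEF = [-1.0, 1.0]   # identical in clean.onnx and mutant.onnx
INTER = [0.0, 0.0]   # identical in clean.onnx and mutant.onnx
print(linear_classifier([-5.0], COEF, INTER))  # clean:  (0, [5.0, -5.0])
print(linear_classifier([5.0], COEF, INTER))   # mutant: (1, [-5.0, 5.0])
```

Only the input value changes between the two calls. This is exactly what the Imputer mutation achieves inside the graph.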

Impact

Prediction manipulation: Model prediction flips (label 0 → label 1) for the same sentinel numeric input while all classifier weights are unchanged.
Weights unchanged: LinearClassifier.coefficients and intercepts are byte-identical in clean and mutant models. The victim cannot detect manipulation by inspecting weight values.
No error generated: onnxruntime.InferenceSession loads the model with the mutated imputed_value_floats without raising an error or emitting a warning.
Scope: Any ONNX model using ai.onnx.ml.Imputer for numeric preprocessing followed by a numeric classifier where the attacker can supply or modify the distributed .onnx file (malicious model hub upload, compromised model registry, supply chain substitution).

Proof of Concept

Package structure:

```text
clean.onnx:
  Imputer(replaced_value_float=0.0, imputed_value_floats=[-5.0])
  → LinearClassifier(coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])

mutant.onnx:
  Imputer(replaced_value_float=0.0, imputed_value_floats=[+5.0])  ← ONLY CHANGE
  → LinearClassifier(coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])  ← BYTE-IDENTICAL
```
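
The claim that the two graphs differ in exactly one attribute can be checked by comparing decoded attribute values rather than whole-file hashes, which differ by construction. This is a minimal sketch. It assumes the attribute values have already been extracted into plain dicts. The dict layout and `attribute_diff` are hypothetical and are not part of the onnx package.

```python
from typing import Any, Dict, List, Tuple

Graph = Dict[str, Dict[str, Any]]  # operator name -> {attribute name: value}

# Illustrative transcription of the two graphs in the package structure.
CLEAN: Graph = {
    "Imputer": {"replaced_value_float": 0.0, "imputed_value_floats": [-5.0]},
    "LinearClassifier": {"coefficients": [-1.0, 1.0], "intercepts": [0.0, 0.0]},
}
MUTANT: Graph = {
    "Imputer": {"replaced_value_float": 0.0, "imputed_value_floats": [5.0]},
    "LinearClassifier": {"coefficients": [-1.0, 1.0], "intercepts": [0.0, 0.0]},
}


def attribute_diff(a: Graph, b: Graph) -> List[Tuple[str, str, Any, Any]]:
    """List every (operator, attribute, value_in_a, value_in_b) that differs."""
    diffs = []
    for op in sorted(set(a) | set(b)):
        attrs_a, attrs_b = a.get(op, {}), b.get(op, {})
        for name in sorted(set(attrs_a) | set(attrs_b)):
            if attrs_a.get(name) != attrs_b.get(name):
                diffs.append((op, name, attrs_a.get(name), attrs_b.get(name)))
    return diffs


print(attribute_diff(CLEAN, MUTANT))
# [('Imputer', 'imputed_value_floats', [-5.0], [5.0])]
```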

Run:

```shell
pip install onnx onnxruntime
python reproduce_onnx_imputer_preprocessing_flip.py
```

Expected final line:

```text
ONNX_IMPUTER_PREPROCESSING_FLIP_CONFIRMED
```

Runtime Evidence

| Metric | Value |
|---|---|
| `clean` `replaced_value_float` | `0.0` |
| `mutant` `replaced_value_float` | `0.0` (identical) |
| `clean` `imputed_value_floats` | `[-5.0]` |
| `mutant` `imputed_value_floats` | `[+5.0]` |
| Input | `[[0.0]]` (triggers sentinel replacement) |
| `clean` post-imputer tensor | `[[-5.0]]` |
| `mutant` post-imputer tensor | `[[+5.0]]` |
| `clean` `coefficients` | `[-1.0, 1.0]` |
| `mutant` `coefficients` | `[-1.0, 1.0]` (identical) |
| `clean` `intercepts` | `[0.0, 0.0]` |
| `mutant` `intercepts` | `[0.0, 0.0]` (identical) |
| `clean` label | 0 |
| `mutant` label | 1 |
| `clean` scores | `[5.0, -5.0]` |
| `mutant` scores | `[-5.0, 5.0]` |
| Prediction flip | 0 → 1, zero coefficient change |
| `clean` SHA256 | `835687dbc987082b...` |
| `mutant` SHA256 | `92aa6e45839f19f4...` |
| Reproducibility | 5/5 |

Distinctness

| Prior finding | Root | Distinct | Reason |
|---|---|---|---|
| ONNX `OneHotEncoder.cats_strings` | Pre-score categorical feature-column binding | ✅ | `cats_strings` maps a categorical string input to a one-hot column position. `imputed_value_floats` replaces a sentinel float with a different float value. Different operator, different input type (categorical vs numeric), different mechanism. No overlap. |
| ONNX `SVMClassifier.classlabels_strings` | Post-inference label rendering | ✅ | `classlabels_strings` remaps the integer argmax result to a label string AFTER numeric computation completes. `imputed_value_floats` operates BEFORE any scoring. Different operator, different stage. |
| Joblib `CountVectorizer.vocabulary_` | Joblib NLP feature-column binding | ✅ | Different format (pkl vs .onnx), different runtime (sklearn vs ort), different operator class (NLP text vectorizer vs numeric imputer). |
| SafeTensors `tokenizer.json model.vocab` | HF Transformers NLP tokenization | ✅ | Different format (sidecar JSON vs ONNX internal attribute), different modality (NLP vs numeric float preprocessing). |
| SafeTensors `preprocessor_config.json` | HF Transformers image normalization | ✅ | Different format, different modality (CV float normalization vs ONNX-internal sentinel float replacement). |
| TFLite `NormalizationOptions` | FlatBuffer binary preprocessing metadata | ✅ | Different format, different runtime, different spec layer (FlatBuffer metadata vs ONNX operator attribute). |

Non-Claims

The following are not claimed:

This is not a .onnx binary parser vulnerability
This is not an RCE / ACE / arbitrary code execution finding
Scanner bypass is not the primary impact
This is not classlabels_strings or any output label rendering mechanism
This is not OneHotEncoder.cats_strings or any categorical feature-column binding mechanism
This does not claim that no model file content changed; imputed_value_floats is a runtime-consumed operator attribute within the .onnx model file and is the intentionally mutated component
This does not claim NaN-only behavior; the PoC uses a sentinel value of 0.0

Recommendation

Preprocessing attribute integrity manifest: Bind safety-critical ML preprocessing operator attributes (imputed_value_floats, replaced_value_float, scale, offset, etc.) to a model-level integrity manifest at save time and verify at load time. imputed_value_floats controls the numeric value fed to downstream classifiers and must be treated as security-relevant model state.
Training-time attribute fingerprint: Store and verify a fingerprint of the expected preprocessing attributes as part of the model provenance record. Structural validation of opset and graph shape is insufficient because an attacker can change imputed_value_floats while preserving graph structure and downstream weight values.
Documentation and warnings: Clearly document that ai.onnx.ml.Imputer.imputed_value_floats determines which numeric value is fed to downstream numeric classifiers for sentinel inputs. Loading tools should warn when preprocessing attributes differ from the trusted model manifest while downstream numeric weights remain unchanged.
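
The manifest and fingerprint recommendations can be sketched with the standard library alone. Everything in this sketch is hypothetical: the attribute grouping, the function names, and the choice to hash a canonical JSON encoding of decoded values. A production manifest might instead hash the serialized attribute protobufs. What the sketch shows is that fingerprinting preprocessing attributes separately from weights lets a loader report the exact pattern in this PoC: preprocessing changed, weights unchanged.

```python
import hashlib
import json
from typing import Any, Dict, Iterable, List

# Hypothetical grouping; a real manifest would cover every
# security-relevant operator attribute in the graph.
GROUPS: Dict[str, tuple] = {
    "preprocessing": ("imputed_value_floats", "replaced_value_float"),
    "weights": ("coefficients", "intercepts"),
}


def fingerprint(attrs: Dict[str, Any], keys: Iterable[str]) -> str:
    """SHA-256 over a canonical JSON encoding of the selected attributes."""
    subset = {k: attrs[k] for k in keys}
    blob = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def make_manifest(attrs: Dict[str, Any]) -> Dict[str, str]:
    """Computed at save time and stored with the model's provenance record."""
    return {group: fingerprint(attrs, keys) for group, keys in GROUPS.items()}


def verify(attrs: Dict[str, Any], manifest: Dict[str, str]) -> List[str]:
    """Load-time check: names of groups whose fingerprint no longer matches."""
    current = make_manifest(attrs)
    return sorted(g for g in manifest if current.get(g) != manifest[g])


# Flattened attribute values of the PoC models (illustrative transcription).
CLEAN_ATTRS = {"imputed_value_floats": [-5.0], "replaced_value_float": 0.0,
               "coefficients": [-1.0, 1.0], "intercepts": [0.0, 0.0]}
MUTANT_ATTRS = dict(CLEAN_ATTRS, imputed_value_floats=[5.0])

MANIFEST = make_manifest(CLEAN_ATTRS)
print(verify(CLEAN_ATTRS, MANIFEST))   # []
print(verify(MUTANT_ATTRS, MANIFEST))  # ['preprocessing']
if verify(MUTANT_ATTRS, MANIFEST) == ["preprocessing"]:
    print("warning: preprocessing attributes differ from manifest; weights unchanged")
```

Keeping the groups separate is a deliberate choice. A single whole-model hash would detect the mutation but would not tell the operator that the classifier weights were left intact, which is the signature of this attack.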

Files

| File | Description |
|---|---|
| `clean.onnx` | Clean model with `imputed_value_floats=[-5.0]` |
| `mutant.onnx` | Mutant model with `imputed_value_floats=[+5.0]` (only change) |
| `reproduce_onnx_imputer_preprocessing_flip.py` | Full reproduction script |
| `inspect_onnx_imputer_hash_matrix.py` | Hash matrix and attribute isolation inspector |
| `evidence_runtime_results.json` | Runtime evidence (labels, scores, hashes, assertions) |
| `evidence_hash_matrix.json` | Attribute-level diff matrix |
| `evidence_distinctness_matrix.json` | Distinctness analysis vs 6 prior findings |
| `evidence_route_framing.json` | Route and impact framing |
| `evidence_top_axis.json` | Top attack axis and key invariants |
| `SHA256SUMS.txt` | SHA256 checksums for all files |
| `requirements.txt` | Python dependencies |
