ONNX Imputer imputed_value_floats Preprocessing Authority PoC

Summary

An ONNX model file contains a runtime-consumed ML operator attribute (ai.onnx.ml.Imputer.imputed_value_floats) that controls the replacement value applied to sentinel numeric inputs before downstream scoring. A crafted ONNX model can mutate imputed_value_floats while keeping the downstream LinearClassifier coefficients and intercepts byte-identical, causing the same numeric input to produce a different post-imputer tensor, different scores, and a flipped prediction class.

This is not output label substitution. This is not classlabels_strings. This is not OneHotEncoder.cats_strings. This is pre-score sentinel numeric value replacement authority: the mutation occurs before any numeric computation, causing the downstream classifier to score a different value against unchanged weights.


Affected Product

  • Format: ONNX model file (.onnx)
  • Root operator: ai.onnx.ml.Imputer
  • Root attribute: imputed_value_floats
  • Supporting attribute: replaced_value_float (identical in clean and mutant; only imputed_value_floats is mutated)
  • Downstream consumer: ai.onnx.ml.LinearClassifier (coefficients and intercepts unchanged)
  • Runtime: onnxruntime Python package

Vulnerability Details

ai.onnx.ml.Imputer.imputed_value_floats is a runtime-consumed attribute that determines the replacement float value for each feature dimension when the input matches replaced_value_float. When a victim loads and runs an ONNX model:

import onnxruntime as ort, numpy as np
sess = ort.InferenceSession("model.onnx")
label, scores = sess.run(None, {"input": np.array([[0.0]], dtype=np.float32)})

Imputer reads imputed_value_floats at runtime and replaces any input value equal to replaced_value_float:

post_imputer[i] = imputed_value_floats[i]  if  input[i] == replaced_value_float  else  input[i]

An attacker who mutates only imputed_value_floats from [-5.0] to [+5.0] while keeping replaced_value_float=0.0 unchanged causes the same input [[0.0]] to produce a post-imputer tensor of [[-5.0]] (clean) vs [[+5.0]] (mutant). The downstream LinearClassifier then applies its unchanged coefficients [-1.0, 1.0] to different values, producing different scores and a flipped prediction, while coefficients and intercepts remain byte-identical.
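The replacement rule and the downstream scoring can be traced by hand with a minimal pure-Python sketch. It does not call onnxruntime. The helper names `impute`, `linear_scores`, and `predict` are illustrative, and the coefficient layout (one weight per class for a single feature) is an assumption inferred from the PoC values.

```python
def impute(row, replaced_value, imputed_values):
    """Replace each element equal to the sentinel with its per-feature fill value."""
    return [imputed_values[i] if x == replaced_value else x
            for i, x in enumerate(row)]

def linear_scores(row, coefficients, intercepts):
    """Score a single-feature row: one weight and one intercept per class."""
    x = row[0]
    return [w * x + b for w, b in zip(coefficients, intercepts)]

def predict(row, imputed_values):
    """Run the two-stage pipeline; only imputed_values differs between models."""
    filled = impute(row, replaced_value=0.0, imputed_values=imputed_values)
    scores = linear_scores(filled, coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])
    label = scores.index(max(scores))  # argmax over class scores
    return filled, scores, label

clean = predict([0.0], imputed_values=[-5.0])
mutant = predict([0.0], imputed_values=[5.0])
print(clean)   # ([-5.0], [5.0, -5.0], 0)
print(mutant)  # ([5.0], [-5.0, 5.0], 1)
```

The classifier arguments are the same literals in both calls, so the flip is attributable to the fill value alone.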


Impact

  • Prediction manipulation: Model prediction flips (label 0 → label 1) for the same sentinel numeric input while all classifier weights are unchanged.
  • Weights unchanged: LinearClassifier.coefficients and intercepts are byte-identical in clean and mutant models. The victim cannot detect manipulation by inspecting weight values.
  • No error generated: onnxruntime.InferenceSession loads the mutated imputed_value_floats silently with no warning.
  • Scope: Any ONNX model using ai.onnx.ml.Imputer for numeric preprocessing followed by a numeric classifier where the attacker can supply or modify the distributed .onnx file (malicious model hub upload, compromised model registry, supply chain substitution).

Proof of Concept

Package structure:

clean.onnx:
  Imputer(replaced_value_float=0.0, imputed_value_floats=[-5.0])
  → LinearClassifier(coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])

mutant.onnx:
  Imputer(replaced_value_float=0.0, imputed_value_floats=[+5.0])  ← ONLY CHANGE
  → LinearClassifier(coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])  ← BYTE-IDENTICAL

Run:

pip install onnx onnxruntime
python reproduce_onnx_imputer_preprocessing_flip.py

Expected final line:

ONNX_IMPUTER_PREPROCESSING_FLIP_CONFIRMED

Runtime Evidence

| Metric | Value |
|---|---|
| clean replaced_value_float | 0.0 |
| mutant replaced_value_float | 0.0 (identical) |
| clean imputed_value_floats | [-5.0] |
| mutant imputed_value_floats | [+5.0] |
| Input | [[0.0]] (triggers sentinel replacement) |
| clean post-imputer tensor | [[-5.0]] |
| mutant post-imputer tensor | [[+5.0]] |
| coefficients clean | [-1.0, 1.0] |
| coefficients mutant | [-1.0, 1.0] (identical) |
| intercepts clean | [0.0, 0.0] |
| intercepts mutant | [0.0, 0.0] (identical) |
| clean label | 0 |
| mutant label | 1 |
| clean scores | [5.0, -5.0] |
| mutant scores | [-5.0, 5.0] |
| Prediction flip | 0 → 1, zero coefficient change |
| clean SHA256 | 835687dbc987082b... |
| mutant SHA256 | 92aa6e45839f19f4... |
| Reproducibility | 5/5 |

Distinctness

| Prior Finding | Root | Distinct | Reason |
|---|---|---|---|
| ONNX OneHotEncoder.cats_strings | Pre-score categorical feature-column binding | ✅ | cats_strings maps a categorical string input to a one-hot column position. imputed_value_floats replaces a sentinel float with a different float value. Different operator, different input type (categorical vs numeric), different mechanism. No overlap. |
| ONNX SVMClassifier.classlabels_strings | Post-inference label rendering | ✅ | classlabels_strings remaps the integer argmax result to a label string AFTER numeric computation completes. imputed_value_floats operates BEFORE any scoring. Different operator, different stage. |
| Joblib CountVectorizer.vocabulary_ | Joblib NLP feature-column binding | ✅ | Different format (pkl vs .onnx), different runtime (sklearn vs ort), different operator class (NLP text vectorizer vs numeric imputer). |
| SafeTensors tokenizer.json model.vocab | HF Transformers NLP tokenization | ✅ | Different format (sidecar JSON vs ONNX internal attribute), different modality (NLP vs numeric float preprocessing). |
| SafeTensors preprocessor_config.json | HF Transformers image normalization | ✅ | Different format, different modality (CV float normalization vs ONNX-internal sentinel float replacement). |
| TFLite NormalizationOptions | FlatBuffer binary preprocessing metadata | ✅ | Different format, different runtime, different spec layer (FlatBuffer metadata vs ONNX operator attribute). |

Non-Claims

The following are not claimed:

  • This is not a .onnx binary parser vulnerability
  • This is not an RCE / ACE / arbitrary code execution finding
  • Scanner bypass is not the primary impact
  • This is not classlabels_strings or any output label rendering mechanism
  • This is not OneHotEncoder.cats_strings or any categorical feature-column binding mechanism
  • This does not claim that no model file content changed; imputed_value_floats is a runtime-consumed operator attribute within the .onnx model file and is the intentionally mutated component
  • This does not claim NaN-only behavior; the PoC uses a sentinel value of 0.0

Recommendation

  1. Preprocessing attribute integrity manifest: Bind safety-critical ML preprocessing operator attributes (imputed_value_floats, replaced_value_float, scale, offset, etc.) to a model-level integrity manifest at save time and verify at load time. imputed_value_floats controls the numeric value fed to downstream classifiers and must be treated as security-relevant model state.
  2. Training-time attribute fingerprint: Store and verify a fingerprint of the expected preprocessing attributes as part of the model provenance record. Structural validation of opset and graph shape is insufficient because an attacker can change imputed_value_floats while preserving graph structure and downstream weight values.
  3. Documentation and warnings: Clearly document that ai.onnx.ml.Imputer.imputed_value_floats determines which numeric value is fed to downstream numeric classifiers for sentinel inputs. Loading tools should warn when preprocessing attributes differ from the trusted model manifest while downstream numeric weights remain unchanged.
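The first two recommendations could be prototyped along the following lines. This is a minimal sketch under stated assumptions, not an existing ONNX or onnxruntime feature: the manifest keys, the canonical-JSON encoding, and the function names `attribute_fingerprint` and `verify_attributes` are hypothetical, and extracting the attribute values from the model file is left out.

```python
import hashlib
import json

def attribute_fingerprint(attributes):
    """SHA-256 over a canonical JSON encoding of the preprocessing attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_attributes(attributes, trusted_fingerprint):
    """Raise if the attributes do not match the fingerprint recorded at save time."""
    if attribute_fingerprint(attributes) != trusted_fingerprint:
        raise ValueError("preprocessing attributes differ from trusted manifest")

# Recorded by the model producer at save time (values taken from the PoC).
trusted = attribute_fingerprint(
    {"Imputer.replaced_value_float": 0.0, "Imputer.imputed_value_floats": [-5.0]}
)

# Checked by the consumer at load time: the clean attributes pass silently.
verify_attributes(
    {"Imputer.replaced_value_float": 0.0, "Imputer.imputed_value_floats": [-5.0]},
    trusted,
)

# The mutant attributes are rejected even though no classifier weight changed.
try:
    verify_attributes(
        {"Imputer.replaced_value_float": 0.0, "Imputer.imputed_value_floats": [5.0]},
        trusted,
    )
    mutant_rejected = False
except ValueError:
    mutant_rejected = True
print(mutant_rejected)  # True
```

Such a check only helps if the trusted fingerprint reaches the consumer through a channel the attacker cannot modify along with the .onnx file.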

Files

| File | Description |
|---|---|
| clean.onnx | Clean model with imputed_value_floats=[-5.0] |
| mutant.onnx | Mutant model with imputed_value_floats=[+5.0] (only change) |
| reproduce_onnx_imputer_preprocessing_flip.py | Full reproduction script |
| inspect_onnx_imputer_hash_matrix.py | Hash matrix and attribute isolation inspector |
| evidence_runtime_results.json | Runtime evidence (labels, scores, hashes, assertions) |
| evidence_hash_matrix.json | Attribute-level diff matrix |
| evidence_distinctness_matrix.json | Distinctness analysis vs 6 prior findings |
| evidence_route_framing.json | Route and impact framing |
| evidence_top_axis.json | Top attack axis and key invariants |
| SHA256SUMS.txt | SHA256 checksums for all files |
| requirements.txt | Python dependencies |