ONNX Imputer imputed_value_floats Preprocessing Authority PoC
Summary
An ONNX model file contains a runtime-consumed ML operator attribute (ai.onnx.ml.Imputer.imputed_value_floats) that controls the replacement value applied to sentinel numeric inputs before downstream scoring. A crafted ONNX model can mutate imputed_value_floats while keeping the downstream LinearClassifier coefficients and intercepts byte-identical, causing the same numeric input to produce a different post-imputer tensor, different scores, and a flipped prediction class.
This is not output label substitution. This is not `classlabels_strings`. This is not `OneHotEncoder.cats_strings`. This is pre-score sentinel numeric value replacement authority: the altered replacement value is applied before scoring, so the downstream classifier scores a different value against unchanged weights.
Affected Product
- Format: ONNX model file (`.onnx`)
- Root operator: `ai.onnx.ml.Imputer`
- Root attribute: `imputed_value_floats`
- Supporting attribute: `replaced_value_float` (identical in clean and mutant; only `imputed_value_floats` is mutated)
- Downstream consumer: `ai.onnx.ml.LinearClassifier` (coefficients and intercepts unchanged)
- Runtime: `onnxruntime` Python package
Vulnerability Details
ai.onnx.ml.Imputer.imputed_value_floats is a runtime-consumed attribute that determines the replacement float value for each feature dimension when the input matches replaced_value_float. When a victim loads and runs an ONNX model:
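```python
import numpy as np
import onnxruntime as ort

# Load the distributed model; the mutated Imputer attribute is accepted without warning.
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# One feature equal to the sentinel 0.0, so the Imputer replacement is triggered.
label, scores = sess.run(None, {"input": np.array([[0.0]], dtype=np.float32)})
```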
Imputer reads imputed_value_floats at runtime and replaces any input value equal to replaced_value_float:
```
post_imputer[i] = imputed_value_floats[i] if input[i] == replaced_value_float else input[i]
```
An attacker who mutates only `imputed_value_floats` from [-5.0] to [+5.0], keeping `replaced_value_float=0.0` unchanged, causes the same input [[0.0]] to produce post-imputer tensors of [[-5.0]] (clean) and [[+5.0]] (mutant). The downstream LinearClassifier then applies its unchanged coefficients [-1.0, 1.0] to different values, producing different scores and a flipped prediction, while coefficients and intercepts remain byte-identical.
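To make the rule above concrete, here is a dependency-free sketch that emulates the two stages in plain Python. It assumes a single feature, `post_transform` left at NONE, and a label chosen as the argmax of the raw scores; the function names (`impute`, `linear_scores`, `run`) are hypothetical and are not part of onnxruntime.

```python
# Hypothetical emulation of Imputer -> LinearClassifier for one input row.
# Assumes post_transform=NONE and that the predicted label is the argmax class.
from typing import List, Sequence, Tuple


def impute(row: Sequence[float], replaced: float, imputed: List[float]) -> List[float]:
    """Replace entries equal to the sentinel with the imputed value for that feature."""
    # A single imputed value is broadcast to every feature.
    values = imputed * len(row) if len(imputed) == 1 else imputed
    return [values[i] if x == replaced else x for i, x in enumerate(row)]


def linear_scores(row: List[float], coefficients: List[float],
                  intercepts: List[float]) -> List[float]:
    """score_k = sum_j coefficients[k * n_features + j] * row[j] + intercepts[k]."""
    n = len(row)
    return [
        sum(w * x for w, x in zip(coefficients[k * n:(k + 1) * n], row)) + b
        for k, b in enumerate(intercepts)
    ]


def run(imputed: List[float],
        row: Sequence[float] = (0.0,)) -> Tuple[List[float], List[float], int]:
    coefficients = [-1.0, 1.0]  # identical in clean and mutant
    intercepts = [0.0, 0.0]     # identical in clean and mutant
    x = impute(row, replaced=0.0, imputed=imputed)
    scores = linear_scores(x, coefficients, intercepts)
    label = max(range(len(scores)), key=scores.__getitem__)
    return x, scores, label


clean = run([-5.0])   # -> ([-5.0], [5.0, -5.0], 0)
mutant = run([5.0])   # -> ([5.0], [-5.0, 5.0], 1)
print(clean, mutant)
```

Only the `imputed` argument differs between the two calls, which mirrors the single-attribute mutation: the weights never change, yet the argmax moves from class 0 to class 1.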
Impact
- Prediction manipulation: The model prediction flips (label 0 → label 1) for the same sentinel numeric input while all classifier weights are unchanged.
- Weights unchanged: `LinearClassifier.coefficients` and `intercepts` are byte-identical in the clean and mutant models, so the victim cannot detect the manipulation by inspecting weight values.
- No error generated: `onnxruntime.InferenceSession` loads the mutated `imputed_value_floats` silently, with no warning.
- Scope: Any ONNX model that uses `ai.onnx.ml.Imputer` for numeric preprocessing followed by a numeric classifier, where the attacker can supply or modify the distributed `.onnx` file (malicious model hub upload, compromised model registry, supply chain substitution).
Proof of Concept
Package structure:
```
clean.onnx:
  Imputer(replaced_value_float=0.0, imputed_value_floats=[-5.0])
    → LinearClassifier(coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])

mutant.onnx:
  Imputer(replaced_value_float=0.0, imputed_value_floats=[+5.0])        ← ONLY CHANGE
    → LinearClassifier(coefficients=[-1.0, 1.0], intercepts=[0.0, 0.0])  ← BYTE-IDENTICAL
```
Run:
```bash
pip install onnx onnxruntime
python reproduce_onnx_imputer_preprocessing_flip.py
```
Expected final line:
```
ONNX_IMPUTER_PREPROCESSING_FLIP_CONFIRMED
```
Runtime Evidence
| Metric | Value |
|---|---|
| clean `replaced_value_float` | 0.0 |
| mutant `replaced_value_float` | 0.0 (identical) |
| clean `imputed_value_floats` | [-5.0] |
| mutant `imputed_value_floats` | [+5.0] |
| Input | [[0.0]] (triggers sentinel replacement) |
| clean post-imputer tensor | [[-5.0]] |
| mutant post-imputer tensor | [[+5.0]] |
| coefficients clean | [-1.0, 1.0] |
| coefficients mutant | [-1.0, 1.0] (identical) |
| intercepts clean | [0.0, 0.0] |
| intercepts mutant | [0.0, 0.0] (identical) |
| clean label | 0 |
| mutant label | 1 |
| clean scores | [5.0, -5.0] |
| mutant scores | [-5.0, 5.0] |
| Prediction flip | 0 → 1, with zero coefficient change |
| clean SHA256 | 835687dbc987082b... |
| mutant SHA256 | 92aa6e45839f19f4... |
| Reproducibility | 5/5 |
Distinctness
| Prior Finding | Root | Distinct | Reason |
|---|---|---|---|
| ONNX `OneHotEncoder.cats_strings` | Pre-score categorical feature-column binding | Yes | `cats_strings` maps a categorical string input to a one-hot column position; `imputed_value_floats` replaces a sentinel float with a different float value. Different operator, different input type (categorical vs numeric), different mechanism. No overlap. |
| ONNX `SVMClassifier.classlabels_strings` | Post-inference label rendering | Yes | `classlabels_strings` remaps the integer argmax result to a label string after numeric computation completes; `imputed_value_floats` operates before any scoring. Different operator, different stage. |
| Joblib `CountVectorizer.vocabulary_` | Joblib NLP feature-column binding | Yes | Different format (`.pkl` vs `.onnx`), different runtime (sklearn vs onnxruntime), different operator class (NLP text vectorizer vs numeric imputer). |
| SafeTensors `tokenizer.json` `model.vocab` | HF Transformers NLP tokenization | Yes | Different format (sidecar JSON vs ONNX internal attribute), different modality (NLP vs numeric float preprocessing). |
| SafeTensors `preprocessor_config.json` | HF Transformers image normalization | Yes | Different format, different modality (CV float normalization vs ONNX-internal sentinel float replacement). |
| TFLite `NormalizationOptions` | FlatBuffer binary preprocessing metadata | Yes | Different format, different runtime, different spec layer (FlatBuffer metadata vs ONNX operator attribute). |
Non-Claims
The following are not claimed:
- This is not a `.onnx` binary parser vulnerability.
- This is not an RCE / ACE / arbitrary code execution finding.
- Scanner bypass is not the primary impact.
- This is not `classlabels_strings` or any output label rendering mechanism.
- This is not `OneHotEncoder.cats_strings` or any categorical feature-column binding mechanism.
- This does not claim that no model file content changed; `imputed_value_floats` is a runtime-consumed operator attribute within the `.onnx` model file and is the intentionally mutated component.
- This does not claim NaN-only behavior; the PoC uses a sentinel value of `0.0`.
Recommendation
- Preprocessing attribute integrity manifest: Bind safety-critical ML preprocessing operator attributes (`imputed_value_floats`, `replaced_value_float`, `scale`, `offset`, etc.) to a model-level integrity manifest at save time and verify them at load time. `imputed_value_floats` controls the numeric value fed to downstream classifiers and must be treated as security-relevant model state.
- Training-time attribute fingerprint: Store and verify a fingerprint of the expected preprocessing attributes as part of the model provenance record. Structural validation of opset and graph shape is insufficient, because an attacker can change `imputed_value_floats` while preserving graph structure and downstream weight values.
- Documentation and warnings: Clearly document that `ai.onnx.ml.Imputer.imputed_value_floats` determines which numeric value is fed to downstream numeric classifiers for sentinel inputs. Loading tools should warn when preprocessing attributes differ from the trusted model manifest while downstream numeric weights remain unchanged.
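A minimal sketch of the manifest idea follows, assuming the relevant attributes have already been extracted from the model into plain dictionaries (the dict layout and the `fingerprint` / `verify` helpers are hypothetical, not an ONNX API). It contrasts a weights-only fingerprint, which the mutant passes, with a fingerprint that also covers the Imputer attributes.

```python
# Hypothetical preprocessing-attribute fingerprint over pre-extracted attributes.
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(attributes: dict, manifest: dict) -> bool:
    """Compare a freshly computed fingerprint with the trusted manifest entry."""
    return fingerprint(attributes) == manifest["preprocessing_sha256"]


WEIGHTS = {"coefficients": [-1.0, 1.0], "intercepts": [0.0, 0.0]}
clean_pre = {"Imputer": {"replaced_value_float": 0.0, "imputed_value_floats": [-5.0]}}
mutant_pre = {"Imputer": {"replaced_value_float": 0.0, "imputed_value_floats": [5.0]}}

# Manifest recorded at save time from the trusted (clean) model.
manifest = {
    "weights_sha256": fingerprint(WEIGHTS),
    "preprocessing_sha256": fingerprint(clean_pre),
}

# A weights-only check passes for the mutant because its weights are identical...
weights_ok = fingerprint(WEIGHTS) == manifest["weights_sha256"]
# ...while the preprocessing check accepts the clean model and rejects the mutant.
print(weights_ok, verify(clean_pre, manifest), verify(mutant_pre, manifest))
```

Canonical JSON keeps the hash independent of dictionary insertion order; a real implementation would also need a stable float encoding (for example, `-0.0` and `0.0` serialize differently here).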
Files
| File | Description |
|---|---|
| `clean.onnx` | Clean model with `imputed_value_floats=[-5.0]` |
| `mutant.onnx` | Mutant model with `imputed_value_floats=[+5.0]` (only change) |
| `reproduce_onnx_imputer_preprocessing_flip.py` | Full reproduction script |
| `inspect_onnx_imputer_hash_matrix.py` | Hash matrix and attribute isolation inspector |
| `evidence_runtime_results.json` | Runtime evidence (labels, scores, hashes, assertions) |
| `evidence_hash_matrix.json` | Attribute-level diff matrix |
| `evidence_distinctness_matrix.json` | Distinctness analysis vs 6 prior findings |
| `evidence_route_framing.json` | Route and impact framing |
| `evidence_top_axis.json` | Top attack axis and key invariants |
| `SHA256SUMS.txt` | SHA256 checksums for all files |
| `requirements.txt` | Python dependencies |
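For the checksum file, a small stdlib-only verifier is sketched below, assuming `SHA256SUMS.txt` uses the usual `<digest>  <filename>` layout emitted by `sha256sum` (on most Linux systems, `sha256sum -c SHA256SUMS.txt` performs the same check). `sha256_file` and `verify_sums` are hypothetical helper names.

```python
# Hypothetical helper: verify entries of a sha256sum-style checksum file.
# Assumed line format: "<64 hex chars>  <filename>" (optionally "*<filename>").
import hashlib
from pathlib import Path
from typing import Dict


def sha256_file(path: Path) -> str:
    """Stream the file in 64 KiB chunks so large .onnx files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file: str) -> Dict[str, bool]:
    """Map each listed filename to True if it exists and matches, else False."""
    base = Path(sums_file).parent
    results: Dict[str, bool] = {}
    for line in Path(sums_file).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # "*" marks binary mode in sha256sum output
        path = base / name
        results[name] = path.is_file() and sha256_file(path) == expected.lower()
    return results
```

Calling `verify_sums("SHA256SUMS.txt")` from the repository directory returns one boolean per listed file; a missing file is reported as `False` rather than raising.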