# HuggingFace Transformers ImageProcessor Preprocessing Authority Gap
## Summary
A SafeTensors-based HuggingFace Transformers image model package trusts
`preprocessor_config.json` for all image normalization parameters
(`image_mean`, `image_std`, `rescale_factor`) consumed by
`ViTImageProcessor.preprocess()` without any integrity binding to
`model.safetensors`. An attacker who controls the model package can silently
mutate these normalization fields, causing the victim's inference pipeline to
produce adversarially shifted `pixel_values` and different (potentially
flipped) predictions, while `model.safetensors` and `config.json` remain
byte-identical and show no anomaly.
**This is not a `.safetensors` parser bug.** The issue lies in the
SafeTensors-based HuggingFace Transformers image model package format, which
lacks integrity binding between the preprocessing config sidecar and the model
weight file.
---
## Affected Product
- **Package:** `huggingface/transformers` (SafeTensors-based image model package)
- **Load path:** `AutoImageProcessor.from_pretrained()` → `ViTImageProcessor.preprocess()`
- **Root file:** `preprocessor_config.json`
- **Root fields:** `image_mean`, `image_std`, `rescale_factor`
- **Weight file:** `model.safetensors` (unchanged: byte-identical in clean and mutant packages)
---
## Vulnerability Details
A user typically loads a SafeTensors-based Transformers image model package via:
```python
processor = AutoImageProcessor.from_pretrained("model_dir")
model = AutoModelForImageClassification.from_pretrained("model_dir")
```
The `ViTImageProcessor` reads `image_mean`, `image_std`, and `rescale_factor`
directly from `preprocessor_config.json` at load time. These values are used
to compute `pixel_values`:
```
pixel_values = (raw_pixel * rescale_factor - image_mean) / image_std
```
There is no cryptographic or structural binding between `preprocessor_config.json`
and `model.safetensors`. An attacker who controls the package can mutate
`preprocessor_config.json`, a plain JSON file, without touching the model
weights at all.
**Mutated field in this PoC:**
- Clean: `image_mean = [0.5, 0.5, 0.5]`
- Mutant: `image_mean = [-0.5, -0.5, -0.5]`
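This single field change shifts `pixel_values` by **+2.0 per channel per pixel**.
By the formula above the shift equals `(0.5 - (-0.5)) / image_std`, which matches
the measured delta of 2.0 when `image_std = 0.5`. The model then produces shifted
logits and, in this PoC, a flipped prediction,
with no modification to `model.safetensors`.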
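The shift can be checked by hand from the normalization formula. The sketch below is a minimal pure-Python illustration, not the library's implementation. `normalize` is a hypothetical helper, and the values `image_std = 0.5` and `rescale_factor = 1/255` are assumptions: this report does not list them, although `image_std = 0.5` is consistent with the measured delta of 2.0.

```python
def normalize(raw_pixel, rescale_factor, image_mean, image_std):
    """Apply the rescale-then-normalize formula to a single channel value."""
    return (raw_pixel * rescale_factor - image_mean) / image_std


RESCALE_FACTOR = 1 / 255  # assumed; not stated in this report
IMAGE_STD = 0.5           # assumed; consistent with the measured delta of 2.0

raw = 128  # any 8-bit pixel value works; the raw term cancels in the delta
clean = normalize(raw, RESCALE_FACTOR, 0.5, IMAGE_STD)
mutant = normalize(raw, RESCALE_FACTOR, -0.5, IMAGE_STD)
delta = mutant - clean

print(f"clean={clean:.6f} mutant={mutant:.6f} delta={delta:.6f}")
```

Because the `raw_pixel * rescale_factor` term is the same in both calls, the delta depends only on the change in `image_mean` divided by `image_std`, so every pixel in every channel moves by the same amount.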
---
## Impact
- **Prediction manipulation:** Model outputs flip (e.g., dog → cat) while weights
are unchanged. A victim cannot detect this by inspecting `model.safetensors`.
- **Silent attack surface:** `model.safetensors` and `config.json` are
byte-identical between clean and mutant packages. The only changed file is
`preprocessor_config.json`.
- **No warning generated:** `AutoImageProcessor.from_pretrained()` loads the
mutated values without any integrity error.
- **Scope:** Any SafeTensors-based HuggingFace Transformers image model package
where the consumer uses `AutoImageProcessor.from_pretrained()` and
`preprocessor_config.json` is under the attacker's control (e.g., malicious
model on HuggingFace Hub, compromised local model directory).
---
## Proof of Concept
### Package structure
```
clean_model/
  config.json               ← byte-identical to mutant
  model.safetensors         ← byte-identical to mutant (SHA256: e9bf24263551...)
  preprocessor_config.json  ← image_mean = [0.5, 0.5, 0.5]
mutant_model/
  config.json               ← byte-identical to clean
  model.safetensors         ← byte-identical to clean (SHA256: e9bf24263551...)
  preprocessor_config.json  ← image_mean = [-0.5, -0.5, -0.5]  ← ONLY CHANGE
```
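The byte-identity claim for this layout can be checked independently of the bundled scripts. The following is a minimal sketch using only the standard library. It is not the contents of `inspect_transformers_image_processor_hash_matrix.py`, and the helper names `sha256_of` and `diff_packages` are hypothetical.

```python
import hashlib
from pathlib import Path

# File names taken from the package structure shown above.
PACKAGE_FILES = ("config.json", "model.safetensors", "preprocessor_config.json")


def sha256_of(path):
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diff_packages(clean_dir, mutant_dir, names=PACKAGE_FILES):
    """List the file names whose digests differ between two package directories."""
    changed = []
    for name in names:
        if sha256_of(Path(clean_dir) / name) != sha256_of(Path(mutant_dir) / name):
            changed.append(name)
    return changed
```

For the two directories above, `diff_packages("clean_model", "mutant_model")` would be expected to report only `preprocessor_config.json`.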
### Run the reproduce script
```bash
pip install torch transformers safetensors Pillow numpy
python reproduce_transformers_image_processor_preprocessing_flip.py
```
Expected final output:
```
TRANSFORMERS_IMAGE_PROCESSOR_PREPROCESSING_FLIP_CONFIRMED
```
### Run the inspect script
```bash
python inspect_transformers_image_processor_hash_matrix.py
```
Expected final output:
```
TRANSFORMERS_IMAGE_PROCESSOR_PREPROCESSING_HASH_MATRIX_PASS
```
---
## Runtime Evidence
All values from T0 execution (14/14 assertions PASS):
| Metric | Value |
|--------|-------|
| `config.json` SHA256 (clean == mutant) | `0eba781a04d141af...` |
| `model.safetensors` SHA256 (clean == mutant) | `e9bf24263551064e...` |
| `preprocessor_config.json` SHA256 clean | `7016f6ba6ab8...` |
| `preprocessor_config.json` SHA256 mutant | `ebc69b98226f...` |
| `image_mean` clean | `[0.5, 0.5, 0.5]` |
| `image_mean` mutant | `[-0.5, -0.5, -0.5]` |
| `pixel_values` clean mean | `0.017302` |
| `pixel_values` mutant mean | `2.017302` |
| `\|delta\|` mean | `2.000000` |
| `\|delta\|` max | `2.000000` |
| logits clean | `[0.0475, 0.0573]` |
| logits mutant | `[0.0502, 0.0363]` |
| prediction clean | `1 (dog)` |
| prediction mutant | `0 (cat)` |
| **Prediction flip** | **dog → cat** (zero weight change) |
| Model params | 5,666 (ViTForImageClassification, seed=1) |
Load path used:
```
AutoImageProcessor.from_pretrained()
  → ViTImageProcessor.__init__()
  → reads image_mean / image_std / rescale_factor from preprocessor_config.json
AutoModelForImageClassification.from_pretrained()
  → loads model.safetensors
model(pixel_values=inputs["pixel_values"])
  → model.forward()
```
---
## Route Framing
This finding targets the **SafeTensors-based HuggingFace Transformers model
package** ecosystem. The vulnerability is not in the `.safetensors` binary
parser itself, but in the package format's lack of integrity binding between:
- `model.safetensors`: the weight authority (trusted, cryptographically stable)
- `preprocessor_config.json`: the preprocessing authority (untrusted, no binding)
The attack surface exists specifically because the HuggingFace Transformers
package format trusts `preprocessor_config.json` without any integrity link to
the `model.safetensors` it accompanies.
---
## Distinctness
| Prior Finding | Root | Verdict |
|---------------|------|---------|
| tokenizer.json vocabulary (NLP tokenization) | `tokenizer.json` | DISTINCT: different modality (CV vs NLP), different class, different computation |
| TFLite FlatBuffer NormalizationOptions | Binary FlatBuffer `NormalizationOptions` (C++ struct) | DISTINCT: different framework, format, runtime |
| Joblib vocabulary | pickle binary | DISTINCT: different format, domain |
| OpenVINO rt_info | XML embedded metadata | DISTINCT: different framework, format |
| TFJS quantization | TF.js quantization params | DISTINCT: different framework, semantic |
---
## Non-Claims
The following claims are **NOT** made by this report:
- This is **not** a `.safetensors` binary parser vulnerability
- This is **not** an RCE / ACE / arbitrary code execution finding
- This does **not** require a scanner bypass to be impactful
- `preprocessor_config.json` is **not** claimed to be outside model state;
it is runtime-consumed model package state
---
## Recommendation
HuggingFace Transformers should consider one or more of the following mitigations:
1. **Package-level integrity manifest:** Include a signed or hashed manifest
that binds `preprocessor_config.json` to `model.safetensors` at save time
and verifies the binding at load time.
2. **Validation of normalization ranges:** Warn or reject `preprocessor_config.json`
values that fall outside expected normalization ranges (e.g., `|image_mean| > 1.0`).
3. **Documentation:** Clearly document that `preprocessor_config.json` is
security-relevant package state and that consumers loading packages from
untrusted sources should verify all sidecar files.
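
Mitigation 2 could be prototyped on the consumer side before any library change. The sketch below is an illustration under stated assumptions, not an existing Transformers API. `check_normalization` is a hypothetical helper, it assumes the three fields are plain numbers or lists of numbers, and the default `mean_range=(0.0, 1.0)` assumes means are computed over pixels rescaled to `[0, 1]`.

```python
import math


def check_normalization(config, mean_range=(0.0, 1.0)):
    """Return a list of problems found in a parsed preprocessor_config.json dict.

    Hypothetical consumer-side check; the bounds are assumptions, not
    values defined by HuggingFace Transformers.
    """
    problems = []
    low, high = mean_range
    for index, value in enumerate(config.get("image_mean", [])):
        if not math.isfinite(value) or not low <= value <= high:
            problems.append(f"image_mean[{index}]={value} outside [{low}, {high}]")
    for index, value in enumerate(config.get("image_std", [])):
        if not math.isfinite(value) or value <= 0:
            problems.append(f"image_std[{index}]={value} is not a positive finite number")
    rescale = config.get("rescale_factor")
    if rescale is not None and (not math.isfinite(rescale) or rescale <= 0):
        problems.append(f"rescale_factor={rescale} is not a positive finite number")
    return problems
```

A magnitude-only bound such as `|image_mean| > 1.0` would not flag this PoC's mutant, since `|-0.5| <= 1.0`. The range check above flags it only because it also rejects negative means. Range checks remain a heuristic either way: a mutation that stays inside the accepted range passes, which is why the manifest in mitigation 1 is the stronger control.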
---
## References
- `reproduce_transformers_image_processor_preprocessing_flip.py`: full reproduction script
- `inspect_transformers_image_processor_hash_matrix.py`: hash matrix inspection
- `evidence_runtime_results.json`: T0 runtime evidence
- `evidence_hash_matrix.json`: SHA256 isolation proof
- `evidence_distinctness_matrix.json`: distinctness analysis
- `evidence_route_framing.json`: route framing statements
- `evidence_top_axis.json`: top axis details and attack narrative