---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---
# ModHiFi Pruned ResNet-50 (Small)
## Model Description
This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture.
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~30% of the parameters** while achieving *higher accuracy* than the base model.
Unlike unstructured pruning (which zeros out weights), **structural pruning** physically removes entire channels and filters.
This yields a model that is natively **smaller and faster, with fewer FLOPs** on standard hardware, without needing specialized sparse inference engines (see the sketch below).
- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50
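To make the distinction concrete, here is a minimal, illustrative sketch contrasting the two approaches on a single `Conv2d` layer. This is **not** the ModHiFi selection algorithm itself; the choice of which filters to keep is arbitrary here:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=64, out_channels=256, kernel_size=3, padding=1)

# Unstructured pruning: zero out individual weights. The tensor shape (and
# hence the dense FLOP count) is unchanged.
with torch.no_grad():
    conv.weight.mul_(torch.rand_like(conv.weight) > 0.3)
print(conv.weight.shape)  # torch.Size([256, 64, 3, 3]) -- same shape as before

# Structural pruning: physically remove ~30% of the output filters, producing
# a genuinely smaller layer that runs faster on dense hardware.
keep = torch.arange(180)  # hypothetical indices of the filters kept (180 of 256)
pruned = nn.Conv2d(64, len(keep), kernel_size=3, padding=1)
with torch.no_grad():
    pruned.weight.copy_(conv.weight[keep])
    pruned.bias.copy_(conv.bias[keep])
print(pruned.weight.shape)  # torch.Size([180, 64, 3, 3]) -- fewer filters
```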
## Performance & Efficiency
| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Small** | **~32%** | **76.70%** | **93.32%** | **17.4** | **1.9** | **~66** |
On the hardware used in our [paper](https://arxiv.org/abs/2511.19566), we observe wall-clock speedups of **1.69x on CPU** and **1.70x on GPU**.
> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
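If you want to verify the parameter and FLOP counts in the table yourself, a minimal sketch follows. It assumes the third-party `fvcore` package for FLOP counting (one of several options) and that the model's forward pass accepts the pixel tensor positionally; counting conventions (FLOPs vs. multiply-adds) can differ slightly between tools:

```python
import torch
from transformers import AutoModelForImageClassification
from fvcore.nn import FlopCountAnalysis  # pip install fvcore

model = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small", trust_remote_code=True
)
model.eval()

# Parameter count in millions.
params = sum(p.numel() for p in model.parameters())
print(f"Params: {params / 1e6:.2f} M")

# FLOPs for a single 224x224 image.
dummy = torch.randn(1, 3, 224, 224)
flops = FlopCountAnalysis(model, dummy).total()
print(f"FLOPs: {flops / 1e9:.2f} G")
```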
## ⚠️ Critical Note on Preprocessing & Accuracy
**Please Read Before Evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default.
Due to subtle differences in interpolation (Bilinear vs. Bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5% - 1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.
To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:
```python
from torchvision import transforms
from transformers import pipeline
import torch

# 1. Define the exact PyTorch transforms used during training
val_transform = transforms.Compose([
    transforms.Resize(256),        # Resize shortest edge to 256
    transforms.CenterCrop(224),    # Center crop to 224x224
    transforms.ToTensor(),         # Convert to tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# 2. Define a wrapper to force the pipeline to use PyTorch preprocessing
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply transforms and stack into a batch
        pixel_values = torch.stack(
            [self.transform(img.convert("RGB")) for img in images]
        )
        return {"pixel_values": pixel_values}

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    image_processor=PyTorchProcessor(val_transform),  # <-- fixes the accuracy gap
    trust_remote_code=True,
    device=0,  # use GPU if available
)
```
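This wrapper-based approach leaves the repository's `preprocessor_config.json` untouched, so the simpler default PIL path below remains available to users who do not need bit-perfect reproduction.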
## Quick Start
If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.
### Install dependencies
```bash
pip install torch transformers
```
### Inference example
```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)

# Load an image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference
results = pipe(image)
print(f"Predicted class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```
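If you prefer to avoid the pipeline abstraction, the same inference can be done with the lower-level `Auto` classes. A minimal sketch, assuming the repository ships a standard `preprocessor_config.json` (which, per the note above, uses PIL-based preprocessing):

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small"
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess, run the forward pass, and map the top logit to a label
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()
print(model.config.id2label[pred])
```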
## Citation
If you use this model in your research, please cite the following paper:
```bibtex
@inproceedings{kashyap2026modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```