---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Small)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture. 
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~32% of the parameters** while achieving *higher accuracy* than the base model.

Unlike unstructured pruning (which merely zeros out weights), **structural pruning** physically removes entire channels and filters. 
The result is a model that is natively **smaller and faster, with fewer FLOPs** on standard hardware, and that needs no specialized sparse inference engine (see the parameter-count check below).

- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50
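
Because the pruning is structural, the reduction shows up directly in the checkpoint's tensor shapes rather than as zeroed-out weights. As a quick sanity check, the sketch below (assuming both checkpoints are reachable on the Hub; they are downloaded on first use) compares raw parameter counts:

```python
from transformers import AutoModelForImageClassification

# Load the unpruned baseline and the pruned checkpoint
base = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
pruned = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)

def n_params(model):
    # Total number of elements across all parameter tensors
    return sum(p.numel() for p in model.parameters())

print(f"ResNet-50:     {n_params(base) / 1e6:.2f}M parameters")
print(f"ModHiFi-Small: {n_params(pruned) / 1e6:.2f}M parameters")
```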

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Small** | **~32%** | **76.70%** | **93.32%** | **17.4** | **1.9** | **~66** |

On the hardware we tested (detailed in our [paper](https://arxiv.org/abs/2511.19566)), we observe speedups of **1.69x on CPU** and **1.70x on GPU**.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
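
If you want to check the FLOPs figure yourself, one option is `fvcore`'s `FlopCountAnalysis` (a third-party tool, not a dependency of this model; note that it counts a fused multiply-add as a single operation, so its convention may differ from the table's):

```python
import torch
from fvcore.nn import FlopCountAnalysis  # third-party: pip install fvcore
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)
model.eval()

# One 224x224 RGB image: the resolution used for the table above
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    print(f"~{FlopCountAnalysis(model, dummy).total() / 1e9:.2f} GFLOPs per image")
```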

## ⚠️ Critical Note on Preprocessing & Accuracy

**Please Read Before Evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default. 

Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ resize kernels and PIL's, **you may observe a ~0.5-1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.

To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
from torchvision import transforms
from transformers import pipeline
import torch

# 1. Define the Exact PyTorch Transform
val_transform = transforms.Compose([
    transforms.Resize(256),       # Resize shortest edge to 256
    transforms.CenterCrop(224),   # Center crop 224x224
    transforms.ToTensor(),        # Convert to Tensor (0-1)
    transforms.Normalize(         # ImageNet Normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# 2. Define a Wrapper to force Pipeline to use PyTorch
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        # Accept a single image or a list of images
        if not isinstance(images, list):
            images = [images]
        # Apply the torchvision transforms and stack into a batch tensor
        pixel_values = torch.stack([self.transform(img.convert("RGB")) for img in images])
        return {"pixel_values": pixel_values}

# 3. Initialize Pipeline with Custom Processor
pipe = pipeline(
    "image-classification", 
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small", 
    image_processor=PyTorchProcessor(val_transform), # <--- Fixes the accuracy gap
    trust_remote_code=True,
    device=0  # first GPU; use device=-1 (or omit) for CPU
)
```
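
The wrapped pipeline is then called like any other image-classification pipeline; for example (the file path below is illustrative):

```python
from PIL import Image

image = Image.open("example.jpg")  # any RGB image on disk
print(pipe(image)[0])              # top prediction, e.g. {'label': ..., 'score': ...}
```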

## Quick Start

If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch torchvision transformers pillow
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load model (ensure trust_remote_code=True for custom architecture)
pipe = pipeline(
    "image-classification", 
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small", 
    trust_remote_code=True
)

# Load an image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run Inference
results = pipe(image)
print(f"Predicted Class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```
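
If you need raw logits rather than the pipeline's post-processed labels, the checkpoint can also be loaded directly (a sketch assuming the repository ships a standard image-processor config):

```python
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small"
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# `image` is the PIL image loaded in the snippet above
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```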

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2025modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```