---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---
# ModHiFi Pruned ResNet-50 (Tiny)
## Model Description
This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture.
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~67% of the parameters** while maintaining competitive accuracy.
Unlike unstructured pruning (which zeros out weights), **structural pruning** physically removes entire channels and filters.
This yields a model that is natively **smaller and faster, with fewer FLOPs** on standard hardware, and no specialized sparse inference engine is needed (the sketch after the summary below illustrates the difference).
- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50
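As a quick way to see that the pruning is structural rather than zero-masked, here is a minimal sketch (assuming both checkpoints load through the standard `transformers` auto classes) that compares parameter counts and checks for zeroed weights:

```python
import torch
from transformers import AutoModelForImageClassification

base = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
pruned = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny", trust_remote_code=True
)

def n_params(model):
    return sum(p.numel() for p in model.parameters())

print(f"Base:   {n_params(base) / 1e6:.2f}M parameters")    # ~25.56M
print(f"Pruned: {n_params(pruned) / 1e6:.2f}M parameters")  # ~8.38M

# Structural pruning shrinks the tensor shapes themselves; an unstructured
# mask would instead leave shapes unchanged and fill ~67% of the weights
# with exact zeros.
zero_frac = sum((p == 0).sum().item() for p in pruned.parameters()) / n_params(pruned)
print(f"Fraction of exactly-zero weights: {zero_frac:.4f}")
```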
## Performance & Efficiency
| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Tiny** | **~67%** | **73.85%** | **91.83%** | **8.38** | **1.13** | **~33** |
On the hardware described in our [paper](https://arxiv.org/abs/2511.19566), we observe wall-clock speedups of **2.42x on CPU** and **2.38x on GPU**.
> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
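To sanity-check latency on your own hardware, a minimal timing sketch (assuming a CUDA GPU and the standard `transformers` auto classes; numbers will vary by machine):

```python
import time
import torch
from transformers import AutoModelForImageClassification

# Rough benchmark: mean GPU latency for a single 224x224 input.
model = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny", trust_remote_code=True
).eval().cuda()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):            # warmup iterations
        model(pixel_values=x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        model(pixel_values=x)
    torch.cuda.synchronize()       # wait for all kernels to finish
    print(f"Mean latency: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms")
```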
## ⚠️ Critical Note on Preprocessing & Accuracy
**Please Read Before Evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default.
Due to subtle differences in interpolation (Bilinear vs. Bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5% - 1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.
To reproduce the exact numbers in the table above, we recommend overriding the pipeline's image processor with the exact PyTorch transforms used during training:
```python
from torchvision import transforms
from transformers import pipeline
import torch

# 1. Define the exact PyTorch validation transform
val_transform = transforms.Compose([
    transforms.Resize(256),          # Resize shortest edge to 256
    transforms.CenterCrop(224),      # Center crop to 224x224
    transforms.ToTensor(),           # Convert to tensor in [0, 1]
    transforms.Normalize(            # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# 2. Define a wrapper that forces the pipeline to use the PyTorch transforms
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply the transforms to each image and stack into a batch
        pixel_values = torch.stack([self.transform(img.convert("RGB")) for img in images])
        return {"pixel_values": pixel_values}

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny",
    image_processor=PyTorchProcessor(val_transform),  # <--- fixes the accuracy gap
    trust_remote_code=True,
    device=0  # first GPU; use device=-1 (or omit) for CPU
)
```
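The wrapper only has to mimic the interface the pipeline expects of an image processor: a callable that accepts a single image or a list of images and returns a dict containing a `pixel_values` tensor. The model itself is untouched; only the resize/crop/normalize path changes.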
## Quick Start
If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.
### Install dependencies
```bash
pip install torch torchvision transformers pillow requests
```
### Inference example
```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny",
    trust_remote_code=True
)

# Load an image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference
results = pipe(image)
print(f"Predicted Class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```
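If you prefer raw logits over the pipeline abstraction, the same inference can be done with the auto classes directly. A minimal sketch (this uses the repo's default `preprocessor_config.json`, so the ~0.5-1.0% preprocessing caveat above still applies):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny"
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageClassification.from_pretrained(model_id, trust_remote_code=True).eval()

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print(f"Predicted Class: {model.config.id2label[pred]}")
```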
## Citation
If you use this model in your research, please cite the following paper:
```bibtex
@inproceedings{kashyap2026modhifi,
title = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
author = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
eprint = {2511.19566},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2511.19566},
}
```