---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Tiny)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture. Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~67% of the parameters** while maintaining competitive accuracy.

Unlike unstructured pruning (which zeros out individual weights), **structural pruning** physically removes entire channels and filters. The result is a model that is natively **smaller and faster, with fewer FLOPs**, on standard hardware, with no need for specialized sparse inference engines.
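
Because the pruning is structural, the compression is visible directly in the checkpoint's tensor shapes. Below is a minimal sketch (assuming both checkpoints load through `AutoModelForImageClassification`) that compares raw parameter counts:

```python
import torch
from transformers import AutoModelForImageClassification

# Load the dense baseline and the pruned variant.
base = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
pruned = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny", trust_remote_code=True
)

def count_params(model: torch.nn.Module) -> float:
    """Total parameter count in millions."""
    return sum(p.numel() for p in model.parameters()) / 1e6

# Unlike masked (unstructured) sparsity, these counts actually differ,
# because the pruned tensors are physically smaller.
print(f"Base ResNet-50: {count_params(base):.2f}M params")
print(f"ModHiFi-Tiny:   {count_params(pruned):.2f}M params")
```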

- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Tiny** | **~67%** | **73.85%** | **91.83%** | **8.38** | **1.13** | **~33** |

On the hardware we tested (detailed in our [paper](https://arxiv.org/abs/2511.19566)), we observe speedups of **2.42x on CPU** and **2.38x on GPU**.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
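
Actual wall-clock gains depend on your hardware, batch size, and thread settings, so treat the speedups above as indicative. A rough way to check latency on your own machine (a simple sketch, not the paper's benchmarking protocol):

```python
import time
import torch
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny", trust_remote_code=True
).eval()

x = torch.randn(8, 3, 224, 224)  # dummy batch of 8 images

with torch.no_grad():
    for _ in range(5):            # warm-up passes
        model(pixel_values=x)
    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        model(pixel_values=x)
    elapsed = time.perf_counter() - start

print(f"Mean latency per batch: {elapsed / runs * 1e3:.1f} ms")
```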

## ⚠️ Critical Note on Preprocessing & Accuracy

**Please read before evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default.

Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5%-1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.
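
To see that the two resize paths really do produce different pixels, you can compare them directly (a small illustration; exact deltas vary by image and library version):

```python
import numpy as np
import torch
from PIL import Image
from torchvision.transforms import functional as F

# A random test image stands in for a real photo here.
img = Image.fromarray(np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8))

# Path A: PIL-based resize (what the default HF image processor uses).
pil_resized = F.to_tensor(img.resize((256, 256), Image.BILINEAR))

# Path B: torchvision's tensor-based resize.
tensor_resized = F.resize(F.to_tensor(img), [256, 256], antialias=True)

# The outputs are close but not identical, which shifts logits slightly.
print(f"Max abs pixel difference: {(pil_resized - tensor_resized).abs().max().item():.4f}")
```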

To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
import torch
from torchvision import transforms
from transformers import BatchFeature, pipeline

# 1. Define the exact PyTorch validation transform
val_transform = transforms.Compose([
    transforms.Resize(256),        # Resize shortest edge to 256
    transforms.CenterCrop(224),    # Center crop to 224x224
    transforms.ToTensor(),         # Convert to tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# 2. Define a wrapper that forces the pipeline to use the PyTorch transforms
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply the transforms and stack into a batch.
        pixel_values = torch.stack([self.transform(img.convert("RGB")) for img in images])
        # Return a BatchFeature (not a plain dict) so the pipeline can
        # move the tensors to the right device/dtype.
        return BatchFeature({"pixel_values": pixel_values})

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny",
    image_processor=PyTorchProcessor(val_transform),  # <--- Fixes the accuracy gap
    trust_remote_code=True,
    device=0 if torch.cuda.is_available() else -1,    # GPU if available, else CPU
)
```
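
One detail worth noting: the wrapper returns a `BatchFeature` rather than a plain `dict`, since recent `transformers` releases may call `.to(...)` on the preprocessed batch to cast it to the model's dtype, which a plain dictionary does not support.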

## Quick Start

If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch torchvision transformers pillow
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny",
    trust_remote_code=True
)

# Load an image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference
results = pipe(image)
print(f"Predicted Class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```
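
The image-classification pipeline also accepts a `top_k` argument if you want more than the single best prediction:

```python
# Show the five highest-scoring ImageNet classes
for pred in pipe(image, top_k=5):
    print(f"{pred['label']}: {pred['score']:.4f}")
```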

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2026modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```