---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Small)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture. 
Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~32% of the parameters** while achieving *higher accuracy* than the base model.

Unlike unstructured pruning (which merely zeros out weights), **structural pruning** physically removes entire channels and filters. 
The result is a model that is natively **smaller and faster, with fewer FLOPs** on standard hardware, and that needs no specialized sparse inference engine (see the parameter-count check below).

- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50
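
Because the pruning is structural, the reduction shows up directly in the checkpoint's tensor shapes rather than as zeroed-out weights. As a quick sanity check, the sketch below (assuming both checkpoints are reachable on the Hub; they are downloaded on first use) compares raw parameter counts:

```python
from transformers import AutoModelForImageClassification

# Load the unpruned baseline and the pruned checkpoint
base = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
pruned = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)

def n_params(model):
    # Total number of elements across all parameter tensors
    return sum(p.numel() for p in model.parameters())

print(f"ResNet-50:     {n_params(base) / 1e6:.2f}M parameters")
print(f"ModHiFi-Small: {n_params(pruned) / 1e6:.2f}M parameters")
```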

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Small** | **~32%** | **76.70%** | **93.32%** | **17.4** | **1.9** | **~66** |

On the hardware we tested (detailed in our [paper](https://arxiv.org/abs/2511.19566)), we observe speedups of **1.69x on CPU** and **1.70x on GPU**.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.
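
If you want to check the FLOPs figure yourself, one option is `fvcore`'s `FlopCountAnalysis` (a third-party tool, not a dependency of this model; note that it counts a fused multiply-add as a single operation, so its convention may differ from the table's):

```python
import torch
from fvcore.nn import FlopCountAnalysis  # third-party: pip install fvcore
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small",
    trust_remote_code=True,
)
model.eval()

# One 224x224 RGB image: the resolution used for the table above
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    print(f"~{FlopCountAnalysis(model, dummy).total() / 1e9:.2f} GFLOPs per image")
```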

## ⚠️ Critical Note on Preprocessing & Accuracy

**Please Read Before Evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default. 

Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ resize kernels and PIL's, **you may observe a ~0.5-1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.

To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
from torchvision import transforms
from transformers import pipeline
import torch

# 1. Define the Exact PyTorch Transform
val_transform = transforms.Compose([
    transforms.Resize(256),       # Resize shortest edge to 256
    transforms.CenterCrop(224),   # Center crop 224x224
    transforms.ToTensor(),        # Convert to Tensor (0-1)
    transforms.Normalize(         # ImageNet Normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

# 2. Define a Wrapper to force Pipeline to use PyTorch
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        # Accept a single image or a list of images
        if not isinstance(images, list):
            images = [images]
        # Apply the torchvision transforms and stack into a batch tensor
        pixel_values = torch.stack([self.transform(img.convert("RGB")) for img in images])
        return {"pixel_values": pixel_values}

# 3. Initialize Pipeline with Custom Processor
pipe = pipeline(
    "image-classification", 
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small", 
    image_processor=PyTorchProcessor(val_transform), # <--- Fixes the accuracy gap
    trust_remote_code=True,
    device=0  # first GPU; use device=-1 (or omit) for CPU
)
```
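
The wrapped pipeline is then called like any other image-classification pipeline; for example (the file path below is illustrative):

```python
from PIL import Image

image = Image.open("example.jpg")  # any RGB image on disk
print(pipe(image)[0])              # top prediction, e.g. {'label': ..., 'score': ...}
```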

## Quick Start

If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch torchvision transformers pillow
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load model (ensure trust_remote_code=True for custom architecture)
pipe = pipeline(
    "image-classification", 
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Small", 
    trust_remote_code=True
)

# Load an image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run Inference
results = pipe(image)
print(f"Predicted Class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```
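
If you need raw logits rather than the pipeline's post-processed labels, the checkpoint can also be loaded directly (a sketch assuming the repository ships a standard image-processor config):

```python
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "MLLabIISc/ModHiFi-ResNet50-ImageNet-Small"
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageClassification.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# `image` is the PIL image loaded in the snippet above
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```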

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2025modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```