---
language: en
license: gpl-3.0
library_name: transformers
tags:
- vision
- image-classification
- resnet
- pruning
- sparse
base_model: microsoft/resnet-50
pipeline_tag: image-classification
datasets:
- ILSVRC/imagenet-1k
metrics:
- accuracy
---

# ModHiFi Pruned ResNet-50 (Tiny)

## Model Description

This model is a **structurally pruned** version of the standard [ResNet-50](https://huggingface.co/microsoft/resnet-50) architecture. Developed by the **Machine Learning Lab at the Indian Institute of Science**, it has been compressed to remove **~67% of the parameters** while maintaining competitive accuracy.

Unlike unstructured pruning (which merely zeros out individual weights), **structural pruning** physically removes entire channels and filters. The result is a model that is natively **smaller and faster, with fewer FLOPs** on standard hardware, and that needs no specialized sparse inference engine.

- **Developed by:** Machine Learning Lab, Indian Institute of Science
- **Model type:** Convolutional Neural Network (Pruned ResNet)
- **License:** GNU General Public License v3.0
- **Base Model:** Microsoft ResNet-50

## Performance & Efficiency

| Model Variant | Sparsity | Top-1 Acc | Top-5 Acc | Params (M) | FLOPs (G) | Size (MB) |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Original ResNet-50** | 0% | 76.13% | 92.86% | 25.56 | 4.12 | ~98 |
| **ModHiFi-Tiny** | **~67%** | **73.85%** | **91.83%** | **8.38** | **1.13** | **~33** |

On the hardware we tested (detailed in our [paper](https://arxiv.org/abs/2511.19566)), we observe speedups of **2.42x on CPU** and **2.38x on GPU**. A minimal script for checking the parameter count and latency yourself appears at the end of this card.

> **Note:** "FLOPs" measures the number of floating-point operations required for a single inference pass. Lower is better for latency and battery life.

## ⚠️ Critical Note on Preprocessing & Accuracy

**Please read before evaluating:** This model was trained and evaluated using standard PyTorch `torchvision.transforms`. The Hugging Face `pipeline` uses `PIL` (Pillow) for image resizing by default. Due to subtle differences in interpolation (bilinear vs. bicubic) and anti-aliasing between PyTorch's C++ kernels and PIL, **you may observe a ~0.5%–1.0% drop in Top-1 accuracy** if you use the default `preprocessor_config.json`.

To reproduce the exact numbers listed in the table above, we recommend wrapping the `pipeline` with the exact PyTorch transforms used during training:

```python
import torch
from torchvision import transforms
from transformers import pipeline

# 1. Define the exact PyTorch transform
val_transform = transforms.Compose([
    transforms.Resize(256),        # Resize shortest edge to 256
    transforms.CenterCrop(224),    # Center-crop to 224x224
    transforms.ToTensor(),         # Convert to tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# 2. Define a wrapper that forces the pipeline to use PyTorch preprocessing
class PyTorchProcessor:
    def __init__(self, transform):
        self.transform = transform
        self.image_processor_type = "custom"

    def __call__(self, images, **kwargs):
        if not isinstance(images, list):
            images = [images]
        # Apply the transforms and stack into a batch
        pixel_values = torch.stack(
            [self.transform(img.convert("RGB")) for img in images]
        )
        return {"pixel_values": pixel_values}

# 3. Initialize the pipeline with the custom processor
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny",
    image_processor=PyTorchProcessor(val_transform),  # <-- fixes the accuracy gap
    trust_remote_code=True,
    device=0,  # GPU; use device=-1 for CPU
)
```
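
As a quick sanity check that the wrapper behaves as expected, the minimal sketch below reuses `val_transform`, `PyTorchProcessor`, and `pipe` from the block above (the COCO image URL is just an arbitrary test image, not part of the evaluation protocol):

```python
import requests
from PIL import Image

# Any RGB test image works; this COCO validation image is just an example
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The wrapper should emit a single batched 3x224x224 tensor
batch = PyTorchProcessor(val_transform)(image)
assert batch["pixel_values"].shape == (1, 3, 224, 224)

# One prediction through the PyTorch-preprocessed pipeline
print(pipe(image)[0])
```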
## Quick Start

If you do not require bit-perfect reproduction of the original accuracy and prefer simplicity, you can use the model directly with the standard Hugging Face pipeline.

### Install dependencies

```bash
pip install torch transformers
```

### Inference example

```python
import requests
from PIL import Image
from transformers import pipeline

# Load the model (trust_remote_code=True is required for the custom architecture)
pipe = pipeline(
    "image-classification",
    model="MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny",
    trust_remote_code=True,
)

# Load an image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Run inference
results = pipe(image)
print(f"Predicted Class: {results[0]['label']}")
print(f"Confidence: {results[0]['score']:.4f}")
```

## Citation

If you use this model in your research, please cite the following paper:

```bibtex
@inproceedings{kashyap2026modhifi,
  title         = {ModHiFi: Identifying High Fidelity predictive components for Model Modification},
  author        = {Kashyap, Dhruva and Murti, Chaitanya and Nayak, Pranav and Narshana, Tanay and Bhattacharyya, Chiranjib},
  booktitle     = {Advances in Neural Information Processing Systems},
  year          = {2025},
  eprint        = {2511.19566},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url           = {https://arxiv.org/abs/2511.19566},
}
```
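
## Verifying the Efficiency Numbers

The parameter counts in the table above are easy to check locally, and a rough CPU latency comparison against the dense baseline gives a feel for the reported speedup. The sketch below is a minimal, unoptimized example, not our benchmarking harness: exact timings depend on hardware, thread count, and batch size, and it assumes both checkpoints load via `AutoModelForImageClassification`.

```python
import time
import torch
from transformers import AutoModelForImageClassification

def profile(model_id):
    model = AutoModelForImageClassification.from_pretrained(
        model_id, trust_remote_code=True
    )
    model.eval()
    params = sum(p.numel() for p in model.parameters()) / 1e6  # in millions

    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        for _ in range(3):  # warm-up runs
            model(pixel_values=x)
        start = time.perf_counter()
        for _ in range(20):
            model(pixel_values=x)
        latency = (time.perf_counter() - start) / 20

    print(f"{model_id}: {params:.2f}M params, {latency * 1000:.1f} ms/image (CPU)")
    return latency

base = profile("microsoft/resnet-50")
tiny = profile("MLLabIISc/ModHiFi-ResNet50-ImageNet-Tiny")
print(f"CPU speedup: {base / tiny:.2f}x")
```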