---
license: apache-2.0
tags:
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>CurMIM: Curriculum Masked Image Modeling</h1>

<p>
<b>Hao Liu</b><sup>1</sup>
<b>Kun Wang</b><sup>1</sup>
<b>Yudong Han</b><sup>1</sup>
<b>Haocong Wang</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1</sup>
<b>Chunxiao Wang</b><sup>2</sup>
<b>Liqiang Nie</b><sup>3</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University, Jinan, China<br>
<sup>2</sup>Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China<br>
<sup>3</sup>School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
</p>
</div>

This is the official PyTorch implementation of **CurMIM**, a curriculum-based masked image modeling framework for self-supervised visual representation learning.

**Paper:** [CurMIM: Curriculum Masked Image Modeling](https://ieeexplore.ieee.org/document/10890877)<br>
**GitHub Repository:** [iLearn-Lab/ICASSP25-CurMIM](https://github.com/iLearn-Lab/ICASSP25-CurMIM)

---

## Model Information

### 1. Model Name
**CurMIM** (**Cur**riculum **M**asked **I**mage **M**odeling).

### 2. Task Type & Applicable Tasks
- **Task Type:** Masked Image Modeling (MIM) / self-supervised visual representation learning / Vision Transformer pretraining
- **Applicable Tasks:** curriculum-based masked image pretraining, visual representation learning, finetuning, and linear probing for image classification
|
|
### 3. Project Introduction
Masked Image Modeling (MIM) typically keeps a fixed masking strategy throughout pretraining. **CurMIM** instead introduces a curriculum-style masking strategy that progressively adjusts masking behavior, so the model learns from easier to harder reconstruction targets, which improves representation quality.
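The easy-to-hard idea can be illustrated with a schedule that ramps the mask ratio over pretraining. This is only a minimal sketch of the curriculum concept, not the schedule from the paper: the function name `curriculum_mask_ratio`, the linear ramp, and the endpoint values are all assumptions for illustration.

```python
def curriculum_mask_ratio(epoch, total_epochs, start=0.5, end=0.75):
    """Linearly ramp the mask ratio from `start` (easier: fewer patches
    masked) to `end` (harder: more patches masked) across pretraining.

    Illustrative only -- the actual CurMIM schedule is defined in the
    paper and repository.
    """
    # Progress in [0, 1]; guard against total_epochs == 1.
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + t * (end - start)
```

A schedule like this would be queried once per epoch and handed to the masking module before sampling which patches to hide.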
|
|
The repository provides a complete workflow for **pretraining**, **finetuning**, and **linear probing**, together with utilities for distributed training and experiment management.
|
|
### 4. Training Data Source
The model follows the dataset preparation protocol of [MAE](https://github.com/facebookresearch/mae) and is mainly designed for:
- **ImageNet**
- **miniImageNet**

---
|
|
## Usage & Basic Inference

This codebase provides scripts for curriculum-based MIM pretraining, finetuning, and linear probing.
|
|
### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
```bash
git clone https://github.com/iLearn-Lab/ICASSP25-CurMIM.git
cd ICASSP25-CurMIM
python -m venv .venv
source .venv/bin/activate  # Linux / macOS
# .venv\Scripts\activate   # Windows
pip install torch torchvision timm==0.3.2 tensorboard
```
Note: as in MAE, `timm==0.3.2` requires a small fix to work with PyTorch 1.8.1+.
|
|
### Step 2: Download Model Weights & Data
Follow [MAE](https://github.com/facebookresearch/mae)'s dataset preparation instructions for [ImageNet](https://www.image-net.org/).
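Assuming the data loading follows MAE (which uses torchvision's `ImageFolder`), `--data_path` should point at a directory with one subfolder per class under `train/` and `val/`:

```text
../path/
├── train/
│   ├── n01440764/
│   │   ├── img_0001.JPEG
│   │   └── ...
│   └── ...
└── val/
    ├── n01440764/
    └── ...
```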
|
|
|
|
### Step 3: Run Pretraining / Finetuning
To pretrain the model, run:
```bash
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_pretrain.py \
    --batch_size 128 \
    --accum_iter 2 \
    --model {model_type} \
    --mask_ratio 0.75 --epochs 300 --warmup_epochs 40 \
    --blr 4e-4 --weight_decay 0.05 \
    --data_path ../path --output_dir ./output_dir/
```
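Pretraining writes periodic checkpoints to `--output_dir`. Assuming the MAE-style saving convention (a plain dict with the encoder weights under a `"model"` key — an assumption to verify against the repo), a checkpoint can be inspected like this; the snippet builds a tiny stand-in file so it is self-contained:

```python
import torch

# Build a tiny stand-in checkpoint to show the shape of what
# torch.load returns (the "model" key is MAE's convention).
dummy = {"model": {"patch_embed.proj.weight": torch.zeros(8, 3, 16, 16)},
         "epoch": 299}
torch.save(dummy, "checkpoint-demo.pth")

ckpt = torch.load("checkpoint-demo.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)   # fall back if weights are stored flat
print(sorted(ckpt.keys()))             # ['epoch', 'model']
print(len(state_dict), "tensors")      # 1 tensors
```

A real checkpoint produced by `main_pretrain.py` is what `--finetune ./checkpoint.pth` below expects.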
|
|
To finetune the model, run:
```bash
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_finetune.py \
    --batch_size 128 \
    --nb_classes {nb_classes} \
    --model {model_type} \
    --finetune ./checkpoint.pth \
    --epochs 100 \
    --blr 1e-3 --layer_decay 0.65 --output_dir ./finetune \
    --weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
    --dist_eval --data_path ../data/
```
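Linear probing is listed among the supported tasks. A command in the same style would look like the sketch below, assuming the repository follows MAE's script naming (`main_linprobe.py`) and MAE's usual linear-probe hyperparameters; check the repository for the exact entry point and flags before running:

```shell
# Hypothetical linear-probing invocation (MAE-style); verify script
# name and defaults against the CurMIM repository.
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_linprobe.py \
    --batch_size 512 \
    --model {model_type} \
    --finetune ./checkpoint.pth \
    --epochs 90 \
    --blr 0.1 --weight_decay 0.0 \
    --dist_eval --data_path ../data/
```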
|
|
---

## Limitations & Notes

**Disclaimer:** This repository is intended for **academic research purposes only**.
- The model requires access to the original datasets for pretraining and downstream evaluation.
- Training performance may vary depending on model size, masking ratio, and distributed training configuration.
- Users should prepare the dataset following the MAE protocol before reproduction.
|
|
---

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{liu2025curmim,
  title={CurMIM: Curriculum Masked Image Modeling},
  author={Liu, Hao and Wang, Kun and Han, Yudong and Wang, Haocong and Hu, Yupeng and Wang, Chunxiao and Nie, Liqiang},
  booktitle={2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  doi={10.1109/ICASSP49660.2025.10890877}
}
```
|
|
---

## Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.