CurMIM: Curriculum Masked Image Modeling
Hao Liu¹, Kun Wang¹, Yudong Han¹, Haocong Wang¹, Yupeng Hu¹, Chunxiao Wang², Liqiang Nie³

¹School of Software, Shandong University, Jinan, China
²Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
³School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
This is the official PyTorch implementation of **CurMIM**, a curriculum-based masked image modeling framework for self-supervised visual representation learning.
🔗 **Paper:** [CurMIM: Curriculum Masked Image Modeling](https://ieeexplore.ieee.org/document/10890877)
🔗 **GitHub Repository:** [iLearn-Lab/ICASSP25-CurMIM](https://github.com/iLearn-Lab/ICASSP25-CurMIM)
---
## Model Information
### 1. Model Name
**CurMIM** (**Cur**riculum **M**asked **I**mage **M**odeling).
### 2. Task Type & Applicable Tasks
- **Task Type:** Masked Image Modeling (MIM) / Self-Supervised Visual Representation Learning / Vision Transformer Pretraining
- **Applicable Tasks:** Curriculum-based masked image pretraining, visual representation learning, finetuning, and linear probing for image classification.
### 3. Project Introduction
Masked Image Modeling (MIM) usually adopts a fixed masking strategy during pretraining. **CurMIM** introduces a curriculum-style masking strategy that progressively adjusts masking behavior, enabling the model to learn from easier to harder reconstruction targets and thereby improving representation quality.
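To make the easy-to-hard idea concrete, one simple curriculum is to anneal the mask ratio linearly from a lower starting value to the final target over pretraining. The sketch below is an illustration under our own assumptions (the function name, `start`, and `end` values are hypothetical), not the schedule used in the repository:

```python
def curriculum_mask_ratio(epoch: int, total_epochs: int,
                          start: float = 0.5, end: float = 0.75) -> float:
    """Linearly anneal the mask ratio from `start` (easier) to `end` (harder).

    Hypothetical helper for illustration; see the paper for the actual schedule.
    """
    # Training progress in [0, 1] across pretraining epochs.
    t = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * t
```

Under this schedule, early epochs reconstruct images with fewer patches hidden (an easier target), and the difficulty rises smoothly toward the final masking ratio.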
The repository provides a complete workflow for **pretraining**, **finetuning**, and **linear probing**, together with utilities for distributed training and experiment management.
### 4. Training Data Source
The model follows the dataset preparation protocol of [MAE](https://github.com/facebookresearch/mae) and is mainly designed for:
- **ImageNet**
- **miniImageNet**
---
## Usage & Basic Inference
This codebase provides scripts for curriculum-based MIM pretraining, finetuning, and linear probing.
### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
```bash
git clone https://github.com/iLearn-Lab/ICASSP25-CurMIM.git
cd ICASSP25-CurMIM
python -m venv .venv
source .venv/bin/activate # Linux / Mac
# .venv\Scripts\activate # Windows
pip install torch torchvision timm==0.3.2 tensorboard
```
### Step 2: Download Model Weights & Data
Follow [MAE](https://github.com/facebookresearch/mae)'s dataset preparation for [ImageNet](https://www.image-net.org/).
### Step 3: Run Pretraining / Finetuning
To pretrain the model, run:
```bash
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_pretrain.py --batch_size 128 \
--accum_iter 2 \
--model {model_type} \
--mask_ratio 0.75 --epochs 300 --warmup_epochs 40 \
--blr 4e-4 --weight_decay 0.05 \
--data_path ../path --output_dir ./output_dir/
```
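For context, `--mask_ratio 0.75` hides three quarters of the image patches from the encoder at each step. A minimal NumPy sketch of MAE-style random patch masking (a hypothetical helper for illustration, not this repository's implementation):

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Return a boolean mask with `mask_ratio` of patches hidden (True = masked)."""
    num_masked = int(num_patches * mask_ratio)
    order = rng.permutation(num_patches)   # random ordering of patch indices
    mask = np.zeros(num_patches, dtype=bool)
    mask[order[:num_masked]] = True        # mask the first `num_masked` patches
    return mask
```

For a 224×224 image split into 16×16 patches (196 patches), a 0.75 ratio masks 147 patches and leaves 49 visible to the encoder.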
To finetune the model, run:
```bash
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_finetune.py \
--batch_size 128 \
--nb_classes {nb_classes} \
--model {model_type} \
--finetune ./checkpoint.pth \
--epochs 100 \
--blr 1e-3 --layer_decay 0.65 --output_dir ./finetune \
--weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
--dist_eval --data_path ../data/
```
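The `--layer_decay 0.65` flag applies layer-wise learning-rate decay during finetuning: blocks closer to the input get geometrically smaller learning rates, so pretrained low-level features change less than the task head. A sketch of how the per-layer scale might be computed (hypothetical, following the common MAE/BEiT convention rather than this repository's exact code):

```python
def layer_lr_scales(num_layers: int, decay: float = 0.65) -> list[float]:
    """Per-layer lr multipliers: 1.0 at the head, decayed toward the input.

    Index 0 covers the patch embedding; index `num_layers` covers the head.
    """
    return [decay ** (num_layers - i) for i in range(num_layers + 1)]
```

With `--blr 1e-3` and a 12-block ViT, the head trains at the full rate while the earliest layers train at roughly `0.65**12` of it.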
---
## Limitations & Notes
**Disclaimer:** This repository is intended for **academic research purposes only**.
- The model requires access to the original datasets for pretraining and downstream evaluation.
- Training performance may vary depending on model size, masking ratio, and distributed training configuration.
- Users should prepare the dataset following the MAE protocol before reproduction.
---
## Citation
If you find our work useful in your research, please consider citing our paper:
```bibtex
@inproceedings{liu2025curmim,
title={CurMIM: Curriculum Masked Image Modeling},
author={Liu, Hao and Wang, Kun and Han, Yudong and Wang, Haocong and Hu, Yupeng and Wang, Chunxiao and Nie, Liqiang},
booktitle={2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
doi={10.1109/ICASSP49660.2025.10890877}
}
```
---
## Contact
**If you have any questions, feel free to contact us at liuh90210@gmail.com**.