---
license: apache-2.0
tags:
- pytorch
---

<a id="top"></a>
<div align="center">
<h1>CurMIM: Curriculum Masked Image Modeling</h1>

<p>
<b>Hao Liu</b><sup>1</sup>
<b>Kun Wang</b><sup>1</sup>
<b>Yudong Han</b><sup>1</sup>
<b>Haocong Wang</b><sup>1</sup>
<b>Yupeng Hu</b><sup>1</sup>
<b>Chunxiao Wang</b><sup>2</sup>
<b>Liqiang Nie</b><sup>3</sup>
</p>

<p>
<sup>1</sup>School of Software, Shandong University, Jinan, China<br>
<sup>2</sup>Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China<br>
<sup>3</sup>School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
</p>
</div>

This is the official PyTorch implementation of **CurMIM**, a curriculum-based masked image modeling framework for self-supervised visual representation learning.

**Paper:** [CurMIM: Curriculum Masked Image Modeling](https://ieeexplore.ieee.org/document/10890877)<br>
**GitHub Repository:** [iLearn-Lab/ICASSP25-CurMIM](https://github.com/iLearn-Lab/ICASSP25-CurMIM)

---

## Model Information

### 1. Model Name
**CurMIM** (**Cur**riculum **M**asked **I**mage **M**odeling).

### 2. Task Type & Applicable Tasks
- **Task Type:** Masked Image Modeling (MIM) / self-supervised visual representation learning / Vision Transformer pretraining
- **Applicable Tasks:** curriculum-based masked image pretraining, visual representation learning, finetuning, and linear probing for image classification
|
|
### 3. Project Introduction
Masked Image Modeling (MIM) typically keeps a fixed masking strategy throughout pretraining. **CurMIM** instead introduces a curriculum-style masking strategy that progressively adjusts masking behavior, so the model learns from easier to harder reconstruction targets, which improves representation quality.
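The easy-to-hard idea can be illustrated with a schedule that ramps the mask ratio over pretraining. This is only a minimal sketch of the curriculum concept, not the schedule from the paper: the function name `curriculum_mask_ratio`, the linear ramp, and the endpoint values are all assumptions for illustration.

```python
def curriculum_mask_ratio(epoch, total_epochs, start=0.5, end=0.75):
    """Linearly ramp the mask ratio from `start` (easier: fewer patches
    masked) to `end` (harder: more patches masked) across pretraining.

    Illustrative only -- the actual CurMIM schedule is defined in the
    paper and repository.
    """
    # Progress in [0, 1]; guard against total_epochs == 1.
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + t * (end - start)
```

A schedule like this would be queried once per epoch and handed to the masking module before sampling which patches to hide.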
|
|
The repository provides a complete workflow for **pretraining**, **finetuning**, and **linear probing**, together with utilities for distributed training and experiment management.
|
|
### 4. Training Data Source
The model follows the dataset preparation protocol of [MAE](https://github.com/facebookresearch/mae) and is mainly designed for:
- **ImageNet**
- **miniImageNet**

---
|
|
## Usage & Basic Inference

This codebase provides scripts for curriculum-based MIM pretraining, finetuning, and linear probing.
|
|
### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
```bash
git clone https://github.com/iLearn-Lab/ICASSP25-CurMIM.git
cd ICASSP25-CurMIM
python -m venv .venv
source .venv/bin/activate  # Linux / macOS
# .venv\Scripts\activate   # Windows
pip install torch torchvision timm==0.3.2 tensorboard
```
Note: as in MAE, `timm==0.3.2` requires a small fix to work with PyTorch 1.8.1+.
|
|
### Step 2: Download Model Weights & Data
Follow [MAE](https://github.com/facebookresearch/mae)'s dataset preparation instructions for [ImageNet](https://www.image-net.org/).
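Assuming the data loading follows MAE (which uses torchvision's `ImageFolder`), `--data_path` should point at a directory with one subfolder per class under `train/` and `val/`:

```text
../path/
├── train/
│   ├── n01440764/
│   │   ├── img_0001.JPEG
│   │   └── ...
│   └── ...
└── val/
    ├── n01440764/
    └── ...
```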
|
|
|
|
### Step 3: Run Pretraining / Finetuning
To pretrain the model, run:
```bash
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_pretrain.py \
    --batch_size 128 \
    --accum_iter 2 \
    --model {model_type} \
    --mask_ratio 0.75 --epochs 300 --warmup_epochs 40 \
    --blr 4e-4 --weight_decay 0.05 \
    --data_path ../path --output_dir ./output_dir/
```
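Pretraining writes periodic checkpoints to `--output_dir`. Assuming the MAE-style saving convention (a plain dict with the encoder weights under a `"model"` key — an assumption to verify against the repo), a checkpoint can be inspected like this; the snippet builds a tiny stand-in file so it is self-contained:

```python
import torch

# Build a tiny stand-in checkpoint to show the shape of what
# torch.load returns (the "model" key is MAE's convention).
dummy = {"model": {"patch_embed.proj.weight": torch.zeros(8, 3, 16, 16)},
         "epoch": 299}
torch.save(dummy, "checkpoint-demo.pth")

ckpt = torch.load("checkpoint-demo.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)   # fall back if weights are stored flat
print(sorted(ckpt.keys()))             # ['epoch', 'model']
print(len(state_dict), "tensors")      # 1 tensors
```

A real checkpoint produced by `main_pretrain.py` is what `--finetune ./checkpoint.pth` below expects.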
|
|
To finetune the model, run:
```bash
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_finetune.py \
    --batch_size 128 \
    --nb_classes {nb_classes} \
    --model {model_type} \
    --finetune ./checkpoint.pth \
    --epochs 100 \
    --blr 1e-3 --layer_decay 0.65 --output_dir ./finetune \
    --weight_decay 0.05 --drop_path 0.1 --mixup 0.8 --cutmix 1.0 --reprob 0.25 \
    --dist_eval --data_path ../data/
```
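Linear probing is listed among the supported tasks. A command in the same style would look like the sketch below, assuming the repository follows MAE's script naming (`main_linprobe.py`) and MAE's usual linear-probe hyperparameters; check the repository for the exact entry point and flags before running:

```shell
# Hypothetical linear-probing invocation (MAE-style); verify script
# name and defaults against the CurMIM repository.
python -m torch.distributed.launch --nproc_per_node={GPU_number} ./main_linprobe.py \
    --batch_size 512 \
    --model {model_type} \
    --finetune ./checkpoint.pth \
    --epochs 90 \
    --blr 0.1 --weight_decay 0.0 \
    --dist_eval --data_path ../data/
```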
|
|
---

## Limitations & Notes

**Disclaimer:** This repository is intended for **academic research purposes only**.
- The model requires access to the original datasets for pretraining and downstream evaluation.
- Training performance may vary depending on model size, masking ratio, and distributed training configuration.
- Users should prepare the dataset following the MAE protocol before reproduction.
|
|
---

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{liu2025curmim,
  title={CurMIM: Curriculum Masked Image Modeling},
  author={Liu, Hao and Wang, Kun and Han, Yudong and Wang, Haocong and Hu, Yupeng and Wang, Chunxiao and Nie, Liqiang},
  booktitle={2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--5},
  year={2025},
  doi={10.1109/ICASSP49660.2025.10890877}
}
```
|
|
---

## Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.