	---
	license: other
	license_name: nsclv1
	license_link: LICENSE
	tags:
	- image-generation
	- class-conditional
	- diffusion
	- pixel-space
	- dit
	- imagenet
	library_name: pytorch
	pipeline_tag: unconditional-image-generation
	---

	<p align="center">
	<img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="60" />
	</p>

	<h2 align="center">PixelDiT: Pixel Diffusion Transformers for Image Generation</h2>

	<p align="center">
	<a href="https://www.yongshengyu.com/">Yongsheng Yu</a><sup>1,2</sup>
	<a href="https://wxiong.me/">Wei Xiong</a><sup>1†</sup>
	<a href="https://weilinie.github.io/">Weili Nie</a><sup>1</sup>
	<a href="https://shengcn.github.io/">Yichen Sheng</a><sup>1</sup>
	<a href="http://behindthepixels.io/">Shiqiu Liu</a><sup>1</sup>
	<a href="https://www.cs.rochester.edu/u/jluo/">Jiebo Luo</a><sup>2</sup>
	</p>
	<p align="center">
	<sup>1</sup>NVIDIA   <sup>2</sup>University of Rochester
	<br>
	<sup>†</sup>Project Lead and Main Advising
	</p>

	<p align="center">
	<a href="https://pixeldit.github.io/"><img src="https://img.shields.io/badge/Website-Project_Page-2ea44f" /></a>

	<a href="https://arxiv.org/abs/2511.20645"><img src="https://img.shields.io/badge/arXiv-2511.20645-b31b1b.svg" /></a>

	<a href="https://github.com/NVlabs/PixelDiT"><img src="https://img.shields.io/badge/GitHub-Code-blue" /></a>
	</p>

	## Pre-trained Checkpoints

	| Checkpoint | Resolution | Epochs | gFID | CFG Scale | Time Shift | CFG Interval |
	|:---|:---:|:---:|:---:|:---:|:---:|:---:|
	| `imagenet256_pixeldit_xl_epoch80.ckpt` | 256x256 | 80 | 2.36 | 3.25 | 1.0 | [0.1, 1.0] |
	| `imagenet256_pixeldit_xl_epoch160.ckpt` | 256x256 | 160 | 1.97 | 3.25 | 1.0 | [0.1, 1.0] |
	| `imagenet256_pixeldit_xl_epoch320.ckpt` | 256x256 | 320 | 1.61 | 2.75 | 1.0 | [0.1, 0.9] |
	| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | 1.81 | 3.5 | 2.0 | [0.1, 1.0] |

	All evaluations use FlowDPMSolver with 100 sampling steps and 50K generated samples. Metrics follow the ADM evaluation protocol.
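
The CFG Scale, Time Shift, and CFG Interval columns are sampler settings. As a rough illustration only, the sketch below shows one common way such settings are interpreted in flow-based samplers: classifier-free guidance is applied only while the timestep lies inside the interval, and the shift warps a uniform timestep grid. The function names, the formulas, and the timestep convention are all assumptions made for illustration and are not taken from the PixelDiT code; `src.diffusion.FlowDPMSolverSampler` in the repository is the reference for the actual behavior.

```python
def shift_time(t: float, shift: float) -> float:
    """Warp a timestep in [0, 1]; shift=1.0 leaves it unchanged (assumed formula)."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def guided_prediction(cond: float, uncond: float, t: float,
                      scale: float, t_min: float, t_max: float) -> float:
    """Apply classifier-free guidance only when t is inside [t_min, t_max]."""
    if t_min <= t <= t_max:
        # Standard CFG combination of conditional and unconditional outputs.
        return uncond + scale * (cond - uncond)
    # Outside the interval: fall back to the plain conditional prediction.
    return cond


# Values from the epoch-320 row: scale 2.75, interval [0.1, 0.9].
# Scalars stand in for model outputs to keep the example dependency-free.
inside = guided_prediction(cond=1.0, uncond=0.0, t=0.5,
                           scale=2.75, t_min=0.1, t_max=0.9)
outside = guided_prediction(cond=1.0, uncond=0.0, t=0.95,
                            scale=2.75, t_min=0.1, t_max=0.9)
print(inside, outside)
```

Under these assumptions, an interval of [0.1, 1.0] means guidance is skipped only for timesteps below 0.1, and a time shift of 1.0 (the 256x256 rows) leaves the timestep grid untouched.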

	## Usage

	### Installation

	```bash
	pip install -r requirements.txt
	```

	### Evaluation (Generate 50K Samples)

	```bash
	cd c2i/

	# ImageNet 256x256 (epoch 320, best FID)
	torchrun --nproc_per_node=8 main.py predict \
	-c configs/pix256_xl.yaml \
	--ckpt_path=imagenet256_pixeldit_xl_epoch320.ckpt \
	--model.diffusion_sampler.class_path=src.diffusion.FlowDPMSolverSampler \
	--model.diffusion_sampler.init_args.num_steps=100 \
	--model.diffusion_sampler.init_args.guidance=2.75 \
	--model.diffusion_sampler.init_args.timeshift=1.0 \
	--model.diffusion_sampler.init_args.guidance_interval_min=0.1 \
	--model.diffusion_sampler.init_args.guidance_interval_max=0.9 \
	--per_run_seed=false --seed_everything=1000

	# ImageNet 512x512
	torchrun --nproc_per_node=8 main.py predict \
	-c configs/pix512_xl.yaml \
	--ckpt_path=imagenet512_pixeldit_xl.ckpt \
	--model.diffusion_sampler.class_path=src.diffusion.FlowDPMSolverSampler \
	--model.diffusion_sampler.init_args.num_steps=100 \
	--model.diffusion_sampler.init_args.guidance=3.5 \
	--model.diffusion_sampler.init_args.timeshift=2.0 \
	--model.diffusion_sampler.init_args.guidance_interval_min=0.1 \
	--model.diffusion_sampler.init_args.guidance_interval_max=1.0 \
	--per_run_seed=false --seed_everything=10000
	```
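
The two commands above differ only in the config file, checkpoint, sampler settings, and seed. As a convenience, the following Python sketch assembles the same `torchrun` invocation from the values in the checkpoint table. `SETTINGS` and `build_command` are hypothetical helpers, not part of the repository. Pairing the epoch-80 and epoch-160 checkpoints with `configs/pix256_xl.yaml` is an assumption, and the seed for those runs is not stated here, so it is left as an argument.

```python
import shlex

# Sampler settings copied from the checkpoint table.
# The config path for the epoch-80/160 checkpoints is an assumption.
SETTINGS = {
    "imagenet256_pixeldit_xl_epoch80.ckpt": {
        "config": "configs/pix256_xl.yaml",
        "guidance": 3.25, "timeshift": 1.0, "interval": (0.1, 1.0),
    },
    "imagenet256_pixeldit_xl_epoch160.ckpt": {
        "config": "configs/pix256_xl.yaml",
        "guidance": 3.25, "timeshift": 1.0, "interval": (0.1, 1.0),
    },
    "imagenet256_pixeldit_xl_epoch320.ckpt": {
        "config": "configs/pix256_xl.yaml",
        "guidance": 2.75, "timeshift": 1.0, "interval": (0.1, 0.9),
    },
    "imagenet512_pixeldit_xl.ckpt": {
        "config": "configs/pix512_xl.yaml",
        "guidance": 3.5, "timeshift": 2.0, "interval": (0.1, 1.0),
    },
}


def build_command(ckpt: str, seed: int, gpus: int = 8) -> list:
    """Return the torchrun argument list for one checkpoint (hypothetical helper)."""
    s = SETTINGS[ckpt]
    lo, hi = s["interval"]
    prefix = "--model.diffusion_sampler"
    return [
        "torchrun", f"--nproc_per_node={gpus}", "main.py", "predict",
        "-c", s["config"],
        f"--ckpt_path={ckpt}",
        f"{prefix}.class_path=src.diffusion.FlowDPMSolverSampler",
        f"{prefix}.init_args.num_steps=100",
        f"{prefix}.init_args.guidance={s['guidance']}",
        f"{prefix}.init_args.timeshift={s['timeshift']}",
        f"{prefix}.init_args.guidance_interval_min={lo}",
        f"{prefix}.init_args.guidance_interval_max={hi}",
        "--per_run_seed=false", f"--seed_everything={seed}",
    ]


# Print a shell-ready command; run it from the c2i/ directory.
cmd = build_command("imagenet256_pixeldit_xl_epoch160.ckpt", seed=1000)
print(shlex.join(cmd))
```

Returning an argument list rather than a single string keeps the result usable both for printing and for `subprocess.run(cmd, check=True)`.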

	After generating samples, compute FID with the [ADM evaluation toolkit](https://github.com/openai/guided-diffusion/tree/main/evaluations).

	## Citation

	```bibtex
	@inproceedings{yu2025pixeldit,
	title={PixelDiT: Pixel Diffusion Transformers for Image Generation},
	author={Yongsheng Yu and Wei Xiong and Weili Nie and Yichen Sheng and Shiqiu Liu and Jiebo Luo},
	booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year={2026},
	}
	```

	## License

	This model is released under the [NSCLv1 License](LICENSE). The work and any derivative works may only be used for non-commercial (research or evaluation) purposes.