---
license: other
license_name: nsclv1
license_link: LICENSE
tags:
- image-generation
- class-conditional
- diffusion
- pixel-space
- dit
- imagenet
library_name: pytorch
pipeline_tag: unconditional-image-generation
---

<p align="center">
<img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="60" />
</p>

<h2 align="center">PixelDiT: Pixel Diffusion Transformers for Image Generation</h2>

<p align="center">
<a href="https://www.yongshengyu.com/">Yongsheng Yu</a><sup>1,2</sup>
<a href="https://wxiong.me/">Wei Xiong</a><sup>1†</sup>
<a href="https://weilinie.github.io/">Weili Nie</a><sup>1</sup>
<a href="https://shengcn.github.io/">Yichen Sheng</a><sup>1</sup>
<a href="http://behindthepixels.io/">Shiqiu Liu</a><sup>1</sup>
<a href="https://www.cs.rochester.edu/u/jluo/">Jiebo Luo</a><sup>2</sup>
</p>
<p align="center">
<sup>1</sup>NVIDIA <sup>2</sup>University of Rochester
<br>
<sup>†</sup>Project Lead and Main Advisor
</p>

<p align="center">
<a href="https://pixeldit.github.io/"><img src="https://img.shields.io/badge/Website-Project_Page-2ea44f" /></a>
<a href="https://arxiv.org/abs/2511.20645"><img src="https://img.shields.io/badge/arXiv-2511.20645-b31b1b.svg" /></a>
<a href="https://github.com/NVlabs/PixelDiT"><img src="https://img.shields.io/badge/GitHub-Code-blue" /></a>
</p>

## Pre-trained Checkpoints

| Checkpoint | Resolution | Epochs | gFID | CFG Scale | Time Shift | CFG Interval |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| `imagenet256_pixeldit_xl_epoch80.ckpt` | 256x256 | 80 | **2.36** | 3.25 | 1.0 | [0.1, 1.0] |
| `imagenet256_pixeldit_xl_epoch160.ckpt` | 256x256 | 160 | **1.97** | 3.25 | 1.0 | [0.1, 1.0] |
| `imagenet256_pixeldit_xl_epoch320.ckpt` | 256x256 | 320 | **1.61** | 2.75 | 1.0 | [0.1, 0.9] |
| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | **1.81** | 3.5 | 2.0 | [0.1, 1.0] |

All reported metrics are computed on 50K samples generated with **FlowDPMSolver** using **100 sampling steps**, following the ADM evaluation protocol.
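
The **Time Shift** and **CFG Interval** columns map to the sampler's `timeshift` and `guidance_interval_min`/`guidance_interval_max` arguments. The following is a minimal sketch of one common interpretation of these two settings, not code from the PixelDiT repository. It assumes an SD3-style timestep shift, `t' = s * t / (1 + (s - 1) * t)`, and classifier-free guidance that is applied only while `t` lies inside the interval, with the plain conditional prediction used elsewhere. The helper names are hypothetical, and the time convention (which end of `[0, 1]` is pure noise) in the actual sampler may differ.

```python
"""Sketch of two sampler knobs from the table (hypothetical helpers, not PixelDiT code).

Assumptions: t is a flow-matching time in [0, 1]; the shift follows the
SD3-style formula; guidance is applied only for t inside [t_min, t_max].
"""
from typing import Sequence


def shift_timestep(t: float, shift: float) -> float:
    """Warp t toward 1 when shift > 1; shift == 1.0 leaves t unchanged."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return shift * t / (1.0 + (shift - 1.0) * t)


def guided_velocity(
    v_cond: Sequence[float],
    v_uncond: Sequence[float],
    scale: float,
    t: float,
    t_min: float,
    t_max: float,
) -> list[float]:
    """Classifier-free guidance restricted to t in [t_min, t_max] (assumed behavior)."""
    if t_min <= t <= t_max:
        # Standard CFG: extrapolate from the unconditional toward the conditional prediction.
        return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]
    # Outside the interval, use the conditional prediction alone (equivalent to scale 1).
    return list(v_cond)


if __name__ == "__main__":
    # 256x256 rows use shift 1.0 (identity); the 512x512 row uses 2.0.
    print(shift_timestep(0.5, 1.0))
    print(round(shift_timestep(0.5, 2.0), 4))
    # Epoch-320 settings: scale 2.75, interval [0.1, 0.9].
    print(guided_velocity([1.0], [0.0], 2.75, t=0.5, t_min=0.1, t_max=0.9))
    print(guided_velocity([1.0], [0.0], 2.75, t=0.95, t_min=0.1, t_max=0.9))
```

Under these assumptions, a shift of 1.0 (the 256x256 rows) is the identity, so only the 512x512 setting changes the time grid; in the SD3 convention, where `t = 1` is pure noise, a shift above 1 spends more of the 100 steps at high noise levels.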

## Usage

### Installation

Install the dependencies from the root of the [PixelDiT GitHub repository](https://github.com/NVlabs/PixelDiT):

```bash
pip install -r requirements.txt
```

### Evaluation (Generate 50K Samples)

Download the checkpoint you want to evaluate from this repository and pass its path via `--ckpt_path` (the commands below expect it in `c2i/`). The sampler settings match the table above, and both commands assume 8 GPUs (`--nproc_per_node=8`).

```bash
cd c2i/

# ImageNet 256x256 (epoch 320, best FID)
torchrun --nproc_per_node=8 main.py predict \
  -c configs/pix256_xl.yaml \
  --ckpt_path=imagenet256_pixeldit_xl_epoch320.ckpt \
  --model.diffusion_sampler.class_path=src.diffusion.FlowDPMSolverSampler \
  --model.diffusion_sampler.init_args.num_steps=100 \
  --model.diffusion_sampler.init_args.guidance=2.75 \
  --model.diffusion_sampler.init_args.timeshift=1.0 \
  --model.diffusion_sampler.init_args.guidance_interval_min=0.1 \
  --model.diffusion_sampler.init_args.guidance_interval_max=0.9 \
  --per_run_seed=false --seed_everything=1000

# ImageNet 512x512
torchrun --nproc_per_node=8 main.py predict \
  -c configs/pix512_xl.yaml \
  --ckpt_path=imagenet512_pixeldit_xl.ckpt \
  --model.diffusion_sampler.class_path=src.diffusion.FlowDPMSolverSampler \
  --model.diffusion_sampler.init_args.num_steps=100 \
  --model.diffusion_sampler.init_args.guidance=3.5 \
  --model.diffusion_sampler.init_args.timeshift=2.0 \
  --model.diffusion_sampler.init_args.guidance_interval_min=0.1 \
  --model.diffusion_sampler.init_args.guidance_interval_max=1.0 \
  --per_run_seed=false --seed_everything=10000
```
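
The table also lists settings for the epoch-80 and epoch-160 checkpoints, which the commands above do not show. A small launcher can keep the flags in sync with the table. The sketch below is a hypothetical helper (`SETTINGS`, `build_command` and `launch.py` are illustrative names, not part of the repository); the config files, flags and seeds are copied from the commands above, and reusing seed 1000 for the epoch-80 and epoch-160 checkpoints is an assumption. Run it from `c2i/` with the checkpoint file in that directory.

```python
"""Hypothetical launcher that maps each checkpoint to the sampler settings in the table."""
import subprocess
import sys

# (config, guidance, timeshift, interval_min, interval_max, seed) per checkpoint.
# Seeds 1000/10000 come from the commands above; reusing 1000 for epochs 80/160 is an assumption.
SETTINGS = {
    "imagenet256_pixeldit_xl_epoch80.ckpt": ("configs/pix256_xl.yaml", 3.25, 1.0, 0.1, 1.0, 1000),
    "imagenet256_pixeldit_xl_epoch160.ckpt": ("configs/pix256_xl.yaml", 3.25, 1.0, 0.1, 1.0, 1000),
    "imagenet256_pixeldit_xl_epoch320.ckpt": ("configs/pix256_xl.yaml", 2.75, 1.0, 0.1, 0.9, 1000),
    "imagenet512_pixeldit_xl.ckpt": ("configs/pix512_xl.yaml", 3.5, 2.0, 0.1, 1.0, 10000),
}


def build_command(ckpt: str, nproc: int = 8, num_steps: int = 100) -> list[str]:
    """Return the torchrun argv for one checkpoint (FlowDPMSolver, 100 steps by default)."""
    config, guidance, shift, t_min, t_max, seed = SETTINGS[ckpt]
    prefix = "--model.diffusion_sampler"
    return [
        "torchrun", f"--nproc_per_node={nproc}", "main.py", "predict",
        "-c", config,
        f"--ckpt_path={ckpt}",
        f"{prefix}.class_path=src.diffusion.FlowDPMSolverSampler",
        f"{prefix}.init_args.num_steps={num_steps}",
        f"{prefix}.init_args.guidance={guidance}",
        f"{prefix}.init_args.timeshift={shift}",
        f"{prefix}.init_args.guidance_interval_min={t_min}",
        f"{prefix}.init_args.guidance_interval_max={t_max}",
        "--per_run_seed=false", f"--seed_everything={seed}",
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python launch.py <checkpoint-file-name> [--dry-run]
    cmd = build_command(sys.argv[1])
    if "--dry-run" in sys.argv:
        print(" ".join(cmd))  # show the command without launching it
    else:
        subprocess.run(cmd, check=True)
```

For example, `python launch.py imagenet256_pixeldit_xl_epoch80.ckpt --dry-run` prints the full `torchrun` command for the epoch-80 checkpoint instead of running it.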

After generating the 50K samples, compute FID and the other metrics with the [ADM evaluation toolkit](https://github.com/openai/guided-diffusion/tree/main/evaluations), which compares them against a precomputed ImageNet reference batch.
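
The ADM evaluator (`evaluator.py` in that toolkit) reads the reference batch and the sample batch as `.npz` files whose `arr_0` entry is a `uint8` array of shape `[N, H, W, 3]`. If your run writes individual PNG files instead (the output format of `main.py predict` is not documented here, so check your output directory), a sketch like the following packs them into that layout. The helper names and paths are hypothetical; it needs NumPy, plus Pillow for reading PNGs.

```python
"""Hypothetical helper: pack generated PNGs into the .npz layout read by the ADM evaluator."""
from pathlib import Path

import numpy as np


def stack_images(images) -> np.ndarray:
    """Stack HxWx3 uint8 arrays into one [N, H, W, 3] uint8 batch."""
    batch = np.stack([np.asarray(img, dtype=np.uint8) for img in images])
    if batch.ndim != 4 or batch.shape[-1] != 3:
        raise ValueError(f"expected [N, H, W, 3], got {batch.shape}")
    return batch


def pack_png_dir(sample_dir: str, out_path: str) -> tuple:
    """Read every *.png in sample_dir (sorted by name) and save the batch as arr_0 in out_path."""
    from PIL import Image  # Pillow; imported lazily so stack_images works without it

    paths = sorted(Path(sample_dir).glob("*.png"))
    batch = stack_images(np.array(Image.open(p).convert("RGB")) for p in paths)
    # np.savez names the first positional array "arr_0", the key the evaluator reads.
    np.savez(out_path, batch)
    return batch.shape


# Example (placeholder paths):
# pack_png_dir("samples/", "pixeldit_256_samples.npz")
# then, inside guided-diffusion/evaluations, with the reference batch from its README:
# python evaluator.py <reference_batch.npz> pixeldit_256_samples.npz
```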

## Citation

```bibtex
@inproceedings{yu2025pixeldit,
  title={PixelDiT: Pixel Diffusion Transformers for Image Generation},
  author={Yongsheng Yu and Wei Xiong and Weili Nie and Yichen Sheng and Shiqiu Liu and Jiebo Luo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
}
```

## License

This model is released under the [NSCLv1 License](LICENSE). The work and any derivative works may only be used for non-commercial (research or evaluation) purposes.