---
license: other
license_name: nsclv1
license_link: LICENSE
tags:
- image-generation
- class-conditional
- diffusion
- pixel-space
- dit
- imagenet
library_name: pytorch
pipeline_tag: unconditional-image-generation
---
<p align="center">
<img src="https://raw.githubusercontent.com/NVlabs/PixelDiT/master/assets/pixeldit-logo.png" height="60" />
</p>
<h2 align="center">PixelDiT: Pixel Diffusion Transformers for Image Generation</h2>
<p align="center">
<a href="https://www.yongshengyu.com/">Yongsheng Yu</a><sup>1,2</sup>
<a href="https://wxiong.me/">Wei Xiong</a><sup>1†</sup>
<a href="https://weilinie.github.io/">Weili Nie</a><sup>1</sup>
<a href="https://shengcn.github.io/">Yichen Sheng</a><sup>1</sup>
<a href="http://behindthepixels.io/">Shiqiu Liu</a><sup>1</sup>
<a href="https://www.cs.rochester.edu/u/jluo/">Jiebo Luo</a><sup>2</sup>
</p>
<p align="center">
<sup>1</sup>NVIDIA <sup>2</sup>University of Rochester
<br>
<sup>†</sup>Project Lead and Main Advising
</p>
<p align="center">
<a href="https://pixeldit.github.io/"><img src="https://img.shields.io/badge/Website-Project_Page-2ea44f" /></a>
<a href="https://arxiv.org/abs/2511.20645"><img src="https://img.shields.io/badge/arXiv-2511.20645-b31b1b.svg" /></a>
<a href="https://github.com/NVlabs/PixelDiT"><img src="https://img.shields.io/badge/GitHub-Code-blue" /></a>
</p>
## Pre-trained Checkpoints
| Checkpoint | Resolution | Epochs | gFID | CFG Scale | Time Shift | CFG Interval |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| `imagenet256_pixeldit_xl_epoch80.ckpt` | 256x256 | 80 | **2.36** | 3.25 | 1.0 | [0.1, 1.0] |
| `imagenet256_pixeldit_xl_epoch160.ckpt` | 256x256 | 160 | **1.97** | 3.25 | 1.0 | [0.1, 1.0] |
| `imagenet256_pixeldit_xl_epoch320.ckpt` | 256x256 | 320 | **1.61** | 2.75 | 1.0 | [0.1, 0.9] |
| `imagenet512_pixeldit_xl.ckpt` | 512x512 | 850 | **1.81** | 3.5 | 2.0 | [0.1, 1.0] |
All evaluations use **FlowDPMSolver** with **100 steps** and 50K generated samples. Metrics follow the ADM evaluation protocol.
## Usage
### Installation
```bash
pip install -r requirements.txt
```
### Evaluation (Generate 50K Samples)
```bash
cd c2i/
# ImageNet 256x256 (epoch 320, best FID)
torchrun --nproc_per_node=8 main.py predict \
-c configs/pix256_xl.yaml \
--ckpt_path=imagenet256_pixeldit_xl_epoch320.ckpt \
--model.diffusion_sampler.class_path=src.diffusion.FlowDPMSolverSampler \
--model.diffusion_sampler.init_args.num_steps=100 \
--model.diffusion_sampler.init_args.guidance=2.75 \
--model.diffusion_sampler.init_args.timeshift=1.0 \
--model.diffusion_sampler.init_args.guidance_interval_min=0.1 \
--model.diffusion_sampler.init_args.guidance_interval_max=0.9 \
--per_run_seed=false --seed_everything=1000
# ImageNet 512x512
torchrun --nproc_per_node=8 main.py predict \
-c configs/pix512_xl.yaml \
--ckpt_path=imagenet512_pixeldit_xl.ckpt \
--model.diffusion_sampler.class_path=src.diffusion.FlowDPMSolverSampler \
--model.diffusion_sampler.init_args.num_steps=100 \
--model.diffusion_sampler.init_args.guidance=3.5 \
--model.diffusion_sampler.init_args.timeshift=2.0 \
--model.diffusion_sampler.init_args.guidance_interval_min=0.1 \
--model.diffusion_sampler.init_args.guidance_interval_max=1.0 \
--per_run_seed=false --seed_everything=10000
```
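The commands above differ only in the config, checkpoint, and sampler settings listed in the checkpoint table. As a convenience, the mapping can be sketched as a small Python helper that assembles the same `torchrun` invocation for any checkpoint. This is an illustrative sketch, not part of the repository: `SETTINGS` and `build_predict_command` are hypothetical names, and using `configs/pix256_xl.yaml` for the epoch-80 and epoch-160 checkpoints is an assumption, since only the epoch-320 and 512x512 commands are shown here.

```python
import shlex

# Per-checkpoint settings copied from the "Pre-trained Checkpoints" table:
# (config, CFG scale, time shift, CFG interval).
# ASSUMPTION: the epoch-80 and epoch-160 checkpoints use the same 256x256
# config as the epoch-320 checkpoint; the README only shows the latter.
SETTINGS = {
    "imagenet256_pixeldit_xl_epoch80.ckpt": ("configs/pix256_xl.yaml", 3.25, 1.0, (0.1, 1.0)),
    "imagenet256_pixeldit_xl_epoch160.ckpt": ("configs/pix256_xl.yaml", 3.25, 1.0, (0.1, 1.0)),
    "imagenet256_pixeldit_xl_epoch320.ckpt": ("configs/pix256_xl.yaml", 2.75, 1.0, (0.1, 0.9)),
    "imagenet512_pixeldit_xl.ckpt": ("configs/pix512_xl.yaml", 3.5, 2.0, (0.1, 1.0)),
}

SAMPLER = "--model.diffusion_sampler"


def build_predict_command(ckpt, nproc=8, num_steps=100, seed=1000):
    """Return the torchrun argument list for one checkpoint (hypothetical helper).

    The flag names mirror the commands shown above. The default seed matches
    the 256x256 command; the 512x512 command above uses seed 10000.
    """
    config, guidance, timeshift, (interval_min, interval_max) = SETTINGS[ckpt]
    return [
        "torchrun", f"--nproc_per_node={nproc}", "main.py", "predict",
        "-c", config,
        f"--ckpt_path={ckpt}",
        f"{SAMPLER}.class_path=src.diffusion.FlowDPMSolverSampler",
        f"{SAMPLER}.init_args.num_steps={num_steps}",
        f"{SAMPLER}.init_args.guidance={guidance}",
        f"{SAMPLER}.init_args.timeshift={timeshift}",
        f"{SAMPLER}.init_args.guidance_interval_min={interval_min}",
        f"{SAMPLER}.init_args.guidance_interval_max={interval_max}",
        "--per_run_seed=false", f"--seed_everything={seed}",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per checkpoint; run them from c2i/.
    for name in SETTINGS:
        print(shlex.join(build_predict_command(name)))
```

The printed commands are meant to be run from the `c2i/` directory, like the ones above. Passing `seed=10000` reproduces the 512x512 command as written.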
After generating samples, compute FID with the [ADM evaluation toolkit](https://github.com/openai/guided-diffusion/tree/main/evaluations).
## Citation
```bibtex
@inproceedings{yu2025pixeldit,
title={PixelDiT: Pixel Diffusion Transformers for Image Generation},
author={Yongsheng Yu and Wei Xiong and Weili Nie and Yichen Sheng and Shiqiu Liu and Jiebo Luo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026},
}
```
## License
This model is released under the [NSCLv1 License](LICENSE). The work and any derivative works may only be used for non-commercial (research or evaluation) purposes.