xrenaa committed 5dcdc89 (verified) · 1 parent: 4ebccd7

Add model card with checkpoint inventory and teaser

Files changed (3):
  1. .gitattributes +1 -0
  2. README.md +101 -0
  3. figures/teaser.jpg +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+figures/teaser.jpg filter=lfs diff=lfs merge=lfs -text
README.md ADDED

---
license: apache-2.0
library_name: pytorch
tags:
- super-resolution
- diffusion
- pixel-diffusion-decoder
- vae-decoder
pipeline_tag: image-to-image
---

# PiD: Pixel Diffusion Decoder

<p align="center">
  <img src="figures/teaser.jpg" alt="PiD teaser" width="100%">
</p>

PiD reformulates the latent-to-pixel decoder as a conditional pixel-space
diffusion model, unifying decoding and upsampling into a single generative
module. It denoises directly in high-resolution pixel space and produces the
super-resolved image in a single stage, with no separate upsampler. This
repository hosts the released decoder checkpoints, plus the encoder/decoder
("VAE") weights they depend on.

All `PiD_*` checkpoints in this repo are **4-step distilled**. The non-`PiD_*`
entries (`ae.safetensors`, `flux2_ae.safetensors`, `sd3_vae/`, `rae/`,
`scale_rae/`) are **the corresponding encoder/decoder VAE weights** that PiD
plugs into; they are not PiD checkpoints themselves.
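
The card does not describe PiD's sampler. To illustrate what "4-step" means at inference time, here is a minimal sketch of a generic few-step Euler sampler for a conditional rectified-flow model. It assumes the convention `x_t = (1 - t) * x0 + t * noise` and a network that predicts the velocity `noise - x0`. `velocity_fn` and `cond` are hypothetical stand-ins for the PiD decoder and its latent conditioning, not names from the PiD codebase, and the uniform timestep grid is an assumption.

```python
from typing import Callable, Sequence, TypeVar

# Works with floats or any array type supporting + and scalar * (NumPy, PyTorch).
T = TypeVar("T")


def euler_sample(
    velocity_fn: Callable[[T, float, object], T],
    noise: T,
    cond: object,
    timesteps: Sequence[float] = (1.0, 0.75, 0.5, 0.25, 0.0),
) -> T:
    """Integrate dx/dt = velocity_fn(x, t, cond) from timesteps[0] to timesteps[-1].

    Assumes the rectified-flow convention x_t = (1 - t) * x0 + t * noise, so
    t = 1 is pure noise and t = 0 is the clean image. The default grid has five
    points, i.e. four network evaluations.
    """
    x = noise
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t_cur, cond)  # hypothetical stand-in for the decoder
        x = x + (t_next - t_cur) * v     # explicit Euler step; t decreases
    return x


# Toy check with an "oracle" velocity that knows the clean target x0 = 3.0:
# on the straight path x_t = (1 - t) * x0 + t * noise, (x_t - x0) / t = noise - x0.
x0 = 3.0
print(euler_sample(lambda x, t, cond: (x - x0) / t, noise=-1.0, cond=None))  # 3.0
```

Because the true rectified-flow path is a straight line, an exact velocity makes Euler integration exact with any number of steps. Distillation aims to make a real network's predictions straight enough that four steps suffice.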

## PiD checkpoints

Two variants are released: `2k` for every backbone, and `2kto4k` for the
diffusers backbones only (Flux1, Flux2 and SD3 in the table below).

- **`2k`**: trained at 2048 px; used as a 4× decoder (512 px LDM → 2048 px), or
  as an 8× decoder for the Scale-RAE backbone (256 px → 2048 px).
- **`2kto4k`**: trained with multi-resolution data bucketing from 2048 to 3840 px
  and an SD3-style dynamic shift; designed for 1024 px LDM → 4K (3840 px)
  decoding.
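
The `2kto4k` bullet mentions an SD3-style dynamic shift without giving its form. Below is a minimal sketch of the resolution-dependent timestep shift from the Stable Diffusion 3 paper, `t' = alpha * t / (1 + (alpha - 1) * t)` with `alpha = sqrt(m / n)`, where `m` is the target pixel count and `n` the base pixel count. The 2048 × 2048 base resolution and the uniform 4-step grid are assumptions for illustration, not values taken from PiD.

```python
from __future__ import annotations

import math


def sd3_shift(t: float, alpha: float) -> float:
    """SD3 resolution-dependent shift: maps t in [0, 1] to alpha*t / (1 + (alpha-1)*t)."""
    return alpha * t / (1.0 + (alpha - 1.0) * t)


def shifted_timesteps(num_steps: int, height: int, width: int,
                      base_pixels: int = 2048 * 2048) -> list[float]:
    """Uniform grid from 1 (pure noise) to 0 (clean), shifted for the target size.

    base_pixels (2048 x 2048) is an assumed reference resolution, not a PiD value.
    """
    alpha = math.sqrt(height * width / base_pixels)
    grid = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [sd3_shift(t, alpha) for t in grid]


# At the base resolution alpha = 1, so the grid is unchanged.
print(shifted_timesteps(4, 2048, 2048))  # [1.0, 0.75, 0.5, 0.25, 0.0]
# At 3840 x 2160, alpha > 1 moves the intermediate steps toward higher noise.
print(shifted_timesteps(4, 2160, 3840))
```

The shift keeps both endpoints fixed and only redistributes the intermediate steps. Larger outputs spend more of the few available steps at high noise levels, where global structure is decided.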

| Path                                                           | Backbone (encoder side)                      | SR factor | Variant |
|----------------------------------------------------------------|----------------------------------------------|-----------|---------|
| `checkpoints/PiD_res2k_sr4x_official_flux_distill_4step`       | Flux1-dev (16-ch VAE)                        | 4×        | 2k      |
| `checkpoints/PiD_res2k_sr4x_official_flux2_distill_4step`      | Flux2-dev (128-ch BN VAE)                    | 4×        | 2k      |
| `checkpoints/PiD_res2k_sr4x_official_sd3_distill_4step`        | SD3 medium (16-ch VAE)                       | 4×        | 2k      |
| `checkpoints/PiD_res2k_sr4x_official_dinov2_distill_4step`     | DINOv2-B + RAE ViT-XL (768-ch)               | 4×        | 2k      |
| `checkpoints/PiD_res2k_sr8x_official_siglip_distill_4step`     | SigLIP-2 So400M + Scale-RAE ViT-XL (1152-ch) | 8×        | 2k      |
| `checkpoints/PiD_res2kto4k_sr4x_official_flux_distill_4step`   | Flux1-dev (16-ch VAE)                        | 4×        | 2kto4k  |
| `checkpoints/PiD_res2kto4k_sr4x_official_flux2_distill_4step`  | Flux2-dev (128-ch BN VAE)                    | 4×        | 2kto4k  |
| `checkpoints/PiD_res2kto4k_sr4x_official_sd3_distill_4step`    | SD3 medium (16-ch VAE)                       | 4×        | 2kto4k  |

Z-Image shares Flux1's VAE, so its inference path reuses the `flux` checkpoints
(both `2k` and `2kto4k`); no separate `zimage` checkpoint is shipped.
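
The directory names in the checkpoint table follow a visible pattern, `PiD_res{train_res}_sr{factor}x_official_{backbone}_distill_4step`. The sketch below rebuilds a path from that pattern, including the Z-Image → Flux1 reuse and the restriction of `2kto4k` to Flux1, Flux2 and SD3. It is a hypothetical helper inferred from the table. The authoritative mapping is `checkpoint_registry.py` in the PiD codebase, and the backbone keys used here (`flux`, `flux2`, `sd3`, `dinov2`, `siglip`, `zimage`) are read off the directory names, not taken from that file.

```python
# Hypothetical resolver reconstructed from the checkpoint table in this card;
# the real mapping lives in pid/_src/inference/checkpoint_registry.py.
SR_FACTOR = {"flux": 4, "flux2": 4, "sd3": 4, "dinov2": 4, "siglip": 8}
HAS_2KTO4K = {"flux", "flux2", "sd3"}
ALIASES = {"zimage": "flux"}  # Z-Image shares Flux1's VAE


def pid_checkpoint_path(backbone: str, ckpt_type: str = "2k",
                        root: str = "checkpoints") -> str:
    """Return the expected model_ema_bf16.pth path for a backbone and variant."""
    name = ALIASES.get(backbone, backbone)
    if name not in SR_FACTOR:
        raise ValueError(f"unknown backbone: {backbone!r}")
    if ckpt_type not in ("2k", "2kto4k"):
        raise ValueError(f"unknown ckpt_type: {ckpt_type!r}")
    if ckpt_type == "2kto4k" and name not in HAS_2KTO4K:
        raise ValueError(f"no 2kto4k checkpoint for {backbone!r}")
    res = "res2k" if ckpt_type == "2k" else "res2kto4k"
    folder = f"PiD_{res}_sr{SR_FACTOR[name]}x_official_{name}_distill_4step"
    return f"{root}/{folder}/model_ema_bf16.pth"


print(pid_checkpoint_path("zimage", "2kto4k"))
# checkpoints/PiD_res2kto4k_sr4x_official_flux_distill_4step/model_ema_bf16.pth
```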

Each directory contains a single file, `model_ema_bf16.pth`, holding the EMA
weights cast to bfloat16, which is the format the inference scripts load by default.
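
A minimal loading sketch follows. It assumes `model_ema_bf16.pth` is a `torch.save`-d mapping of parameter names to tensors, possibly nested under a wrapper key such as `"model"` or `"state_dict"`. Both the file layout and those key names are assumptions; the PiD inference scripts are the reference loader. PyTorch is imported lazily so the pure-Python unwrapping logic can be used on its own.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

# Wrapper keys commonly used by training frameworks; whether PiD's file uses
# any of them is an assumption.
WRAPPER_KEYS = ("state_dict", "model", "ema", "module")


def unwrap_state_dict(obj: Mapping[str, Any]) -> Mapping[str, Any]:
    """Descend through wrapper keys until no wrapper key holds a nested mapping."""
    while True:
        nested = [k for k in WRAPPER_KEYS if isinstance(obj.get(k), Mapping)]
        if not nested:
            return obj
        obj = obj[nested[0]]


def load_pid_weights(path: str) -> Mapping[str, Any]:
    import torch  # imported lazily; only needed for real checkpoints

    # weights_only=True restricts unpickling to tensors and plain containers.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    # Tensors are returned as stored (the card says bfloat16); cast with
    # .float() yourself if your device lacks bf16 support.
    return unwrap_state_dict(ckpt)


# Example (needs PyTorch and a downloaded checkpoint):
# state = load_pid_weights(
#     "checkpoints/PiD_res2k_sr4x_official_flux_distill_4step/model_ema_bf16.pth")
# print(len(state), next(iter(state.values())).dtype)
```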

## VAE / encoder weights

These are the per-backbone encoder (and, where applicable, original decoder)
weights that PiD pairs with. They are hosted here so that a single download
brings in everything needed to run end to end.

| Path                               | Description                                                                          |
|------------------------------------|--------------------------------------------------------------------------------------|
| `checkpoints/ae.safetensors`       | Flux1-dev / Z-Image 16-ch VAE (encoder + original Flux decoder).                     |
| `checkpoints/flux2_ae.safetensors` | Flux2-dev 128-ch BN VAE.                                                             |
| `checkpoints/sd3_vae/`             | SD3 medium 16-ch VAE in diffusers format.                                            |
| `checkpoints/rae/`                 | DINOv2-B image encoder + RAE ViT-XL decoder + ImageNet-512 normalization statistics. |
| `checkpoints/scale_rae/`           | SigLIP-2 So400M encoder + Scale-RAE ViT-XL decoder + decoder config.                 |
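
Because one download is meant to cover everything, a quick check that a local snapshot is complete for a given backbone is to confirm that both the PiD checkpoint and its paired VAE/encoder entry exist. The pairing below is read off the two tables (Z-Image reuses the Flux1 entry). The helper name and the backbone keys are hypothetical labels for this sketch, not part of the PiD codebase.

```python
from __future__ import annotations

from pathlib import Path

# Backbone key -> paired VAE/encoder entry, read off the tables in this card.
VAE_ENTRY = {
    "flux": "ae.safetensors",
    "zimage": "ae.safetensors",  # Z-Image shares Flux1's VAE
    "flux2": "flux2_ae.safetensors",
    "sd3": "sd3_vae",
    "dinov2": "rae",
    "siglip": "scale_rae",
}


def missing_files(root: str | Path, backbone: str, pid_dir: str) -> list[str]:
    """Return the expected paths (relative to root) that are absent locally.

    pid_dir is a PiD checkpoint directory name from the table, e.g.
    "PiD_res2k_sr4x_official_flux_distill_4step".
    """
    root = Path(root)
    expected = [
        f"checkpoints/{pid_dir}/model_ema_bf16.pth",
        f"checkpoints/{VAE_ENTRY[backbone]}",
    ]
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_files(".", "flux", "PiD_res2k_sr4x_official_flux_distill_4step")
    print("snapshot complete" if not missing else f"missing: {missing}")
```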

## Usage

The decoder checkpoints are loaded by the inference scripts in the PiD
codebase. The exact `(backbone, ckpt_type) → path` mapping lives in
[`pid/_src/inference/checkpoint_registry.py`](https://github.com/), the single
source of truth. Clone the repo, point it at this snapshot, and the demos pick
the right file automatically:

```bash
# Download the whole snapshot into the current directory (weights land in ./checkpoints/)
hf download nvidia/PiD --local-dir .

# Then run any of the demos, e.g.:
PYTHONPATH=. python -m pid._src.inference.from_ldm_flux \
  --prompt "A photorealistic cat" \
  --ldm_inference_steps 28 --save_xt_steps 22 24 26 \
  --output_dir ./results/demo \
  --cfg_scale 1 --pid_inference_steps 4 --scale 4
```

Pick the `2kto4k` variant via `--pid_ckpt_type 2kto4k` when decoding at 4K.
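
If you only need one backbone, `huggingface_hub.snapshot_download` accepts `allow_patterns` to fetch a subset of the repo instead of the whole snapshot. This minimal sketch assumes the directory names in the tables are the exact repo paths. The pattern-building helper and backbone keys are hypothetical, and this card does not say whether the PiD demos tolerate a partial snapshot.

```python
from __future__ import annotations

# Backbone key -> glob patterns for its paired VAE/encoder files (from the
# tables in this card). Keys are hypothetical labels chosen for this sketch.
VAE_PATTERNS = {
    "flux": ["checkpoints/ae.safetensors"],
    "zimage": ["checkpoints/ae.safetensors"],  # Z-Image reuses Flux1's VAE
    "flux2": ["checkpoints/flux2_ae.safetensors"],
    "sd3": ["checkpoints/sd3_vae/*"],
    "dinov2": ["checkpoints/rae/*"],
    "siglip": ["checkpoints/scale_rae/*"],
}


def allow_patterns(backbone: str, pid_dir: str) -> list[str]:
    """fnmatch-style patterns for one PiD checkpoint dir plus its VAE/encoder files."""
    return [f"checkpoints/{pid_dir}/*", *VAE_PATTERNS[backbone]]


def download_subset(backbone: str, pid_dir: str, local_dir: str = ".") -> str:
    """Download only the selected files; returns the local snapshot path."""
    from huggingface_hub import snapshot_download  # lazy: only needed to download

    return snapshot_download(
        repo_id="nvidia/PiD",
        local_dir=local_dir,
        allow_patterns=allow_patterns(backbone, pid_dir),
    )


# Example (network access required): everything the 4K Flux1 path needs.
# download_subset("flux", "PiD_res2kto4k_sr4x_official_flux_distill_4step")
```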

## License

Released under the **Apache License 2.0**. Copyright 2026 NVIDIA Corporation
& Affiliates. See the `LICENSE` file in the source repository for the full
text.

The upstream encoder backbones (DINOv2, SigLIP-2, Flux, SD3, Z-Image) and their
weights remain under their own original licenses; PiD's Apache-2.0 release
covers only the PiD decoder weights and code.
figures/teaser.jpg ADDED

Git LFS details:

- SHA256: fb74f71364bd8fc0901650d6c7b5b8ef8efac751b7d248d2c9a3d7accf031d17
- Pointer size: 132 bytes
- Size of remote file: 1.36 MB