| | --- |
| | license: apache-2.0 |
| | --- |
| | |
## Comparison

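### Metrics

Reconstruction quality per VAE: MSE, LPIPS and Edge are errors (lower is better) and PSNR is in dB (higher is better). KL is the KL divergence of the latent distribution. Z, Skew and Kurt summarize the latent values (min/mean/max, plus std for Z).
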
| | ``` |
| | SD15 VAE | MSE=2.732e-03 PSNR=28.10 LPIPS=0.147 Edge=0.206 KL=19.821 | Z[min/mean/max/std]=[-17.375, 0.072, 16.203, 0.900] | Skew[min/mean/max]=[-0.543, -0.126, 0.070] | Kurt[min/mean/max]=[-0.151, 1.228, 4.574] |
| | SDXL VAE fp16 fix | MSE=2.018e-03 PSNR=29.67 LPIPS=0.124 Edge=0.188 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.107] |
| | AiArtLab/sdxl_vae | MSE=1.736e-03 PSNR=30.29 LPIPS=0.116 Edge=0.181 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.107] |
| | LTX-Video VAE | MSE=1.202e-03 PSNR=31.84 LPIPS=0.141 Edge=0.168 KL=6.656 | Z[min/mean/max/std]=[-5.043, 0.011, 4.969, 0.272] | Skew[min/mean/max]=[-0.542, -0.018, 0.411] | Kurt[min/mean/max]=[-0.576, 0.741, 1.843] |
| | Wan2.2-TI2V-5B | MSE=7.782e-04 PSNR=34.25 LPIPS=0.052 Edge=0.121 KL=9.472 | Z[min/mean/max/std]=[-4.789, -0.012, 4.266, 0.375] | Skew[min/mean/max]=[-0.397, 0.022, 0.653] | Kurt[min/mean/max]=[-0.482, 0.006, 0.538] |
| | AiArtLab/wan16x_vae | MSE=7.275e-04 PSNR=34.51 LPIPS=0.051 Edge=0.118 KL=9.472 | Z[min/mean/max/std]=[-4.789, -0.012, 4.266, 0.375] | Skew[min/mean/max]=[-0.397, 0.022, 0.653] | Kurt[min/mean/max]=[-0.482, 0.006, 0.538] |
| | Wan2.2-T2V-A14B | MSE=7.073e-04 PSNR=34.59 LPIPS=0.048 Edge=0.115 KL=7.781 | Z[min/mean/max/std]=[-15.336, -0.159, 17.703, 2.563] | Skew[min/mean/max]=[-0.343, 0.006, 0.367] | Kurt[min/mean/max]=[-0.538, -0.071, 0.594] |
| | QwenImage | MSE=6.549e-04 PSNR=35.21 LPIPS=0.047 Edge=0.110 KL=7.776 | Z[min/mean/max/std]=[-15.297, -0.158, 17.688, 2.561] | Skew[min/mean/max]=[-0.346, 0.005, 0.368] | Kurt[min/mean/max]=[-0.538, -0.072, 0.597] |
| | AuraDiffusion/16ch-vae | MSE=5.361e-04 PSNR=35.80 LPIPS=0.041 Edge=0.100 KL=4.421 | Z[min/mean/max/std]=[-1.373, -0.005, 1.621, 0.165] | Skew[min/mean/max]=[-0.331, 0.040, 0.413] | Kurt[min/mean/max]=[-0.170, 0.303, 0.670] |
| | FLUX.1-schnell VAE | MSE=4.594e-04 PSNR=35.87 LPIPS=0.035 Edge=0.088 KL=13.016 | Z[min/mean/max/std]=[-5.824, -0.076, 6.246, 0.945] | Skew[min/mean/max]=[-0.268, 0.048, 0.483] | Kurt[min/mean/max]=[-0.498, 0.037, 0.568] |
| | AiArtLab/simplevae | MSE=4.818e-04 PSNR=36.20 LPIPS=0.035 Edge=0.095 KL=4.032 | Z[min/mean/max/std]=[-7.762, -0.061, 9.914, 0.965] | Skew[min/mean/max]=[-0.320, 0.044, 0.411] | Kurt[min/mean/max]=[-0.045, 0.346, 0.696] |
| | ``` |
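
### Percent (relative to SD15 VAE)

Each score is expressed as a percentage of the SD15 VAE score, so SD15 is 100% and higher is better in every column. PSNR is computed as model / SD15. LPIPS and Edge, where lower is better, are inverted as SD15 / model.
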
| | ``` |
| | | Model | PSNR | LPIPS | Edge | |
| | |----------------------------|-----------|-----------|-----------| |
| | | SD15 VAE | 100% | 100% | 100% | |
| | | SDXL VAE fp16 fix | 105.6% | 118.3% | 109.7% | |
| | | AiArtLab/sdxl_vae | 107.8% | 126.8% | 113.8% | |
| | | LTX-Video VAE | 113.3% | 103.8% | 122.5% | |
| | | Wan2.2-TI2V-5B | 121.9% | 280.8% | 170.8% | |
| | | AiArtLab/wan16x_vae | 122.8% | 287.3% | 174.2% | |
| | | Wan2.2-T2V-A14B | 123.1% | 303.2% | 179.4% | |
| | | QwenImage | 125.3% | 308.8% | 188.0% | |
| | | AuraDiffusion/16ch-vae | 127.4% | 355.5% | 206.6% | |
| | | FLUX.1-schnell VAE | 127.6% | 424.4% | 234.8% | |
| | | AiArtLab/simplevae | 128.8% | 415.2% | 217.7% | |
| | ``` |

## Compare

Interactive side-by-side image comparison: https://imgsli.com/NDE1MzE0/5/2

## Diffusers

| | ``` |
| | from diffusers import AutoencoderKL |
| | vae = AutoencoderKL.from_pretrained("AiArtLab/simplevae",subfolder="vae").cuda().half() |
| | |
| | ``` |

## VAE Training Process

- Initialized from AuraDiffusion/16ch-vae, then a mid block was added and the model retrained (the result is not compatible with AuraDiffusion/16ch-vae)
- Dataset: 100,000 PNG images
- Training time: ~2 weeks
- Hardware: single RTX 5090
- Resolution: 512 px
- Precision: FP32
- Effective batch size: 16
- Optimizer: AdamW (8-bit)
- Losses: balanced combination of LPIPS, MSE, MAE, Edge and KL
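
Of the listed losses, the KL term has a standard closed form for a diagonal Gaussian posterior N(μ, σ²) measured against N(0, I). This minimal sketch uses that standard form and combines the terms with a plain weighted sum. The function names, the example weights and the weighted-sum scheme are hypothetical; they are not taken from train_vae.py, and the script's actual "balancing" may differ.

```python
# Sketch: standard diagonal-Gaussian KL; loss weights below are hypothetical.
import math


def gaussian_kl(mean: list[float], logvar: list[float]) -> float:
    """KL(N(mean, exp(logvar)) || N(0, 1)), summed over latent elements."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, logvar))


def total_loss(losses: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of individual loss terms; missing weights default to 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())


# Hypothetical per-batch loss values and weights, for illustration only.
losses = {"lpips": 0.04, "mse": 0.0005, "mae": 0.015, "edge": 0.1, "kl": 4.0}
weights = {"lpips": 1.0, "mse": 10.0, "mae": 1.0, "edge": 0.1, "kl": 1e-4}
print(total_loss(losses, weights))
```

The KL term is zero only when every latent element has mean 0 and variance 1. It grows as the encoder's posterior drifts away from a unit Gaussian.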

## Source

Training script: https://huggingface.co/AiArtLab/simplevae/blob/main/train_vae.py

## Acknowledgments
- **[Stan](https://t.me/Stangle)**: key investor. Thank you for believing in us when others called it madness.
- **Captainsaturnus**
- **Love. Death. Transformers.**
- **TOPAPEC**

## Donations

Please contact us if you can provide GPUs or funding for training.

DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83

BTC: 3JHv9Hb8kEW8zMAccdgCdZGfrHeMhH1rpN

## Contacts

[recoilme](https://t.me/recoilme)

## Test training

[Test training video](trainvideo.mp4)
