BiliSakura/DiCo-diffusers
Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline.py, component modules, and weights.
Converted from shallowdream204/DiCo using DiCo-diffusers.
Available checkpoints
| Subfolder | Pipeline | Resolution | Source checkpoint | CFG | FID | IS | Params |
|---|---|---|---|---|---|---|---|
| DiCo-S-256/ | DiCoPipeline | 256×256 | DiCo-S-400K-256x256.pt | 1.0 | 49.97 | 31.38 | 33M |
| DiCo-B-256/ | DiCoPipeline | 256×256 | DiCo-B-400K-256x256.pt | 1.0 | 27.20 | 56.52 | 130M |
| DiCo-L-256/ | DiCoPipeline | 256×256 | DiCo-L-400K-256x256.pt | 1.0 | 13.66 | 91.37 | 464M |
| DiCo-XL-256/ | DiCoPipeline | 256×256 | DiCo-XL-3750K-256x256.pt | 1.4 | 2.05 | 282.17 | 701M |
DiCo denoises VAE latents (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: stabilityai/sd-vae-ft-ema. Scheduler: DDIMScheduler (1000 train steps, linear betas).
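The latent size quoted above follows from the 8× spatial downsampling of the Stable Diffusion VAE. The following is a minimal sketch of that arithmetic; the helper name `latent_shape` and its constants are illustrative and are not part of this repo.

```python
# Illustrative helper (not part of the repo): derive the latent tensor shape
# that the denoiser sees for a given image size.
VAE_SCALE_FACTOR = 8   # sd-vae-ft-ema downsamples each spatial side by 8
LATENT_CHANNELS = 4    # latent channels, as stated in the model card


def latent_shape(height, width, batch_size=1):
    """Return (batch, channels, latent_height, latent_width)."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (
        batch_size,
        LATENT_CHANNELS,
        height // VAE_SCALE_FACTOR,
        width // VAE_SCALE_FACTOR,
    )


print(latent_shape(256, 256))  # (1, 4, 32, 32)
```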
Repo layout
BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/
Each variant is self-contained. The scheduler/ folder uses the built-in DDIMScheduler from the PyPI diffusers package.
ImageNet class labels
id2label is embedded in each variant's model_index.json (DiT-style).
- pipe.id2label: inspect the id → English label correspondence
- pipe.labels: reverse map (English synonym → id)
- pipe.get_label_ids("golden retriever")
- pipe(class_labels="golden retriever", ...): string labels are resolved automatically
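To make the relationship between the forward and reverse maps concrete, here is a small standalone sketch. It assumes the DiT-style convention in which each id2label value is a comma-separated list of synonyms; the function names `build_label_map` and `lookup_label_ids` are hypothetical, and the pipeline.py shipped in this repo may implement the lookup differently.

```python
# Standalone sketch of a DiT-style label lookup (hypothetical helper names).
def build_label_map(id2label):
    """Invert id -> 'synonym a, synonym b' into synonym -> id."""
    labels = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def lookup_label_ids(labels, query):
    """Resolve one label string, or a list of them, to class ids."""
    if isinstance(query, str):
        query = [query]
    unknown = [name for name in query if name not in labels]
    if unknown:
        raise ValueError(f"Unknown label(s): {unknown}")
    return [labels[name] for name in query]


# Two-entry excerpt for illustration; the real map covers all ImageNet classes.
id2label = {"0": "tench, Tinca tinca", "207": "golden retriever"}
labels = build_label_map(id2label)

print(lookup_label_ids(labels, "golden retriever"))  # [207]
```

Splitting on commas means any synonym resolves to the same id, so "tench" and "Tinca tinca" both map to class 0 in this excerpt.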
Demo
Class 207 (golden retriever), 256×256, 250 steps, guidance_scale=1.4.
python demo_inference.py
python demo_inference.py --variant s # DiCo-S-256, CFG 1.0
Load from a local clone
ImageNet 256×256 (DiCo-XL-256)
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Path to the variant folder inside a local clone of this repo.
model_dir = Path("./DiCo-XL-256").resolve()

# Load the variant's own pipeline.py as a custom pipeline.
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label maps.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Fix the seed for reproducible sampling.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")
Recommended inference settings
| Variant | Steps | CFG scale |
|---|---|---|
| DiCo-S-256 | 250 | 1.0 |
| DiCo-B-256 | 250 | 1.0 |
| DiCo-L-256 | 250 | 1.0 |
| DiCo-XL-256 | 250 | 1.4 |
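When switching between variants in a script, it can help to keep these settings in one place. The following is a minimal sketch that encodes the table above; the names `RECOMMENDED_SETTINGS` and `recommended_kwargs` are illustrative and are not defined anywhere in this repo.

```python
# Recommended sampling settings from the table, keyed by variant folder name.
RECOMMENDED_SETTINGS = {
    "DiCo-S-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-B-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-L-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-XL-256": {"num_inference_steps": 250, "guidance_scale": 1.4},
}


def recommended_kwargs(variant):
    """Return a copy of the sampling kwargs for a variant folder name."""
    try:
        return dict(RECOMMENDED_SETTINGS[variant])
    except KeyError:
        raise ValueError(f"Unknown variant: {variant!r}") from None


print(recommended_kwargs("DiCo-XL-256"))
```

The returned dictionary can be unpacked into the pipeline call, for example `pipe(class_labels="golden retriever", **recommended_kwargs("DiCo-XL-256"))`.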
Classifier-free guidance applies to the first 3 latent channels only (DiT reproducibility convention).
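To illustrate what guiding only the first 3 channels means, here is a NumPy sketch modeled on the convention used in the DiT reference code. It assumes the batch holds the conditional predictions in its first half and the unconditional ones in its second half; the function name `cfg_first_three_channels` is hypothetical, and the pipeline in this repo may order the halves or shape the tensors differently.

```python
import numpy as np


def cfg_first_three_channels(model_out, guidance_scale):
    """Apply classifier-free guidance to channels 0..2 only.

    model_out: array of shape (2 * B, C, H, W); the first B entries are
    conditional predictions and the last B are unconditional (an assumption).
    """
    eps, rest = model_out[:, :3], model_out[:, 3:]
    cond_eps, uncond_eps = np.split(eps, 2, axis=0)
    guided = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    # Both halves receive the same guided channels; the rest stay untouched.
    eps = np.concatenate([guided, guided], axis=0)
    return np.concatenate([eps, rest], axis=1)


# Toy input: conditional half is all ones, unconditional half is all zeros.
cond = np.ones((1, 4, 2, 2))
uncond = np.zeros((1, 4, 2, 2))
out = cfg_first_three_channels(np.concatenate([cond, uncond], axis=0), 1.4)
print(out[0, :, 0, 0])  # first three channels are guided, the fourth is not
```

With guidance_scale=1.0 the guided channels reduce to the conditional prediction, which matches the CFG 1.0 setting recommended for the S, B, and L variants.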
Conversion
cd libs/DiCo-diffusers
python scripts/convert_dico_to_diffusers.py \
    --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
    --output /path/to/DiCo-XL-256 \
    --model-type DiCo-XL \
    --weights ema \
    --safe-serialization \
    --id2label ../../src/labels/id2label_en.json
Citation
@inproceedings{ai2025dico,
  title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
  author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=UnslcaZSnb}
}
License
Weights are converted from checkpoints released under the Apache 2.0 license.
