Diffusers example for low VRAM?
#2
by bertbobson - opened
I have tried every guide I could find on loading quantized models and CPU offloading with diffusers, and nothing seems to work for this model.
Kinda sucks to be unable to run this on 24 GB of VRAM.
If there is a way, it'd be cool if you could give an example.
kthxbye
Thanks for letting us know. We're looking into it.
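In the meantime, it may be worth trying the generic diffusers memory-saving options if you haven't already. Here is a minimal sketch, assuming the model loads through `DiffusionPipeline` and that you have `accelerate` installed; the model id below is a placeholder, not this repo's actual id:

```python
import torch
from diffusers import DiffusionPipeline

# Placeholder model id -- substitute the actual repo for this model.
model_id = "some-org/some-model"

# Load weights in half precision to roughly halve weight memory.
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Model-level CPU offload: keeps only the active component on the GPU,
# moving the rest to CPU RAM between steps (modest speed cost).
pipe.enable_model_cpu_offload()

# If that still OOMs, sequential offload moves individual layers instead.
# Much slower, but the VRAM floor is far lower:
# pipe.enable_sequential_cpu_offload()

# Some pipelines also expose VAE slicing to cut decode-time peak memory:
# pipe.enable_vae_slicing()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```

No guarantee these apply cleanly to this particular model, but they are the usual first knobs for a 24 GB card.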
I'm not sure if this will be helpful, but there is a document on how to convert and quantize a safetensors checkpoint to GGUF:
https://github.com/city96/ComfyUI-GGUF/tree/main/tools
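If you go the GGUF route, recent diffusers releases can also load GGUF files directly for some architectures via `GGUFQuantizationConfig`. A rough sketch, assuming a Flux-style transformer; the transformer class, repo, and file names below are placeholders and would need to match the actual model:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Load a quantized GGUF transformer; the URL and file name are placeholders.
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/some-org/some-model-gguf/blob/main/model-Q4_K_M.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
)

# Build the pipeline around the quantized transformer; the base repo
# (text encoders, VAE, scheduler) is likewise a placeholder.
pipe = FluxPipeline.from_pretrained(
    "some-org/some-model",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("out.png")
```

A Q4-quantized transformer plus model CPU offload is typically what brings these models under 24 GB, at the cost of some quality and speed.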