Conversions not working for me

#1
by sbeltz - opened

I was trying to convert the BF16 model to Q4_K (or anything else) using your example command line, and I get this error when loading anything I convert in sd.cpp:

|> | 6/1167 - 0.00s/it[INFO ] model.cpp:1645 - unknown tensor 'model.diffusion_model.context_refiner.0.attention.qkv.weight.1 | q5_0 | 2 [3840, 3840, 1, 1, 1]' in model file
[ERROR] model.cpp:1654 - tensor 'model.diffusion_model.context_refiner.0.attention.qkv.weight' has wrong shape in model file: got [3840, 3840, 1, 1], expected [3840, 11520, 1, 1]
[WARN ] model.cpp:1467 - process tensor failed: 'model.diffusion_model.context_refiner.0.attention.qkv.weight'
[INFO ] model.cpp:1612 - loading tensors completed, taking 0.20s (process: 0.00s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[ERROR] model.cpp:1670 - load tensors from file failed

Can you tell what's going wrong with my quants? Do I need a particular version of sd.cpp to do the conversion correctly?

If you're able to (and I'm still not), please upload a Q4_K quant made with your version. Thanks!

This is probably due to the fact that the source model provided doesn't have the fused qkv layers, but separate to_q, to_k, to_v layers instead (this fusion is done by ComfyUI: https://github.com/Comfy-Org/ComfyUI/blob/1a72bf20469dee31ad156f819c14f0172cbad222/comfy_extras/nodes_model_patch.py#L193-L220).

sd.cpp expects a fused .qkv. tensor, I think, and can't find it: the error above shows a [3840, 3840] tensor where it expects [3840, 11520], i.e. 3 × 3840 for the concatenated q, k and v projections.

This script could be used beforehand to switch the source model to the expected "comfy" format: https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/z_image_convert_original_to_comfy.py
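For anyone who wants to see what that conversion amounts to, here is a minimal sketch of the q/k/v fusion, assuming the checkpoint stores separate to_q/to_k/to_v weights per attention block (the key names and filenames below are illustrative; the linked script is the authoritative version):

```python
import torch
from safetensors.torch import load_file, save_file

state = load_file("z_image_turbo_bf16.safetensors")  # hypothetical input filename
fused = {}
for key, tensor in state.items():
    if key.endswith("to_q.weight"):
        prefix = key[: -len("to_q.weight")]
        q = state[prefix + "to_q.weight"]
        k = state[prefix + "to_k.weight"]
        v = state[prefix + "to_v.weight"]
        # Concatenate along the output dimension: three [3840, 3840] matrices
        # become one [11520, 3840] matrix, which is the [3840, 11520] shape
        # sd.cpp reports (ggml lists dimensions in reverse order).
        fused[prefix + "qkv.weight"] = torch.cat([q, k, v], dim=0)
    elif key.endswith("to_k.weight") or key.endswith("to_v.weight"):
        continue  # already consumed by the fusion above
    else:
        fused[key] = tensor

save_file(fused, "z_image_turbo_comfy_bf16.safetensors")  # hypothetical output filename
```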

(I'll give it a try)

Here is a Q4_K version of the model, but to be honest, I feel like it is worthless without the specific diffuser patch they use.

I tried converting to q4_k in the same way, but I'm getting the exact same file size as q4_0, so I'm not sure if it's correct. Let me know if you see any major difference compared to q4_0.

In my understanding, Q4_0 and Q4_K yield similar weights; the _K variant just gives better results. Both also use 4.5 bits per weight on disk (Q4_0 packs 32 weights into 18 bytes, Q4_K packs 256 weights into 144 bytes), so an identical file size is expected. I will do a quick test by quantizing the same model with both settings and doing a side-by-side comparison.
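For anyone curious why the two formats are the same size but behave differently, here is a rough sketch of the dequantization math, simplified from ggml's reference implementation (my own summary, not sd.cpp code):

```python
import numpy as np

def dequant_q4_0(d, q):
    # Q4_0: blocks of 32 weights, a single fp16 scale d per block,
    # 4-bit values stored with an implicit offset of 8.
    return d * (q.astype(np.float32) - 8.0)

def dequant_q4_k(d, dmin, sub_scale, sub_min, q):
    # Q4_K: super-blocks of 256 weights split into 8 sub-blocks of 32.
    # Each sub-block carries its own 6-bit scale and 6-bit minimum, applied
    # on top of the super-block fp16 scale d and minimum dmin.
    return d * sub_scale * q.astype(np.float32) - dmin * sub_min
```

The extra per-sub-block scales and minimums are what usually give the K-quants their quality edge at the same bit budget.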

Here is a quick comparison of two pictures generated with the same parameters, one for each model quantized with these tensor type rules:

"^context_refiner\.[0-9]*\.(attention\.(out|qkv)|feed_forward\.(w1|w2|w3)).weight=q8_0,^(layers|noise_refiner)\.[0-9]*\.(adaLN_modulation\.[0-9]*|attention\.(out|qkv)|feed_forward\.(w1|w2|w3))\.weight=q4_0"

Q4_0

"^context_refiner\.[0-9]*\.(attention\.(out|qkv)|feed_forward\.(w1|w2|w3)).weight=q8_0,^(layers|noise_refiner)\.[0-9]*\.(adaLN_modulation\.[0-9]*|attention\.(out|qkv)|feed_forward\.(w1|w2|w3))\.weight=q4_K"

Q4_K
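In case it helps anyone tweaking these rules: a small helper (my own sketch, not part of sd.cpp, and the tensor names below are just illustrative) to check which rule a given tensor name would fall under:

```python
import re

# (pattern, target type) pairs, in the same order as the rule string above
rules = [
    (r"^context_refiner\.[0-9]*\.(attention\.(out|qkv)|feed_forward\.(w1|w2|w3)).weight", "q8_0"),
    (r"^(layers|noise_refiner)\.[0-9]*\.(adaLN_modulation\.[0-9]*|attention\.(out|qkv)|feed_forward\.(w1|w2|w3))\.weight", "q4_K"),
]

names = [
    "context_refiner.0.attention.qkv.weight",
    "layers.12.feed_forward.w2.weight",
    "noise_refiner.1.adaLN_modulation.1.weight",
]

for name in names:
    target = next((t for pattern, t in rules if re.search(pattern, name)), "default type")
    print(f"{name} -> {target}")
```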

Both models have the exact same size on disk but produce different results. I think I prefer the Q4_K quant, as the character's face looks better.

Thank you both, I'm coming from the LLM side, lots to learn!

sbeltz changed discussion status to closed
