Conversions not working for me
I was trying to convert the BF16 model to Q4_K (or anything else) using your example command line, and I get this error when loading anything I convert in sd.cpp:
|> | 6/1167 - 0.00s/it
[INFO ] model.cpp:1645 - unknown tensor 'model.diffusion_model.context_refiner.0.attention.qkv.weight.1 | q5_0 | 2 [3840, 3840, 1, 1, 1]' in model file
[ERROR] model.cpp:1654 - tensor 'model.diffusion_model.context_refiner.0.attention.qkv.weight' has wrong shape in model file: got [3840, 3840, 1, 1], expected [3840, 11520, 1, 1]
[WARN ] model.cpp:1467 - process tensor failed: 'model.diffusion_model.context_refiner.0.attention.qkv.weight'
[INFO ] model.cpp:1612 - loading tensors completed, taking 0.20s (process: 0.00s, read: 0.00s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[ERROR] model.cpp:1670 - load tensors from file failed
Can you tell what's going wrong with my quants? Do I need a particular version of sd.cpp to do the conversion correctly?
If you're able (and I'm still not) please upload a Q4_K quant using your version. Thanks!
This is probably because the source model provided doesn't have the fused qkv layer, but instead separate to_q, to_k, to_v layers (this fusion is done by ComfyUI at load time: https://github.com/Comfy-Org/ComfyUI/blob/1a72bf20469dee31ad156f819c14f0172cbad222/comfy_extras/nodes_model_patch.py#L193-L220)
sd.cpp expects a .qkv. layer, I think, and can't find it.
This script could be used beforehand to switch the source model to the expected "comfy" format: https://huggingface.co/Comfy-Org/z_image_turbo/blob/main/z_image_convert_original_to_comfy.py
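For reference, the fusion that conversion performs is essentially a concatenation of the three projection weights along the output dimension (in the ComfyUI code it is a torch.cat); here is a minimal plain-Python sketch of the idea, with a toy hidden size instead of the model's 3840:

```python
# Each weight is a list of rows (out_features x in_features).
def fuse_qkv(to_q, to_k, to_v):
    """Stack the separate Q/K/V projection weights along the output
    dimension, producing the single fused qkv weight sd.cpp expects
    (3 * hidden rows, e.g. 11520 x 3840 for this model)."""
    return to_q + to_k + to_v

hidden = 4  # toy size instead of 3840
q = [[0.0] * hidden for _ in range(hidden)]
k = [[1.0] * hidden for _ in range(hidden)]
v = [[2.0] * hidden for _ in range(hidden)]
qkv = fuse_qkv(q, k, v)
print(len(qkv), len(qkv[0]))  # 12 4 -- three stacked hidden x hidden blocks
```

This matches the shape mismatch in the error log: sd.cpp expects the fused [3840, 11520] tensor but finds only a single [3840, 3840] projection.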
(I'll give it a try)
Here is a Q4_K version of the model, but to be honest, I feel like without the specific diffusers patch they use, it is worthless.
I tried converting to q4_k in the same way, but I'm getting the exact same file size as q4_0, so I'm not sure if it's correct. Let me know if you see any major difference compared to q4_0.
In my understanding, Q4_0 and Q4_K yield similarly sized weights; the _K variant just provides better results. I will do a quick test by quantizing the same model with both settings and doing a side-by-side comparison.
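The identical file size is actually expected: both formats cost the same bits per weight, as far as I understand ggml's block layouts (Q4_0 stores blocks of 32 weights as an fp16 scale plus packed nibbles; Q4_K stores superblocks of 256 weights with fp16 scale/min plus 6-bit sub-block scales). A quick arithmetic check:

```python
# Bits-per-weight from the (assumed) ggml block layouts:
# Q4_0: block of 32 weights = 2-byte fp16 scale + 16 bytes of 4-bit nibbles
q4_0_bpw = (2 + 16) * 8 / 32
# Q4_K: superblock of 256 weights = 2x fp16 (scale, min) + 12 bytes of
# 6-bit sub-block scales/mins + 128 bytes of 4-bit nibbles
q4_k_bpw = (2 + 2 + 12 + 128) * 8 / 256
print(q4_0_bpw, q4_k_bpw)  # 4.5 4.5 -- same storage cost per weight
```

So the quality difference comes from the per-sub-block scales and mins in Q4_K, not from spending more bits.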
Here is a quick comparison of two pictures generated with the same parameters, from models quantized using these tensor type rules:
"^context_refiner\.[0-9]*\.(attention\.(out|qkv)|feed_forward\.(w1|w2|w3)).weight=q8_0,^(layers|noise_refiner)\.[0-9]*\.(adaLN_modulation\.[0-9]*|attention\.(out|qkv)|feed_forward\.(w1|w2|w3))\.weight=q4_0"
"^context_refiner\.[0-9]*\.(attention\.(out|qkv)|feed_forward\.(w1|w2|w3)).weight=q8_0,^(layers|noise_refiner)\.[0-9]*\.(adaLN_modulation\.[0-9]*|attention\.(out|qkv)|feed_forward\.(w1|w2|w3))\.weight=q4_K"
Both models have the exact same size on disk but produce different results. I feel like I prefer the Q4_K quant, as the character's face looks better.
Thank you both, I'm coming from the LLM side, lots to learn!