ComfyUI will still cast all layers to fp8
With fp8_scaled models that have some layers stored at higher precision, ComfyUI will still convert those layers to fp8 when loading by default. You need a node that uses custom ops to prevent this. That's how I got around it, with my custom node and the fp8_scaled variant I made of your distill models.
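For anyone who wants to see what the checkpoint itself stores before ComfyUI touches it, here's a quick dtype inventory. The only assumptions are that the model is a plain .safetensors file and the filename, which is made up:

```python
# Dtype inventory of an fp8_scaled checkpoint on disk.
from collections import Counter
import torch
from safetensors import safe_open

path = "wan2.2_i2v_fp8_scaled.safetensors"  # hypothetical filename, adjust to your file

counts = Counter()
high_precision = []
with safe_open(path, framework="pt", device="cpu") as f:
    for key in f.keys():
        t = f.get_tensor(key)
        counts[str(t.dtype)] += 1
        # Collect anything not stored in fp8 so we can see which layers were kept larger.
        if t.dtype not in (torch.float8_e4m3fn, torch.float8_e5m2):
            high_precision.append((key, str(t.dtype), tuple(t.shape)))

print(counts)
for key, dtype, shape in high_precision[:20]:
    print(key, dtype, shape)
```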
Any more specifics on this? I'm not sure where the right place to check is, but I tried dumping to_load in BaseModel.load_model_weights during UNETLoader, and I see:
Loading weight: blocks.0.cross_attn.k.bias, shape: torch.Size([5120]), dtype: torch.float32
Loading weight: blocks.0.cross_attn.k.weight, shape: torch.Size([5120, 5120]), dtype: torch.float8_e4m3fn
Maybe something is happening later; it would be nice to have more info.
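One way to narrow it down is to do the same dtype census on the live model after UNETLoader has run and compare it with the dump above. This is a hypothetical helper; it assumes the MODEL output of UNETLoader is a ModelPatcher and that the UNet sits at model.model.diffusion_model, which matches my checkout but may differ in yours:

```python
# Hypothetical debug helper: dtype census of the loaded diffusion model.
from collections import Counter

def dump_dtypes(model):
    counts = Counter()
    kept = []
    # Assumption: the UNet module is reachable at model.model.diffusion_model.
    for name, p in model.model.diffusion_model.named_parameters():
        counts[str(p.dtype)] += 1
        if "float8" not in str(p.dtype):
            kept.append((name, str(p.dtype)))
    print(counts)
    for name, dtype in kept[:20]:
        print(name, dtype)
```

If the float32 biases from the load-time dump still show up here, then whatever cast there is would have to happen later (e.g. when weights are patched or moved to the GPU) rather than in load_model_weights.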
That's a WVW workflow; all of my programmatic workflows are built around Comfy's built-in WAN nodes, and WVW is its own separate universe. It would be helpful to know where ComfyUI is actually forcing layers to FP8, if it is; I haven't found it yet.
And whether it matters: are these layers kept at higher precision just because they're small and it can't hurt, or was it found to help? I'm a little surprised that quantized models don't always leave smaller layers alone (or maybe they do and I haven't noticed).
Anybody? It's not clear if this is an actual problem or not. A WanVideoWrapper workflow doesn't help people using the normal built-in WAN nodes.
Possibly related:
https://github.com/comfyanonymous/ComfyUI/pull/10498 "Implements tensor subclass-based mixed precision quantization, enabling per-layer FP8/BF16 quantization with automatic operation dispatch."
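From the PR description, the goal is per-layer dispatch: most weights stored in fp8, selected tensors kept in bf16, with upcasting handled automatically at compute time. A rough stand-alone illustration of that effect (this is not the PR's tensor-subclass implementation, just the behavior the description suggests):

```python
# Conceptual sketch only: a Linear that stores its weight in fp8 and upcasts to
# the activation dtype at compute time, while other tensors stay at bf16.
import torch
import torch.nn.functional as F

class CastOnComputeLinear(torch.nn.Linear):
    def forward(self, x):
        w, b = self.weight, self.bias
        if w.dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
            w = w.to(x.dtype)  # fp8 storage, higher-precision compute
        if b is not None and b.dtype != x.dtype:
            b = b.to(x.dtype)
        return F.linear(x, w, b)

lin = CastOnComputeLinear(64, 64, bias=True)
lin.weight.data = lin.weight.data.to(torch.float8_e4m3fn)  # this layer: fp8 storage
lin.bias.data = lin.bias.data.to(torch.bfloat16)           # bias kept at higher precision
out = lin(torch.randn(1, 64, dtype=torch.bfloat16))
print(out.dtype)  # bfloat16 compute despite fp8 weight storage
```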
Not FP8, but I have been wondering about this:
"Speed up lora compute and lower memory usage by doing it in fp16." by @comfyanonymous in #11161
https://github.com/Comfy-Org/ComfyUI/pull/11161
Does that affect quality at all? Are there LoRAs stored in BF16 or FP32 that therefore shouldn't be converted to FP16 on the fly?
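One way to get a feel for it is to compare the LoRA delta (alpha/rank * up @ down) computed in fp32 versus fp16 on random tensors. The shapes, rank, and scale below are made up rather than taken from a real Wan LoRA, so treat it as a toy check only:

```python
# Toy comparison of LoRA delta computed in fp32 vs fp16.
import torch

torch.manual_seed(0)
dev = "cuda" if torch.cuda.is_available() else "cpu"
out_f, in_f, rank, alpha = 2048, 2048, 64, 64.0  # made-up sizes

up = torch.randn(out_f, rank, device=dev) * 0.02    # lora_up / lora_B
down = torch.randn(rank, in_f, device=dev) * 0.02   # lora_down / lora_A

delta32 = (alpha / rank) * (up @ down)                          # fp32 reference
delta16 = ((alpha / rank) * (up.half() @ down.half())).float()  # fp16 path

abs_err = (delta32 - delta16).abs().max().item()
print(f"max abs error {abs_err:.3e} (vs max |delta| {delta32.abs().max().item():.3e})")
```

That only measures the matmul rounding; whether it is visible after the delta is added onto fp8/fp16 base weights is a separate question.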
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/blob/main/wan2.2_i2v_scale_fp8_comfyui.json
Can I contact you? It's urgent; I've been messing with this for weeks now and have no more time. Where can I contact you? It's an emergency. (For a fast reply.)