ComfyUI will still cast all layers to fp8
With fp8_scaled models that have some layers stored at higher precision, ComfyUI will still convert those layers to fp8 when loading by default. You need a node that uses custom ops to prevent this. That's how I got around it, with my custom node and the fp8_scaled variant I made of your distill models.
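For anyone who wants to see what the checkpoint itself stores before ComfyUI touches it, here's a quick dtype inventory. The only assumptions are that the model is a plain .safetensors file and the filename, which is made up:

```python
# Dtype inventory of an fp8_scaled checkpoint on disk.
from collections import Counter
import torch
from safetensors import safe_open

path = "wan2.2_i2v_fp8_scaled.safetensors"  # hypothetical filename, adjust to your file

counts = Counter()
high_precision = []
with safe_open(path, framework="pt", device="cpu") as f:
    for key in f.keys():
        t = f.get_tensor(key)
        counts[str(t.dtype)] += 1
        # Collect anything not stored in fp8 so we can see which layers were kept larger.
        if t.dtype not in (torch.float8_e4m3fn, torch.float8_e5m2):
            high_precision.append((key, str(t.dtype), tuple(t.shape)))

print(counts)
for key, dtype, shape in high_precision[:20]:
    print(key, dtype, shape)
```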
Any more specifics on this? I'm not sure where the right place to check is, but I tried dumping to_load in BaseModel.load_model_weights during UNETLoader, and I see:
Loading weight: blocks.0.cross_attn.k.bias, shape: torch.Size([5120]), dtype: torch.float32
Loading weight: blocks.0.cross_attn.k.weight, shape: torch.Size([5120, 5120]), dtype: torch.float8_e4m3fn
Maybe something is happening later; it would be nice to have more info.
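One way to narrow it down is to do the same dtype census on the live model after UNETLoader has run and compare it with the dump above. This is a hypothetical helper; it assumes the MODEL output of UNETLoader is a ModelPatcher and that the UNet sits at model.model.diffusion_model, which matches my checkout but may differ in yours:

```python
# Hypothetical debug helper: dtype census of the loaded diffusion model.
from collections import Counter

def dump_dtypes(model):
    counts = Counter()
    kept = []
    # Assumption: the UNet module is reachable at model.model.diffusion_model.
    for name, p in model.model.diffusion_model.named_parameters():
        counts[str(p.dtype)] += 1
        if "float8" not in str(p.dtype):
            kept.append((name, str(p.dtype)))
    print(counts)
    for name, dtype in kept[:20]:
        print(name, dtype)
```

If the float32 biases from the load-time dump still show up here, then whatever cast there is would have to happen later (e.g. when weights are patched or moved to the GPU) rather than in load_model_weights.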
That's a WVW workflow; all of my programmatic workflows are built around Comfy's built-in WAN nodes, and WVW is its own separate universe. It would be helpful to know where ComfyUI is actually forcing layers to FP8, if it is; I haven't found it yet.
And whether it matters: are these layers kept at higher precision just because they're small and it can't hurt, or was it found to help? I'm a little surprised that quantized models don't always leave smaller layers alone (or maybe they do and I haven't noticed).
Anybody? It's not clear if this is an actual problem or not. A WanVideoWrapper workflow doesn't help people using the normal built-in WAN nodes.
Possibly related:
https://github.com/comfyanonymous/ComfyUI/pull/10498 "Implements tensor subclass-based mixed precision quantization, enabling per-layer FP8/BF16 quantization with automatic operation dispatch."
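From the PR description, the goal is per-layer dispatch: most weights stored in fp8, selected tensors kept in bf16, with upcasting handled automatically at compute time. A rough stand-alone illustration of that effect (this is not the PR's tensor-subclass implementation, just the behavior the description suggests):

```python
# Conceptual sketch only: a Linear that stores its weight in fp8 and upcasts to
# the activation dtype at compute time, while other tensors stay at bf16.
import torch
import torch.nn.functional as F

class CastOnComputeLinear(torch.nn.Linear):
    def forward(self, x):
        w, b = self.weight, self.bias
        if w.dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
            w = w.to(x.dtype)  # fp8 storage, higher-precision compute
        if b is not None and b.dtype != x.dtype:
            b = b.to(x.dtype)
        return F.linear(x, w, b)

lin = CastOnComputeLinear(64, 64, bias=True)
lin.weight.data = lin.weight.data.to(torch.float8_e4m3fn)  # this layer: fp8 storage
lin.bias.data = lin.bias.data.to(torch.bfloat16)           # bias kept at higher precision
out = lin(torch.randn(1, 64, dtype=torch.bfloat16))
print(out.dtype)  # bfloat16 compute despite fp8 weight storage
```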
Not FP8, but I have been wondering about this:
"Speed up lora compute and lower memory usage by doing it in fp16." by @comfyanonymous in #11161
https://github.com/Comfy-Org/ComfyUI/pull/11161
Does that affect quality at all? Are there LoRAs stored in BF16 or FP32 that therefore shouldn't be converted to FP16 on the fly?
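One way to get a feel for it is to compare the LoRA delta (alpha/rank * up @ down) computed in fp32 versus fp16 on random tensors. The shapes, rank, and scale below are made up rather than taken from a real Wan LoRA, so treat it as a toy check only:

```python
# Toy comparison of LoRA delta computed in fp32 vs fp16.
import torch

torch.manual_seed(0)
dev = "cuda" if torch.cuda.is_available() else "cpu"
out_f, in_f, rank, alpha = 2048, 2048, 64, 64.0  # made-up sizes

up = torch.randn(out_f, rank, device=dev) * 0.02    # lora_up / lora_B
down = torch.randn(rank, in_f, device=dev) * 0.02   # lora_down / lora_A

delta32 = (alpha / rank) * (up @ down)                          # fp32 reference
delta16 = ((alpha / rank) * (up.half() @ down.half())).float()  # fp16 path

abs_err = (delta32 - delta16).abs().max().item()
print(f"max abs error {abs_err:.3e} (vs max |delta| {delta32.abs().max().item():.3e})")
```

That only measures the matmul rounding; whether it is visible after the delta is added onto fp8/fp16 base weights is a separate question.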
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/blob/main/wan2.2_i2v_scale_fp8_comfyui.json
Can I contact you? It's urgent; I've been messing with this for weeks now and have no more time. Where can I contact you? It's an emergency. (For a fast reply.)