# Quantization
Quantization techniques reduce memory and computational costs by representing weights and activations with lower-precision data types like 8-bit integers (int8). This enables loading larger models that normally wouldn't fit into memory and speeds up inference.
> [!TIP]
> Learn how to quantize models in the [Quantization](../quantization/overview) guide.
## PipelineQuantizationConfig[[diffusers.PipelineQuantizationConfig]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.PipelineQuantizationConfig</name><anchor>diffusers.PipelineQuantizationConfig</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/pipe_quant_config.py#L33</source><parameters>[{"name": "quant_backend", "val": ": str = None"}, {"name": "quant_kwargs", "val": ": typing.Dict[str, typing.Union[str, float, int, dict]] = None"}, {"name": "components_to_quantize", "val": ": typing.Union[typing.List[str], str, NoneType] = None"}, {"name": "quant_mapping", "val": ": typing.Dict[str, typing.Union[diffusers.quantizers.quantization_config.QuantizationConfigMixin, ForwardRef('TransformersQuantConfigMixin')]] = None"}]</parameters><paramsdesc>- **quant_backend** (`str`) -- Quantization backend to be used. When using this option, we assume that the backend
is available to both `diffusers` and `transformers`.
- **quant_kwargs** (`dict`) -- Params to initialize the quantization backend class.
- **components_to_quantize** (`list`) -- Components of a pipeline to be quantized.
- **quant_mapping** (`dict`) -- Mapping defining the quantization specs to be used for the pipeline
components. When using this argument, users are not expected to provide `quant_backend`, `quant_kwargs`,
and `components_to_quantize`.</paramsdesc><paramgroups>0</paramgroups></docstring>
Configuration class to be used when applying quantization on the fly to [from_pretrained()](/docs/diffusers/pr_12595/en/api/pipelines/overview#diffusers.DiffusionPipeline.from_pretrained).
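For example, a pipeline-level config can select a backend, its initialization kwargs, and the components to quantize. A minimal sketch, assuming the `bitsandbytes_4bit` backend is installed (the checkpoint and component names below are illustrative):

```python
import torch
from diffusers import DiffusionPipeline, PipelineQuantizationConfig

# Quantize only the listed pipeline components with a single backend
pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer", "text_encoder_2"],
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)
```

The `quant_mapping` argument is the alternative path: it takes per-component quantization configs directly, in which case `quant_backend`, `quant_kwargs`, and `components_to_quantize` are not provided.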
</div>
## BitsAndBytesConfig[[diffusers.BitsAndBytesConfig]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.BitsAndBytesConfig</name><anchor>diffusers.BitsAndBytesConfig</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L180</source><parameters>[{"name": "load_in_8bit", "val": " = False"}, {"name": "load_in_4bit", "val": " = False"}, {"name": "llm_int8_threshold", "val": " = 6.0"}, {"name": "llm_int8_skip_modules", "val": " = None"}, {"name": "llm_int8_enable_fp32_cpu_offload", "val": " = False"}, {"name": "llm_int8_has_fp16_weight", "val": " = False"}, {"name": "bnb_4bit_compute_dtype", "val": " = None"}, {"name": "bnb_4bit_quant_type", "val": " = 'fp4'"}, {"name": "bnb_4bit_use_double_quant", "val": " = False"}, {"name": "bnb_4bit_quant_storage", "val": " = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **load_in_8bit** (`bool`, *optional*, defaults to `False`) --
This flag is used to enable 8-bit quantization with LLM.int8().
- **load_in_4bit** (`bool`, *optional*, defaults to `False`) --
This flag is used to enable 4-bit quantization by replacing the Linear layers with FP4/NF4 layers from
`bitsandbytes`.
- **llm_int8_threshold** (`float`, *optional*, defaults to 6.0) --
This corresponds to the outlier threshold for outlier detection as described in the `LLM.int8(): 8-bit Matrix
Multiplication for Transformers at Scale` paper: https://huggingface.co/papers/2208.07339. Any hidden states
value that is above this threshold will be considered an outlier and the operation on those values will be
done in fp16. Values are usually normally distributed, that is, most values are in the range [-3.5, 3.5],
but there are some exceptional systematic outliers that are very differently distributed for large models.
These outliers are often in the interval [-60, -6] or [6, 60]. Int8 quantization works well for values of
magnitude ~5, but beyond that, there is a significant performance penalty. A good default threshold is 6,
but a lower threshold might be needed for more unstable models (small models, fine-tuning).
- **llm_int8_skip_modules** (`List[str]`, *optional*) --
An explicit list of the modules that we do not want to convert in 8-bit. This is useful for models such as
Jukebox that have several heads in different places and not necessarily at the last position. For example,
for `CausalLM` models, the last `lm_head` is typically kept in its original `dtype`.
- **llm_int8_enable_fp32_cpu_offload** (`bool`, *optional*, defaults to `False`) --
This flag is used for advanced use cases and users who are aware of this feature. If you want to split
your model in different parts and run some parts in int8 on GPU and some parts in fp32 on CPU, you can use
this flag. This is useful for offloading large models such as `google/flan-t5-xxl`. Note that the int8
operations will not be run on CPU.
- **llm_int8_has_fp16_weight** (`bool`, *optional*, defaults to `False`) --
This flag runs LLM.int8() with 16-bit main weights. This is useful for fine-tuning as the weights do not
have to be converted back and forth for the backward pass.
- **bnb_4bit_compute_dtype** (`torch.dtype` or str, *optional*, defaults to `torch.float32`) --
This sets the computational type which might be different from the input type. For example, inputs might be
fp32, but computation can be set to bf16 for speedups.
- **bnb_4bit_quant_type** (`str`, *optional*, defaults to `"fp4"`) --
This sets the quantization data type in the `bnb.nn.Linear4Bit` layers. Options are the FP4 and NF4 data types,
specified by `fp4` or `nf4`.
- **bnb_4bit_use_double_quant** (`bool`, *optional*, defaults to `False`) --
This flag is used for nested quantization where the quantization constants from the first quantization are
quantized again.
- **bnb_4bit_quant_storage** (`torch.dtype` or str, *optional*, defaults to `torch.uint8`) --
This sets the storage type used to pack the quantized 4-bit params.
- **kwargs** (`Dict[str, Any]`, *optional*) --
Additional parameters from which to initialize the configuration object.</paramsdesc><paramgroups>0</paramgroups></docstring>
This is a wrapper class for all the possible attributes and features that you can play with for a model that has been
loaded using `bitsandbytes`.
This replaces `load_in_8bit` or `load_in_4bit`, therefore both options are mutually exclusive.
Currently only supports `LLM.int8()`, `FP4`, and `NF4` quantization. If more methods are added to `bitsandbytes`,
then more arguments will be added to this class.
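For instance, a 4-bit NF4 setup for a transformer model could look like the following. This is a minimal sketch; the checkpoint name is illustrative:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

# 4-bit NF4 quantization with bf16 compute and nested (double) quantization
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```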
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>is_quantizable</name><anchor>diffusers.BitsAndBytesConfig.is_quantizable</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L359</source><parameters>[]</parameters></docstring>
Returns `True` if the model is quantizable, `False` otherwise.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>post_init</name><anchor>diffusers.BitsAndBytesConfig.post_init</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L322</source><parameters>[]</parameters></docstring>
Safety checker that the arguments are correct; also replaces some `NoneType` arguments with their default values.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>quantization_method</name><anchor>diffusers.BitsAndBytesConfig.quantization_method</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L365</source><parameters>[]</parameters></docstring>
This method returns the quantization method used for the model. If the model is not quantizable, it returns
`None`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>to_diff_dict</name><anchor>diffusers.BitsAndBytesConfig.to_diff_dict</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L396</source><parameters>[]</parameters><rettype>`Dict[str, Any]`</rettype><retdesc>Dictionary of all the attributes that make up this configuration instance.</retdesc></docstring>
Removes all attributes from the config which correspond to the default config attributes for better readability and
serializes to a Python dictionary.
</div></div>
## GGUFQuantizationConfig[[diffusers.GGUFQuantizationConfig]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.GGUFQuantizationConfig</name><anchor>diffusers.GGUFQuantizationConfig</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L420</source><parameters>[{"name": "compute_dtype", "val": ": typing.Optional[ForwardRef('torch.dtype')] = None"}]</parameters><paramsdesc>- **compute_dtype** (`torch.dtype`, defaults to `torch.float32`) --
This sets the computational type which might be different from the input type. For example, inputs might be
fp32, but computation can be set to bf16 for speedups.</paramsdesc><paramgroups>0</paramgroups></docstring>
This is a config class for GGUF quantization techniques.
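GGUF checkpoints are already quantized, so this config mainly controls the compute dtype used when the weights are dequantized on the fly. A minimal sketch, assuming a GGUF single-file checkpoint (the file URL below is illustrative):

```python
import torch
from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

# Load a pre-quantized GGUF file and run compute in bf16
transformer = FluxTransformer2DModel.from_single_file(
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_K_S.gguf",  # illustrative GGUF file
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```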
</div>
## QuantoConfig[[diffusers.QuantoConfig]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.QuantoConfig</name><anchor>diffusers.QuantoConfig</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L816</source><parameters>[{"name": "weights_dtype", "val": ": str = 'int8'"}, {"name": "modules_to_not_convert", "val": ": typing.Optional[typing.List[str]] = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **weights_dtype** (`str`, *optional*, defaults to `"int8"`) --
The target dtype for the weights after quantization. Supported values are ("float8", "int8", "int4", "int2").</paramsdesc><paramgroups>0</paramgroups></docstring>
This is a wrapper class for all the possible attributes and features that you can play with for a model that has been
loaded using `quanto`.
modules_to_not_convert (`list`, *optional*, defaults to `None`):
The list of modules to not quantize, useful for quantizing models that explicitly require having some
modules left in their original precision (e.g. Whisper encoder, Llava encoder, Mixtral gate layers).
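For example, int8 weight-only quantization with `quanto` could be configured as follows. This is a minimal sketch; the checkpoint and the module name passed to `modules_to_not_convert` are illustrative:

```python
import torch
from diffusers import FluxTransformer2DModel, QuantoConfig

# Quantize weights to int8 with optimum-quanto, keeping one illustrative module in full precision
quant_config = QuantoConfig(weights_dtype="int8", modules_to_not_convert=["proj_out"])

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```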
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>post_init</name><anchor>diffusers.QuantoConfig.post_init</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L841</source><parameters>[]</parameters></docstring>
Safety checker that the arguments are correct.
</div></div>
## TorchAoConfig[[diffusers.TorchAoConfig]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.TorchAoConfig</name><anchor>diffusers.TorchAoConfig</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L443</source><parameters>[{"name": "quant_type", "val": ": typing.Union[str, ForwardRef('AOBaseConfig')]"}, {"name": "modules_to_not_convert", "val": ": typing.Optional[typing.List[str]] = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **quant_type** (Union[`str`, AOBaseConfig]) --
The type of quantization we want to use, currently supporting:
- **Integer quantization:**
- Full function names: `int4_weight_only`, `int8_dynamic_activation_int4_weight`,
`int8_weight_only`, `int8_dynamic_activation_int8_weight`
- Shorthands: `int4wo`, `int4dq`, `int8wo`, `int8dq`
- **Floating point 8-bit quantization:**
- Full function names: `float8_weight_only`, `float8_dynamic_activation_float8_weight`,
`float8_static_activation_float8_weight`
- Shorthands: `float8wo`, `float8wo_e5m2`, `float8wo_e4m3`, `float8dq`, `float8dq_e4m3`,
`float8_e4m3_tensor`, `float8_e4m3_row`
- **Floating point X-bit quantization:**
- Full function names: `fpx_weight_only`
- Shorthands: `fpX_eAwB`, where `X` is the number of bits (between `1` and `7`), `A` is the number
of exponent bits and `B` is the number of mantissa bits. The constraint `X == A + B + 1` must
be satisfied for a given shorthand notation.
- **Unsigned Integer quantization:**
- Full function names: `uintx_weight_only`
- Shorthands: `uint1wo`, `uint2wo`, `uint3wo`, `uint4wo`, `uint5wo`, `uint6wo`, `uint7wo`
- An AOBaseConfig instance: for more advanced configuration options.
- **modules_to_not_convert** (`List[str]`, *optional*, defaults to `None`) --
The list of modules to not quantize, useful for quantizing models that explicitly require having some
modules left in their original precision.
- **kwargs** (`Dict[str, Any]`, *optional*) --
The keyword arguments for the chosen type of quantization; for example, `int4_weight_only` quantization
currently supports two keyword arguments, `group_size` and `inner_k_tiles`. More API examples and
documentation of arguments can be found in
https://github.com/pytorch/ao/tree/main/torchao/quantization#other-available-quantization-techniques</paramsdesc><paramgroups>0</paramgroups></docstring>
This is a config class for torchao quantization/sparsity techniques.
<ExampleCodeBlock anchor="diffusers.TorchAoConfig.example">
Example:
```python
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

# AOBaseConfig-based configuration
from torchao.quantization import Int8WeightOnlyConfig
quantization_config = TorchAoConfig(Int8WeightOnlyConfig())

# String-based config (equivalent shorthand)
quantization_config = TorchAoConfig("int8wo")

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
```
</ExampleCodeBlock>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>from_dict</name><anchor>diffusers.TorchAoConfig.from_dict</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L589</source><parameters>[{"name": "config_dict", "val": ""}, {"name": "return_unused_kwargs", "val": " = False"}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Create configuration from a dictionary.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>get_apply_tensor_subclass</name><anchor>diffusers.TorchAoConfig.get_apply_tensor_subclass</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L760</source><parameters>[]</parameters></docstring>
Create the appropriate quantization method based on configuration.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>to_dict</name><anchor>diffusers.TorchAoConfig.to_dict</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/quantization_config.py#L561</source><parameters>[]</parameters></docstring>
Convert configuration to a dictionary.
</div></div>
## DiffusersQuantizer[[diffusers.DiffusersQuantizer]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.DiffusersQuantizer</name><anchor>diffusers.DiffusersQuantizer</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L34</source><parameters>[{"name": "quantization_config", "val": ": QuantizationConfigMixin"}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Abstract class of the HuggingFace quantizer. For now, it supports quantizing HF diffusers models for inference. This
class is used only for diffusers.models.modeling_utils.ModelMixin.from_pretrained and cannot easily be used outside
the scope of that method yet.
Attributes
quantization_config (`diffusers.quantizers.quantization_config.QuantizationConfigMixin`):
The quantization config that defines the quantization parameters of the model that you want to quantize.
modules_to_not_convert (`List[str]`, *optional*):
The list of module names to not convert when quantizing the model.
required_packages (`List[str]`, *optional*):
The list of required pip packages to install prior to using the quantizer.
requires_calibration (`bool`):
Whether the quantization method requires calibrating the model before using it.
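To give a sense of how these pieces fit together, a custom backend would subclass `DiffusersQuantizer` and override the hooks described below. The sketch is hypothetical (the backend and package name are made up) and only illustrates the structure, not a complete quantizer:

```python
from diffusers.quantizers.base import DiffusersQuantizer


class MyQuantizer(DiffusersQuantizer):
    # Hypothetical backend: no calibration pass is needed before inference
    requires_calibration = False
    required_packages = ["my_quant_lib"]  # illustrative dependency

    def validate_environment(self, *args, **kwargs):
        # Check for conflicts with arguments passed to `from_pretrained`; nothing to check here
        return

    def _process_model_before_weight_loading(self, model, **kwargs):
        # The model is still on the meta device: swap modules in-place before weights are loaded
        return model

    def _process_model_after_weight_loading(self, model, **kwargs):
        # Finalize the model once the (quantized) weights have been loaded
        return model
```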
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>adjust_max_memory</name><anchor>diffusers.DiffusersQuantizer.adjust_max_memory</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L133</source><parameters>[{"name": "max_memory", "val": ": typing.Dict[str, typing.Union[int, str]]"}]</parameters></docstring>
Adjust the `max_memory` argument for `infer_auto_device_map()` if extra memory is needed for quantization.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>adjust_target_dtype</name><anchor>diffusers.DiffusersQuantizer.adjust_target_dtype</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L91</source><parameters>[{"name": "torch_dtype", "val": ": torch.dtype"}]</parameters><paramsdesc>- **torch_dtype** (`torch.dtype`, *optional*) --
The torch_dtype that is used to compute the device_map.</paramsdesc><paramgroups>0</paramgroups></docstring>
Override this method if you want to adjust the `target_dtype` variable used in `from_pretrained` to compute the
device_map in case the device_map is a `str`. E.g. for bitsandbytes we force-set `target_dtype` to `torch.int8`
and for 4-bit we pass a custom enum `accelerate.CustomDtype.int4`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>check_if_quantized_param</name><anchor>diffusers.DiffusersQuantizer.check_if_quantized_param</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L137</source><parameters>[{"name": "model", "val": ": ModelMixin"}, {"name": "param_value", "val": ": torch.Tensor"}, {"name": "param_name", "val": ": str"}, {"name": "state_dict", "val": ": typing.Dict[str, typing.Any]"}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Checks if a loaded state_dict component is part of a quantized param and performs some validation; only defined for
quantization methods that require creating new parameters for quantization.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>check_quantized_param_shape</name><anchor>diffusers.DiffusersQuantizer.check_quantized_param_shape</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L157</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Checks if the quantized param has the expected shape.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>create_quantized_param</name><anchor>diffusers.DiffusersQuantizer.create_quantized_param</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L151</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
Takes the needed components from the state_dict and creates a quantized param.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>dequantize</name><anchor>diffusers.DiffusersQuantizer.dequantize</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L200</source><parameters>[{"name": "model", "val": ""}]</parameters></docstring>
Potentially dequantize the model to retrieve the original model, with some loss in accuracy / performance. Note
that not all quantization schemes support this.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>get_cuda_warm_up_factor</name><anchor>diffusers.DiffusersQuantizer.get_cuda_warm_up_factor</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L212</source><parameters>[]</parameters></docstring>
The factor to be used in `caching_allocator_warmup` to get the number of bytes to pre-allocate to warm up CUDA.
A factor of 2 means we allocate all bytes in the empty model (since we allocate in fp16), a factor of 4 means
we allocate half the memory of the weights residing in the empty model, etc.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>get_special_dtypes_update</name><anchor>diffusers.DiffusersQuantizer.get_special_dtypes_update</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L113</source><parameters>[{"name": "model", "val": ""}, {"name": "torch_dtype", "val": ": torch.dtype"}]</parameters><paramsdesc>- **model** (`~diffusers.models.modeling_utils.ModelMixin`) --
The model to quantize
- **torch_dtype** (`torch.dtype`) --
The dtype passed in the `from_pretrained` method.</paramsdesc><paramgroups>0</paramgroups></docstring>
Returns dtypes for modules that are not quantized, used for computing the device_map when a `str` is passed as
the device_map. The method will use the `modules_to_not_convert` that is modified in
`_process_model_before_weight_loading`. `diffusers` models don't have any `modules_to_not_convert` attributes
yet, but this can change in the future.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>postprocess_model</name><anchor>diffusers.DiffusersQuantizer.postprocess_model</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L187</source><parameters>[{"name": "model", "val": ": ModelMixin"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **model** (`~diffusers.models.modeling_utils.ModelMixin`) --
The model to quantize
- **kwargs** (`dict`, *optional*) --
The keyword arguments that are passed along to `_process_model_after_weight_loading`.</paramsdesc><paramgroups>0</paramgroups></docstring>
Post-process the model after the weights have been loaded. Make sure to override the abstract method
`_process_model_after_weight_loading`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>preprocess_model</name><anchor>diffusers.DiffusersQuantizer.preprocess_model</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L171</source><parameters>[{"name": "model", "val": ": ModelMixin"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **model** (`~diffusers.models.modeling_utils.ModelMixin`) --
The model to quantize
- **kwargs** (`dict`, *optional*) --
The keyword arguments that are passed along to `_process_model_before_weight_loading`.</paramsdesc><paramgroups>0</paramgroups></docstring>
Sets model attributes and/or converts the model before the weights are loaded. At this point the model should be
initialized on the meta device so you can freely manipulate the skeleton of the model in order to replace
modules in-place. Make sure to override the abstract method `_process_model_before_weight_loading`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>update_device_map</name><anchor>diffusers.DiffusersQuantizer.update_device_map</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L79</source><parameters>[{"name": "device_map", "val": ": typing.Optional[typing.Dict[str, typing.Any]]"}]</parameters><paramsdesc>- **device_map** (`Union[dict, str]`, *optional*) --
The device_map that is passed through the `from_pretrained` method.</paramsdesc><paramgroups>0</paramgroups></docstring>
Override this method if you want to override the existing device_map with a new one. E.g. for
bitsandbytes, since `accelerate` is a hard requirement, if no device_map is passed, the device_map is set to
`"auto"`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>update_missing_keys</name><anchor>diffusers.DiffusersQuantizer.update_missing_keys</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L103</source><parameters>[{"name": "model", "val": ""}, {"name": "missing_keys", "val": ": typing.List[str]"}, {"name": "prefix", "val": ": str"}]</parameters><paramsdesc>- **missing_keys** (`List[str]`, *optional*) --
The list of missing keys in the checkpoint compared to the state dict of the model.</paramsdesc><paramgroups>0</paramgroups></docstring>
Override this method if you want to adjust the `missing_keys`.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>update_torch_dtype</name><anchor>diffusers.DiffusersQuantizer.update_torch_dtype</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L68</source><parameters>[{"name": "torch_dtype", "val": ": torch.dtype"}]</parameters><paramsdesc>- **torch_dtype** (`torch.dtype`) --
The input dtype that is passed in `from_pretrained`.</paramsdesc><paramgroups>0</paramgroups></docstring>
Some quantization methods require explicitly setting the dtype of the model to a target dtype. You need to
override this method if you want to make sure that behavior is preserved.
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>validate_environment</name><anchor>diffusers.DiffusersQuantizer.validate_environment</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/quantizers/base.py#L163</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>
This method is used to check for potential conflicts with arguments that are passed in
`from_pretrained`. You need to define it for all future quantizers that are integrated with diffusers. If no
explicit checks are needed, simply return nothing.
</div></div>