GGUF
The GGUF file format is typically used to store models for inference with GGML and supports a variety of block-wise quantization options. Diffusers supports loading checkpoints that were pre-quantized and saved in the GGUF format, via from_single_file loading with Model classes. Loading GGUF checkpoints directly via Pipelines is currently not supported.
The following example will load the FLUX.1 DEV transformer model using the GGUF Q2_K quantization variant.
Before starting, install gguf in your environment:
pip install -U gguf
Since GGUF is a single-file format, use FromSingleFileMixin.from_single_file to load the model and pass in a GGUFQuantizationConfig.
When using GGUF checkpoints, the quantized weights remain in a low-memory dtype (typically torch.uint8) and are dynamically dequantized and cast to the configured compute_dtype during each module's forward pass through the model. The GGUFQuantizationConfig allows you to set the compute_dtype.
The functions used for dynamic dequantization are based on the great work done by city96, who created the PyTorch ports of the original NumPy implementation by compilade.
import torch

from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

ckpt_path = (
    "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q2_K.gguf"
)
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, generator=torch.manual_seed(0)).images[0]
image.save("flux-gguf.png")
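To make the block-wise storage and on-the-fly dequantization described earlier more concrete, here is a minimal pure-Python sketch of a Q8_0-style scheme: each block of 32 weights is stored as one float16 scale plus 32 signed 8-bit integers, and the weights are rebuilt by multiplying back by the scale. This is an illustration only, not the code Diffusers or GGML actually run; the function names quantize_q8_0 and dequantize_q8_0 are hypothetical, and the real kernels operate on packed tensors rather than Python lists.

```python
import struct

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest float16 value (the scale is stored as fp16)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(weights):
    """Hypothetical helper: split a flat list of floats into (scale, int8 list) blocks."""
    if len(weights) % BLOCK_SIZE != 0:
        raise ValueError("number of weights must be a multiple of BLOCK_SIZE")
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        scale = to_fp16(amax / 127.0)
        if scale == 0.0:
            # All-zero (or vanishingly small) block: store zeros.
            quants = [0] * BLOCK_SIZE
        else:
            # Round to the nearest integer and clamp to the signed 8-bit range used here.
            quants = [max(-127, min(127, round(w / scale))) for w in block]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Hypothetical helper: rebuild approximate float weights from the blocks."""
    restored = []
    for scale, quants in blocks:
        restored.extend(q * scale for q in quants)
    return restored


if __name__ == "__main__":
    original = [(i - 16) / 8 for i in range(64)]
    packed = quantize_q8_0(original)
    approx = dequantize_q8_0(packed)
    worst = max(abs(a - b) for a, b in zip(original, approx))
    print(f"blocks: {len(packed)}, worst absolute error: {worst:.5f}")
```

In the real pipeline the equivalent of dequantize_q8_0 runs inside each module's forward pass and the result is cast to compute_dtype, so the full-precision weights only exist transiently.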
Supported Quantization Types
- BF16
- Q4_0
- Q4_1
- Q5_0
- Q5_1
- Q8_0
- Q2_K
- Q3_K
- Q4_K
- Q5_K
- Q6_K
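As a rough guide to what these types cost in storage, the sketch below computes bits per weight from the GGML block layouts (weights per block and bytes per block, as I understand them from the gguf Python package) and estimates the size of a model quantized uniformly with one type. The helper names and the 12-billion-parameter count are assumptions for illustration; real GGUF files usually mix several types across tensors, so actual file sizes will differ.

```python
# (weights per block, bytes per block) for each supported type.
# Values follow the GGML block layouts; treat them as an assumption to verify
# against the gguf package version you have installed.
GGML_BLOCK_LAYOUT = {
    "BF16": (1, 2),
    "Q4_0": (32, 18),
    "Q4_1": (32, 20),
    "Q5_0": (32, 22),
    "Q5_1": (32, 24),
    "Q8_0": (32, 34),
    "Q2_K": (256, 84),
    "Q3_K": (256, 110),
    "Q4_K": (256, 144),
    "Q5_K": (256, 176),
    "Q6_K": (256, 210),
}


def bits_per_weight(qtype: str) -> float:
    """Average storage cost of one weight, including per-block scales."""
    block_size, block_bytes = GGML_BLOCK_LAYOUT[qtype]
    return 8 * block_bytes / block_size


def estimated_size_gib(num_params: int, qtype: str) -> float:
    """Hypothetical helper: size in GiB if every weight used the same type."""
    return num_params * bits_per_weight(qtype) / 8 / 2**30


if __name__ == "__main__":
    num_params = 12_000_000_000  # assumed parameter count, for illustration only
    for qtype in GGML_BLOCK_LAYOUT:
        print(f"{qtype:5s} {bits_per_weight(qtype):7.4f} bits/weight "
              f"~{estimated_size_gib(num_params, qtype):6.2f} GiB")
```

The K-quant types use larger 256-weight blocks, which is why their per-weight overhead for scales is lower than that of the legacy 32-weight types at a similar bit width.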