Buckets:

hf-doc-build/doc-dev / diffusers /pr_12411 /en /stable_diffusion.md
|
download
raw
5.74 kB
# Basic performance
Diffusion is a random process that is computationally demanding. You may need to run the [DiffusionPipeline](/docs/diffusers/pr_12411/en/api/pipelines/overview#diffusers.DiffusionPipeline) several times before getting a desired output. That's why it's important to carefully balance generation speed and memory usage in order to iterate faster,
This guide recommends some basic performance tips for using the [DiffusionPipeline](/docs/diffusers/pr_12411/en/api/pipelines/overview#diffusers.DiffusionPipeline). Refer to the Inference Optimization section docs such as [Accelerate inference](./optimization/fp16) or [Reduce memory usage](./optimization/memory) for more detailed performance guides.
## Memory usage
Reducing the amount of memory used indirectly speeds up generation and can help a model fit on device.
The [enable_model_cpu_offload()](/docs/diffusers/pr_12411/en/api/pipelines/overview#diffusers.DiffusionPipeline.enable_model_cpu_offload) method moves a model to the CPU when it is not in use to save GPU memory.
```py
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.bfloat16,
device_map="cuda"
)
pipeline.enable_model_cpu_offload()
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
pipeline(prompt).images[0]
print(f"Max memory reserved: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```
## Inference speed
Denoising is the most computationally demanding process during diffusion. Methods that optimizes this process accelerates inference speed. Try the following methods for a speed up.
- Add `device_map="cuda"` to place the pipeline on a GPU. Placing a model on an accelerator, like a GPU, increases speed because it performs computations in parallel.
- Set `torch_dtype=torch.bfloat16` to execute the pipeline in half-precision. Reducing the data type precision increases speed because it takes less time to perform computations in a lower precision.
```py
import torch
import time
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.bfloat16,
device_map="cuda
)
```
- Use a faster scheduler, such as [DPMSolverMultistepScheduler](/docs/diffusers/pr_12411/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler), which only requires ~20-25 steps.
- Set `num_inference_steps` to a lower value. Reducing the number of inference steps reduces the overall number of computations. However, this can result in lower generation quality.
```py
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
start_time = time.perf_counter()
image = pipeline(prompt).images[0]
end_time = time.perf_counter()
print(f"Image generation took {end_time - start_time:.3f} seconds")
```
## Generation quality
Many modern diffusion models deliver high-quality images out-of-the-box. However, you can still improve generation quality by trying the following.
- Try a more detailed and descriptive prompt. Include details such as the image medium, subject, style, and aesthetic. A negative prompt may also help by guiding a model away from undesirable features by using words like low quality or blurry.
```py
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.bfloat16,
device_map="cuda"
)
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
negative_prompt = "low quality, blurry, ugly, poor details"
pipeline(prompt, negative_prompt=negative_prompt).images[0]
```
For more details about creating better prompts, take a look at the [Prompt techniques](./using-diffusers/weighted_prompts) doc.
- Try a different scheduler, like [HeunDiscreteScheduler](/docs/diffusers/pr_12411/en/api/schedulers/heun#diffusers.HeunDiscreteScheduler) or [LMSDiscreteScheduler](/docs/diffusers/pr_12411/en/api/schedulers/lms_discrete#diffusers.LMSDiscreteScheduler), that gives up generation speed for quality.
```py
import torch
from diffusers import DiffusionPipeline, HeunDiscreteScheduler
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.bfloat16,
device_map="cuda"
)
pipeline.scheduler = HeunDiscreteScheduler.from_config(pipeline.scheduler.config)
prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""
negative_prompt = "low quality, blurry, ugly, poor details"
pipeline(prompt, negative_prompt=negative_prompt).images[0]
```
## Next steps
Diffusers offers more advanced and powerful optimizations such as [group-offloading](./optimization/memory#group-offloading) and [regional compilation](./optimization/fp16#regional-compilation). To learn more about how to maximize performance, take a look at the Inference Optimization section.

Xet Storage Details

Size:
5.74 kB
·
Xet hash:
167996ad740630a3b46a887d78d693d00d6e0c0248200850830a41bfa745b089

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.