	# Basic performance

	Diffusion is a random process that is computationally demanding. You may need to run the [DiffusionPipeline](/docs/diffusers/pr_12411/en/api/pipelines/overview#diffusers.DiffusionPipeline) several times before getting a desired output, so it's important to balance generation speed and memory usage in order to iterate faster.

	This guide recommends some basic performance tips for using the [DiffusionPipeline](/docs/diffusers/pr_12411/en/api/pipelines/overview#diffusers.DiffusionPipeline). For more detailed performance guides, refer to the docs in the Inference Optimization section, such as [Accelerate inference](./optimization/fp16) or [Reduce memory usage](./optimization/memory).

	## Memory usage

	Reducing the amount of memory used indirectly speeds up generation and can help a model fit on the device.

	The [enable_model_cpu_offload()](/docs/diffusers/pr_12411/en/api/pipelines/overview#diffusers.DiffusionPipeline.enable_model_cpu_offload) method moves a model to the CPU when it is not in use to save GPU memory.

	```py
	import torch
	from diffusers import DiffusionPipeline

	pipeline = DiffusionPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.bfloat16,
	    device_map="cuda"
	)
	pipeline.enable_model_cpu_offload()

	prompt = """
	cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
	highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
	"""
	pipeline(prompt).images[0]
	print(f"Max memory allocated: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
	```

	## Inference speed

	Denoising is the most computationally demanding process during diffusion. Methods that optimize this process accelerate inference. Try the following methods for a speedup.

	- Add `device_map="cuda"` to place the pipeline on a GPU. Placing a model on an accelerator, like a GPU, increases speed because it performs computations in parallel.
	- Set `torch_dtype=torch.bfloat16` to execute the pipeline in half-precision. Reducing the data type precision increases speed because it takes less time to perform computations in a lower precision.

	```py
	import torch
	import time
	from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler

	pipeline = DiffusionPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.bfloat16,
	    device_map="cuda"
	)
	```
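
Half precision also halves the memory needed to store the weights, which can be estimated with simple arithmetic. The sketch below assumes a hypothetical model with 1 billion parameters (an illustrative figure, not the size of any specific checkpoint) and counts only the weights; activations and other runtime buffers add to this. `weight_memory_gb` is a helper name introduced here for illustration.

```py
# Bytes used to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Estimate the memory needed to hold the weights alone, in GB (1024**3 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


num_params = 1_000_000_000  # hypothetical parameter count, for illustration only
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: {weight_memory_gb(num_params, dtype):.2f} GB")
```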

	- Use a faster scheduler, such as [DPMSolverMultistepScheduler](/docs/diffusers/pr_12411/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler), which only requires ~20-25 steps.
	- Set `num_inference_steps` to a lower value. Reducing the number of inference steps reduces the overall number of computations. However, this can result in lower generation quality.

	```py
	pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

	prompt = """
	cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
	highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
	"""

	start_time = time.perf_counter()
	image = pipeline(prompt).images[0]
	end_time = time.perf_counter()

	print(f"Image generation took {end_time - start_time:.3f} seconds")
	```
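
A single timed run can be noisy, because the first call often includes one-time setup costs. As a minimal sketch, the timing above could be wrapped in a small helper that performs a warmup call and reports the median of several runs. The `benchmark` function and its `warmup` and `runs` parameters are hypothetical names introduced here, not part of Diffusers, and the workload is a stand-in so the sketch runs without a GPU.

```py
import statistics
import time


def benchmark(fn, warmup=1, runs=3):
    """Return the median wall-clock time, in seconds, of calling fn()."""
    for _ in range(warmup):
        fn()  # warmup calls absorb one-time setup costs and are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; replace it with `lambda: pipeline(prompt)` to time
# the pipeline from the example above.
elapsed = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"Median time: {elapsed:.4f} seconds")
```

The median is used rather than the mean so that one unusually slow run does not dominate the result.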

	## Generation quality

	Many modern diffusion models deliver high-quality images out-of-the-box. However, you can still improve generation quality by trying the following.

	- Try a more detailed and descriptive prompt. Include details such as the image medium, subject, style, and aesthetic. A negative prompt may also help by guiding a model away from undesirable features, using words like "low quality" or "blurry".

	```py
	import torch
	from diffusers import DiffusionPipeline

	pipeline = DiffusionPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.bfloat16,
	    device_map="cuda"
	)

	prompt = """
	cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
	highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
	"""
	negative_prompt = "low quality, blurry, ugly, poor details"
	pipeline(prompt, negative_prompt=negative_prompt).images[0]
	```

	For more details about creating better prompts, take a look at the [Prompt techniques](./using-diffusers/weighted_prompts) doc.
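
To see why a negative prompt steers generation, it helps to look at classifier-free guidance. In pipelines that use it, the negative prompt typically takes the place of the empty unconditional prompt, and each denoising step extrapolates from the negative prediction toward the prompt prediction. The sketch below illustrates that arithmetic on plain lists; `guided_noise` and the numbers are hypothetical, and a real pipeline applies the same formula to tensors.

```py
def guided_noise(noise_negative, noise_prompt, guidance_scale):
    """Move away from the negative-prompt prediction, toward the prompt prediction."""
    return [
        neg + guidance_scale * (pos - neg)
        for neg, pos in zip(noise_negative, noise_prompt)
    ]


# Hypothetical noise predictions for three latent values.
noise_negative = [0.2, -0.1, 0.4]
noise_prompt = [0.5, 0.1, 0.3]

guided = guided_noise(noise_negative, noise_prompt, guidance_scale=5.0)
print(guided)
```

With a guidance scale of 1 the result equals the prompt prediction, so the negative prompt only has an effect when the scale is larger than 1.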

	- Try a different scheduler, like [HeunDiscreteScheduler](/docs/diffusers/pr_12411/en/api/schedulers/heun#diffusers.HeunDiscreteScheduler) or [LMSDiscreteScheduler](/docs/diffusers/pr_12411/en/api/schedulers/lms_discrete#diffusers.LMSDiscreteScheduler), which trades generation speed for quality.

	```py
	import torch
	from diffusers import DiffusionPipeline, HeunDiscreteScheduler

	pipeline = DiffusionPipeline.from_pretrained(
	    "stabilityai/stable-diffusion-xl-base-1.0",
	    torch_dtype=torch.bfloat16,
	    device_map="cuda"
	)
	pipeline.scheduler = HeunDiscreteScheduler.from_config(pipeline.scheduler.config)

	prompt = """
	cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
	highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
	"""
	negative_prompt = "low quality, blurry, ugly, poor details"
	pipeline(prompt, negative_prompt=negative_prompt).images[0]
	```
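
The speed-for-quality trade-off can be illustrated with the numerical method Heun's scheduler is named after. Heun's method is a second-order solver that evaluates the function twice per step, so each step costs more but lands closer to the exact solution than a first-order Euler step. The sketch below compares the two on a toy ODE; it is not the scheduler's implementation, and all function names are introduced here for illustration.

```py
import math


def f(t, y):
    """Toy ODE dy/dt = -y, whose exact solution is y(t) = exp(-t)."""
    return -y


def euler_solve(y0, t_end, steps):
    """First-order Euler method: one function evaluation per step."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def heun_solve(y0, t_end, steps):
    """Second-order Heun method: two function evaluations per step."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        slope_start = f(t, y)
        y_predicted = y + h * slope_start  # Euler predictor
        slope_end = f(t + h, y_predicted)
        y = y + 0.5 * h * (slope_start + slope_end)  # average the two slopes
        t += h
    return y


exact = math.exp(-1.0)
euler_error = abs(euler_solve(1.0, 1.0, 10) - exact)
heun_error = abs(heun_solve(1.0, 1.0, 10) - exact)
print(f"Euler error: {euler_error:.5f} (10 function evaluations)")
print(f"Heun error:  {heun_error:.5f} (20 function evaluations)")
```

In a diffusion pipeline the function evaluation is a forward pass of the denoising model, which is why a second-order scheduler tends to take longer for the same number of steps.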

	## Next steps

	Diffusers offers more advanced and powerful optimizations such as [group-offloading](./optimization/memory#group-offloading) and [regional compilation](./optimization/fp16#regional-compilation). To learn more about how to maximize performance, take a look at the Inference Optimization section.
