Buckets:

hf-doc-build
/

doc-dev

Files

xet

hf-doc-build/doc-dev / diffusers /pr_11739 /en /optimization /tome.md

rtrm

19 days ago

preview code

download

raw

6.33 kB

	# Token merging

	[Token merging](https://huggingface.co/papers/2303.17604) (ToMe) merges redundant tokens/patches progressively in the forward pass of a Transformer-based network which can speed-up the inference latency of [StableDiffusionPipeline](/docs/diffusers/pr_11739/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline).

	Install ToMe from `pip`:

	```bash
	pip install tomesd
	```

	You can use ToMe from the [`tomesd`](https://github.com/dbolya/tomesd) library with the [`apply_patch`](https://github.com/dbolya/tomesd?tab=readme-ov-file#usage) function:

	```diff
	from diffusers import StableDiffusionPipeline
	import torch
	import tomesd

	pipeline = StableDiffusionPipeline.from_pretrained(
	"stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True,
	).to("cuda")
	+ tomesd.apply_patch(pipeline, ratio=0.5)

	image = pipeline("a photo of an astronaut riding a horse on mars").images[0]
	```

	The `apply_patch` function exposes a number of [arguments](https://github.com/dbolya/tomesd#usage) to help strike a balance between pipeline inference speed and the quality of the generated tokens. The most important argument is `ratio` which controls the number of tokens that are merged during the forward pass.

	As reported in the [paper](https://huggingface.co/papers/2303.17604), ToMe can greatly preserve the quality of the generated images while boosting inference speed. By increasing the `ratio`, you can speed-up inference even further, but at the cost of some degraded image quality.

	To test the quality of the generated images, we sampled a few prompts from [Parti Prompts](https://parti.research.google/) and performed inference with the [StableDiffusionPipeline](/docs/diffusers/pr_11739/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline) with the following settings:



	We didn’t notice any significant decrease in the quality of the generated samples, and you can check out the generated samples in this [WandB report](https://wandb.ai/sayakpaul/tomesd-results/runs/23j4bj3i?workspace=). If you're interested in reproducing this experiment, use this [script](https://gist.github.com/sayakpaul/8cac98d7f22399085a060992f411ecbd).

	## Benchmarks

	We also benchmarked the impact of `tomesd` on the [StableDiffusionPipeline](/docs/diffusers/pr_11739/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline) with [xFormers](https://huggingface.co/docs/diffusers/optimization/xformers) enabled across several image resolutions. The results are obtained from A100 and V100 GPUs in the following development environment:

	```bash
	- `diffusers` version: 0.15.1
	- Python version: 3.8.16
	- PyTorch version (GPU?): 1.13.1+cu116 (True)
	- Huggingface_hub version: 0.13.2
	- Transformers version: 4.27.2
	- Accelerate version: 0.18.0
	- xFormers version: 0.0.16
	- tomesd version: 0.1.2
	```

	To reproduce this benchmark, feel free to use this [script](https://gist.github.com/sayakpaul/27aec6bca7eb7b0e0aa4112205850335). The results are reported in seconds, and where applicable we report the speed-up percentage over the vanilla pipeline when using ToMe and ToMe + xFormers.

	\| GPU \| Resolution \| Batch size \| Vanilla \| ToMe \| ToMe + xFormers \|
	\|----------\|----------------\|----------------\|-------------\|----------------\|---------------------\|
	\| A100 \| 512 \| 10 \| 6.88 \| 5.26 (+23.55%) \| 4.69 (+31.83%) \|
	\| \| 768 \| 10 \| OOM \| 14.71 \| 11 \|
	\| \| \| 8 \| OOM \| 11.56 \| 8.84 \|
	\| \| \| 4 \| OOM \| 5.98 \| 4.66 \|
	\| \| \| 2 \| 4.99 \| 3.24 (+35.07%) \| 2.1 (+37.88%) \|
	\| \| \| 1 \| 3.29 \| 2.24 (+31.91%) \| 2.03 (+38.3%) \|
	\| \| 1024 \| 10 \| OOM \| OOM \| OOM \|
	\| \| \| 8 \| OOM \| OOM \| OOM \|
	\| \| \| 4 \| OOM \| 12.51 \| 9.09 \|
	\| \| \| 2 \| OOM \| 6.52 \| 4.96 \|
	\| \| \| 1 \| 6.4 \| 3.61 (+43.59%) \| 2.81 (+56.09%) \|
	\| V100 \| 512 \| 10 \| OOM \| 10.03 \| 9.29 \|
	\| \| \| 8 \| OOM \| 8.05 \| 7.47 \|
	\| \| \| 4 \| 5.7 \| 4.3 (+24.56%) \| 3.98 (+30.18%) \|
	\| \| \| 2 \| 3.14 \| 2.43 (+22.61%) \| 2.27 (+27.71%) \|
	\| \| \| 1 \| 1.88 \| 1.57 (+16.49%) \| 1.57 (+16.49%) \|
	\| \| 768 \| 10 \| OOM \| OOM \| 23.67 \|
	\| \| \| 8 \| OOM \| OOM \| 18.81 \|
	\| \| \| 4 \| OOM \| 11.81 \| 9.7 \|
	\| \| \| 2 \| OOM \| 6.27 \| 5.2 \|
	\| \| \| 1 \| 5.43 \| 3.38 (+37.75%) \| 2.82 (+48.07%) \|
	\| \| 1024 \| 10 \| OOM \| OOM \| OOM \|
	\| \| \| 8 \| OOM \| OOM \| OOM \|
	\| \| \| 4 \| OOM \| OOM \| 19.35 \|
	\| \| \| 2 \| OOM \| 13 \| 10.78 \|
	\| \| \| 1 \| OOM \| 6.66 \| 5.54 \|

	As seen in the tables above, the speed-up from `tomesd` becomes more pronounced for larger image resolutions. It is also interesting to note that with `tomesd`, it is possible to run the pipeline on a higher resolution like 1024x1024. You may be able to speed-up inference even more with [`torch.compile`](fp16#torchcompile).

Xet Storage Details

Size:: 6.33 kB
Xet hash:: 48ca4e5ff59821109970881b0fe1838ae1a64b1fbd36e1523254d00f9d96316a

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.