	# Parallelism

	Parallelism strategies speed up diffusion transformers by distributing computation across multiple devices, which reduces inference and training time. Refer to the [Distributed inference](../training/distributed_inference) guide to learn more.

	## ParallelConfig[[diffusers.ParallelConfig]]

	#### diffusers.ParallelConfig[[diffusers.ParallelConfig]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_11636/src/diffusers/models/_modeling_parallel.py#L130)

	Configuration for applying different parallelism strategies.

	Parameters:

	context_parallel_config (`ContextParallelConfig`, optional) : Configuration for context parallelism.

	## ContextParallelConfig[[diffusers.ContextParallelConfig]]

	#### diffusers.ContextParallelConfig[[diffusers.ContextParallelConfig]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_11636/src/diffusers/models/_modeling_parallel.py#L41)

	Configuration for context parallelism.

	Parameters:

	ring_degree (`int`, optional, defaults to `1`) : Number of devices to use for Ring Attention within a context parallel region. Must be a divisor of the total number of devices in the context parallel mesh. The sequence is split across devices. Each device computes attention between its local Q and the KV chunks that are passed sequentially around the ring. Lower memory (each device holds only 1/N of the KV at a time) and compute overlaps with communication, but N iterations are required to see all tokens. Best for long sequences with limited memory/bandwidth.

	ulysses_degree (`int`, optional, defaults to `1`) : Number of devices to use for Ulysses Attention. The sequence is split across devices. Each device computes its local QKV, then all-gathers all KV chunks to compute full attention in one pass. Higher memory (stores all KV) and requires high-bandwidth all-to-all communication, but lower latency. Best for moderate sequences with good interconnect bandwidth.

	convert_to_fp32 (`bool`, optional, defaults to `True`) : Whether to convert output and LSE to float32 for ring attention numerical stability.

	rotate_method (`str`, optional, defaults to `"allgather"`) : Method to use for rotating key/value states across devices in ring attention. Currently, only `"allgather"` is supported.
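The ring scheme described under `ring_degree`, and the role of the LSE (log-sum-exp) mentioned under `convert_to_fp32`, can be illustrated with a small single-process simulation. This is only a sketch in plain Python and not the diffusers implementation: the function names `attention_chunk`, `merge`, and `ring_attention` are hypothetical, no real communication takes place, and the sequence length is assumed to divide evenly by `ring_degree`. Each simulated device keeps its own query shard, visits one KV chunk per step, and merges the partial outputs using their LSE values so that the result matches full attention.

```python
import math


def attention_chunk(q, keys, values):
    """Attention of one query vector over a single KV chunk.

    Returns (out, lse): the softmax-weighted average of `values` within the
    chunk, and the log-sum-exp of the scores, needed to merge chunks later.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial results as if softmax had covered both chunks."""
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    out = [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]
    return out, m + math.log(wa + wb)


def ring_attention(queries, keys, values, ring_degree):
    """Simulate ring attention; `ring_degree` plays the role of N devices."""
    n = len(queries)
    assert n % ring_degree == 0, "sketch assumes an evenly divisible sequence"
    chunk = n // ring_degree

    def shard(xs, rank):
        return xs[rank * chunk:(rank + 1) * chunk]

    outputs = []
    for rank in range(ring_degree):
        for q in shard(queries, rank):
            acc = None
            for step in range(ring_degree):
                # KV chunk that would have arrived at this rank at this step.
                src = (rank + step) % ring_degree
                part = attention_chunk(q, shard(keys, src), shard(values, src))
                acc = part if acc is None else merge(*acc, *part)
            outputs.append(acc[0])
    return outputs


# Toy data: 8 tokens with 4 features each.
seq_len, dim = 8, 4
queries = [[math.sin(i + d) for d in range(dim)] for i in range(seq_len)]
keys = [[math.cos(i * (d + 1)) for d in range(dim)] for i in range(seq_len)]
values = [[float(i + d) for d in range(dim)] for i in range(seq_len)]

full = [attention_chunk(q, keys, values)[0] for q in queries]
ring = ring_attention(queries, keys, values, ring_degree=4)
max_err = max(abs(a - b) for fa, ra in zip(full, ring) for a, b in zip(fa, ra))
```

Here `max_err` measures how far the chunked result is from attention computed over the whole sequence at once. The merge step relies on exponentials of LSE differences, which is one reason an implementation may prefer to keep the output and LSE in float32.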

	#### diffusers.hooks.apply_context_parallel[[diffusers.hooks.apply_context_parallel]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_11636/src/diffusers/hooks/context_parallel.py#L78)

	Apply context parallelism to a model.
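This page does not describe how the function works internally. As a purely conceptual illustration, and assuming that context parallelism amounts to splitting inputs along the sequence dimension before the model runs and gathering outputs afterwards, the idea can be sketched in plain Python. The helpers `shard_sequence`, `gather_sequence`, and `run_sharded` are hypothetical and are not part of diffusers. The per-token function stands in for any computation that needs no cross-token communication.

```python
def shard_sequence(tokens, world_size):
    """Split a sequence into `world_size` contiguous, equally sized chunks."""
    if len(tokens) % world_size != 0:
        raise ValueError("sketch assumes the sequence length divides evenly")
    size = len(tokens) // world_size
    return [tokens[r * size:(r + 1) * size] for r in range(world_size)]


def gather_sequence(shards):
    """Concatenate per-rank outputs back into one full sequence."""
    return [tok for shard in shards for tok in shard]


def run_sharded(tokens, world_size, per_token_fn):
    """Apply `per_token_fn` shard by shard, as separate ranks would."""
    shards = shard_sequence(tokens, world_size)
    outputs = [[per_token_fn(t) for t in shard] for shard in shards]
    return gather_sequence(outputs)


tokens = list(range(8))
result = run_sharded(tokens, 4, lambda t: t * 2)
```

Attention is the one place where tokens on different ranks must interact, which is what the `ring_degree` and `ulysses_degree` settings of `ContextParallelConfig` govern.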
