Parallelism

Parallelism strategies speed up diffusion transformers by distributing computation across multiple devices, which reduces inference and training time. Refer to the Distributed inference guide to learn more.

ParallelConfig[[diffusers.ParallelConfig]]

diffusers.ParallelConfig[[diffusers.ParallelConfig]]

Configuration for applying different parallelisms.

Parameters:

context_parallel_config (ContextParallelConfig, optional) : Configuration for context parallelism.

ContextParallelConfig[[diffusers.ContextParallelConfig]]

diffusers.ContextParallelConfig[[diffusers.ContextParallelConfig]]

Configuration for context parallelism.

Parameters:

ring_degree (int, optional, defaults to 1) : Number of devices to use for Ring Attention within a context parallel region. Must be a divisor of the total number of devices in the context parallel mesh. The sequence is split across devices, and each device computes attention between its local Q chunk and the KV chunks that are passed sequentially around the ring. This uses less memory (each device holds only 1/N of the KV at a time) and overlaps compute with communication, but requires N iterations to see all tokens. Best for long sequences with limited memory or bandwidth.

ulysses_degree (int, optional, defaults to 1) : Number of devices to use for Ulysses Attention. The sequence is split across devices. Each device computes its local QKV, then all-gathers all KV chunks to compute full attention in one pass. This uses more memory (each device stores all KV) and requires high-bandwidth all-to-all communication, but has lower latency. Best for moderate sequence lengths with good interconnect bandwidth.

convert_to_fp32 (bool, optional, defaults to True) : Whether to convert the attention output and log-sum-exp (LSE) to float32 for numerical stability in ring attention.

rotate_method (str, optional, defaults to "allgather") : Method to use for rotating key/value states across devices in ring attention. Currently, only "allgather" is supported.
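The ring attention behaviour described for ring_degree can be illustrated with a small single-process simulation. This is a minimal sketch in plain Python, not the diffusers implementation: the helper names (`attend`, `merge`, `ring_attention`) are hypothetical, attention is non-causal, and "devices" are simulated by looping over ranks. It shows why each rank needs N steps to see every KV chunk, and why the partial outputs are combined through their LSE values, which is the quantity convert_to_fp32 keeps in float32.

```python
import math


def attend(q, keys, values):
    """Softmax attention of one query over a KV chunk. Returns (output, lse)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def merge(out_a, lse_a, out_b, lse_b):
    """Combine two partial attention results using their log-sum-exp values."""
    m = max(lse_a, lse_b)
    wa, wb = math.exp(lse_a - m), math.exp(lse_b - m)
    out = [(wa * a + wb * b) / (wa + wb) for a, b in zip(out_a, out_b)]
    return out, m + math.log(wa + wb)


def ring_attention(queries, keys, values, ring_degree):
    """Simulate ring attention: each rank keeps its Q chunk and sees one KV chunk per step."""
    n = len(queries)
    if n % ring_degree != 0:
        raise ValueError("sequence length must be divisible by ring_degree in this sketch")
    chunk = n // ring_degree

    def shard(xs, rank):
        return xs[rank * chunk:(rank + 1) * chunk]

    results = []
    for rank in range(ring_degree):
        local_q = shard(queries, rank)
        partial = [None] * chunk
        for step in range(ring_degree):
            # KV chunk that has travelled around the ring to this rank at this step
            src = (rank - step) % ring_degree
            k_chunk, v_chunk = shard(keys, src), shard(values, src)
            for i, q in enumerate(local_q):
                out, lse = attend(q, k_chunk, v_chunk)
                partial[i] = (out, lse) if partial[i] is None else merge(*partial[i], out, lse)
        results.extend(out for out, _ in partial)
    return results
```

After ring_degree steps, every rank has merged all KV chunks, so the result matches attention computed over the full sequence on one device.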

diffusers.hooks.apply_context_parallel[[diffusers.hooks.apply_context_parallel]]

Apply context parallelism to a model.
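To make the sequence splitting concrete, the following sketch shows what each device would receive once a sequence is sharded. It is not `apply_context_parallel` itself: `check_degree` and `shard_sequence` are hypothetical helpers. The divisor rule for the mesh size comes from the ring_degree description above, while the requirement that the sequence length divide evenly is an assumption made here for simplicity.

```python
def check_degree(degree, mesh_size):
    """Validate a parallel degree against the size of the context parallel mesh."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    if mesh_size % degree != 0:
        raise ValueError(f"degree {degree} must divide the mesh size {mesh_size}")
    return degree


def shard_sequence(tokens, degree):
    """Split a sequence into `degree` contiguous, equally sized shards (one per rank)."""
    # Assumption for this sketch: the sequence length divides evenly.
    if len(tokens) % degree != 0:
        raise ValueError("sequence length must be divisible by the degree in this sketch")
    chunk = len(tokens) // degree
    return [tokens[rank * chunk:(rank + 1) * chunk] for rank in range(degree)]


degree = check_degree(2, mesh_size=4)
shards = shard_sequence(list(range(8)), degree)
# shards[0] is the slice rank 0 would hold, shards[1] the slice rank 1 would hold
```

Concatenating the shards in rank order recovers the original sequence, which is the invariant any gather step at the end of a context parallel region has to preserve.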
