ICYMI, great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts
on 4×H100s: 12x longer sequences, 3.7x throughput
learn how to integrate it with Accelerate, Transformers, and TRL ⤵️
https://huggingface.co/blog/ulysses-sp