Papers
arxiv:2602.03839

Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL

Published on May 19
Authors:
,

Abstract

Bandwidth-constrained distributed reinforcement learning post-training of large language models can achieve significant communication reduction by transmitting only visible weight updates after BF16 casting, using PULSESync for weight synchronization and PULSELoCo for gradient synchronization with error feedback.

AI-generated summary

Bandwidth-constrained distributed reinforcement learning (RL) post-training of large language models is bottlenecked by two channels: weight synchronization from trainers to inference workers, and gradient or pseudo-gradient synchronization across trainers. We find that approximately 99% of per-step weight updates are invisible after the BF16 cast used by standard training and inference forward passes. We explain this sparsity by showing that, at typical RL post-training learning rates, Adam updates often fall below the local BF16 rounding threshold. We turn this observation into an algorithmic principle called compute-visible sparsification: transmit only updates that would change the next forward pass. PULSE (Precision-gated Updates for Low-precision Sparse Exchange) turns this principle into two communication algorithms: PULSESync sends lossless sparse BF16 weight patches from trainers to inference workers, and PULSELoCo sparsifies DiLoCo-style FP32 pseudo-gradient synchronization with error feedback. Over bandwidth-constrained commodity networks, PULSESync cuts weight-synchronization communication by over 100x while reconstructing trainer weights bit-identically. PULSELoCo matches DiLoCo across four models while reducing trainer-to-trainer communication by over 17x versus DiLoCo and over 100x versus DDP in the largest evaluated setting.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2602.03839
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03839 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03839 in a dataset README.md to link it from this page.

Spaces citing this paper 1

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.