AI & ML interests

A one-year-long research workshop on large language models: the Summer of Language Models 21 🌸

Recent Activity

stas 
posted an update 3 days ago
PSA for DeepSpeed users: a long-standing, critical precision-related bug has been identified and fixed in https://github.com/deepspeedai/DeepSpeed/pull/8066, and a new release has been made.

The issue was that mixed precision mode downcast buffers that had to stay in fp32. This massively impacted correctness for models with large static buffers, e.g. RoPE in Qwen3 models when using long sequence lengths (32K+).
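To get a feel for why a downcast RoPE buffer hurts at long sequence lengths, here is a minimal pure-Python sketch. It is not DeepSpeed or Qwen3 code: the head dimension and RoPE base are illustrative values, and it assumes the affected buffer holds the inverse frequencies, with the angle then computed as position times inverse frequency. It emulates bf16 rounding of that buffer and reports how far the rotation angle drifts from the fp32 result.

```python
import math
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round a positive finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bf16 keeps the top 16 bits of the fp32 pattern; round on the dropped half.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Illustrative values only (assumptions), not taken from a specific model config.
HEAD_DIM = 128
ROPE_BASE = 1_000_000.0


def max_angle_error(position: int) -> float:
    """Largest |angle_fp32 - angle_bf16| in radians over all RoPE frequencies."""
    worst = 0.0
    for i in range(0, HEAD_DIM, 2):
        inv_freq = to_fp32(ROPE_BASE ** (-i / HEAD_DIM))
        # Only the buffer is downcast here; the multiply itself stays in high precision.
        error = abs(position * inv_freq - position * to_bf16(inv_freq))
        worst = max(worst, error)
    return worst


for position in (512, 4096, 32768):
    print(f"position {position:>6}: max angle error ~ {max_angle_error(position):.3f} rad")
```

The per-frequency rounding error is fixed once the buffer is downcast, so the angle error grows linearly with position. With these illustrative values it exceeds a full rotation (2π) at position 32768, which is consistent with the problem showing up mainly at long sequence lengths.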

Hopefully this fix brings DeepSpeed close to parity with FSDP2; the gap has been an issue for a long time.

You can still get the old behavior, but you now need to configure it manually. By default the model's buffers will remain in their original precision.

Please install deepspeed==0.19.2, which will do the right thing.
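If you want a training script to fail fast on an older install, a small guard like the following could help. This is a sketch under assumptions: the helper names are hypothetical, the version threshold is simply the one named above, and it assumes later releases also carry the fix. It uses only the standard library to read the installed package version.

```python
import re
from importlib import metadata

# Release named in this post; treating anything newer as fixed is an assumption.
FIXED_RELEASE = (0, 19, 2)


def parse_version(version: str) -> tuple:
    """Extract the leading MAJOR.MINOR.PATCH numbers, ignoring suffixes like '+cu121'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_buffer_fix(version: str) -> bool:
    """True if the given version string is at or above the fixed release."""
    return parse_version(version) >= FIXED_RELEASE


def installed_deepspeed_has_fix() -> bool:
    """Check the installed deepspeed package; False if it is not installed."""
    try:
        return has_buffer_fix(metadata.version("deepspeed"))
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    if not installed_deepspeed_has_fix():
        print("deepspeed is missing or older than 0.19.2; consider upgrading")
```

Calling `installed_deepspeed_has_fix()` near the top of a launch script is one way to surface a stale environment before a long run starts.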

Thanks to Tunji Ruwase and Claude Opus 4.8 via Cursor for identifying and fixing the problem.