
Doradus-AI PRO

AI & ML interests

None yet

Recent Activity

posted an update 1 day ago
Tonight we validated a small upstream vLLM fix that brings GLM-5.1-REAP-478B back into our consumer-Blackwell rotation pool.

Sleep/wake on 4× RTX PRO 6000 (SM_120) had a CuMemAllocator race that retired GLM in April: cuMemUnmap runs synchronously from the host the moment a pool-backed tensor's refcount hits zero, but kernels can still be in flight against that storage. The result was an accumulation of CUDA_ERROR_ILLEGAL_ADDRESS errors until the engine eventually became unrecoverable.

vllm-project/vllm#43020 is a one-line torch.cuda.synchronize() at the top of _python_free_callback. Steady-state inference is unaffected (only cumem frees pay the cost).

We caught the unpatched bug live during validation:

```
CUDA Error: invalid argument at /build/vllm/csrc/cumem_allocator.cpp:146
```

That's the exact failure class #43020 fixes. With the fix bind-mounted in: Q3.6-27B sleep/wake cycle clean (25.8 GiB VRAM released on /sleep level=1, engine alive, post-wake chat coherent); GLM 30-request stress test 30/30 PASS, 0 CUDA errors. Back into rotation.

Side win: we're also submitting a generic Triton autotune shmem-budget helper upstream. It replaces hand-rolled check_shared_mem() ? [64,128] : [32,64] bucket switches with per-config precision via Triton's existing prune_configs_by={"early_config_prune": ...} hook. Zero change to the H100/H200 fast path. Submitted: vllm-project/vllm#43047

Full writeup with byte math, stress-test logs, and the bind-mount overlay pattern: https://doradusresearch.ai/blog/sleep-mode-on-blackwell-part-2/

Hardware: 4× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (SM_120, 95 GiB per GPU, 101 KiB per-block opt-in shmem). Image stack documented in the writeup!
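To make the per-config shmem-budget idea from the post concrete, here is a minimal sketch of what such a pruning helper could look like. It is not the code from vllm-project/vllm#43047. The names `make_shmem_prune` and `estimate_shmem_bytes` are hypothetical, the `Config` dataclass is a stand-in for `triton.Config` so the example runs without Triton or a GPU, and the byte estimate assumes a GEMM-style kernel that stages one A tile and one B tile per pipeline stage. Only the 101 KiB budget comes from the post.

```python
from dataclasses import dataclass

# Per-block opt-in shared memory quoted in the post for SM_120: 101 KiB.
SM120_SHMEM_BUDGET_BYTES = 101 * 1024


@dataclass
class Config:
    """Stand-in for triton.Config, carrying only the fields used below."""
    kwargs: dict
    num_stages: int = 2
    num_warps: int = 4


def estimate_shmem_bytes(config, elem_bytes=2):
    """Assumed estimate: (A tile + B tile) * element size * pipeline stages."""
    bm = config.kwargs["BLOCK_M"]
    bn = config.kwargs["BLOCK_N"]
    bk = config.kwargs["BLOCK_K"]
    return (bm * bk + bk * bn) * elem_bytes * config.num_stages


def make_shmem_prune(budget_bytes, elem_bytes=2):
    """Build a callback with the (configs, named_args, **kwargs) shape that
    Triton's early_config_prune hook is called with."""
    def early_config_prune(configs, named_args, **kwargs):
        kept = [c for c in configs
                if estimate_shmem_bytes(c, elem_bytes) <= budget_bytes]
        # Never return an empty list: fall back to the smallest config.
        return kept or [min(configs,
                            key=lambda c: estimate_shmem_bytes(c, elem_bytes))]
    return early_config_prune


# Hypothetical candidate tilings, chosen only to exercise the budget check.
CANDIDATES = [
    Config({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_stages=2),
    Config({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=3),
    Config({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=4),
    Config({"BLOCK_M": 128, "BLOCK_N": 256, "BLOCK_K": 64}, num_stages=3),
]

if __name__ == "__main__":
    prune = make_shmem_prune(SM120_SHMEM_BUDGET_BYTES)
    for cfg in prune(CANDIDATES, {}):
        print(cfg.kwargs, cfg.num_stages, estimate_shmem_bytes(cfg))
```

With real Triton, a callback like this would be passed as `prune_configs_by={"early_config_prune": make_shmem_prune(budget)}` to `triton.autotune`. Because each config is checked against the budget individually, a device with a larger budget keeps every candidate, so its config list does not change.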
liked a model 3 months ago
Sehyo/Qwen3.5-122B-A10B-NVFP4
updated a model 5 months ago
Doradus-AI/RnJ-1-Instruct-FP8

Organizations

Doradus Secure AI Solutions