1752.1
TFLOPS
4
Doradus-AI
PRO
5 followers · 1 following
DoradusA
AI & ML interests
None yet
Recent Activity
posted an update 1 day ago
Tonight we validated a small upstream vLLM fix that brings GLM-5.1-REAP-478B back into our consumer-Blackwell rotation pool.

Sleep/wake on 4× RTX PRO 6000 (SM_120) had a CuMemAllocator race that retired GLM in April: cuMemUnmap runs synchronously from the host the moment a pool-backed tensor's refcount hits zero, but kernels can still be in flight against that storage, accumulating CUDA_ERROR_ILLEGAL_ADDRESS, engine eventually unrecoverable.

vllm-project/vllm#43020 is a one-line torch.cuda.synchronize() at the top of _python_free_callback. Steady-state inference unaffected (only cumem frees pay the cost).

We caught the unpatched bug live during validation:

```
CUDA Error: invalid argument at /build/vllm/csrc/cumem_allocator.cpp:146
```

That's the exact failure class #43020 fixes. With it bind-mounted in:

- Q3.6-27B sleep/wake cycle clean (25.8 GiB VRAM released on /sleep level=1, engine alive, post-wake chat coherent)
- GLM 30-request stress test 30/30 PASS, 0 CUDA errors

Back into rotation.

Side win: we're also submitting a generic Triton autotune shmem-budget helper upstream that replaces hand-rolled check_shared_mem() ? [64,128] : [32,64] bucket switches with per-config precision via Triton's existing prune_configs_by={"early_config_prune": ...} hook. Zero change to the H100/H200 fast path. Submitted: vllm-project/vllm#43047

Full writeup with byte math + stress-test logs + the bind-mount overlay pattern: https://doradusresearch.ai/blog/sleep-mode-on-blackwell-part-2/

Hardware: 4× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (SM_120, 95 GiB per GPU, 101 KiB per-block opt-in shmem). Image stack documented in the writeup!
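For readers who want a feel for the per-config shmem-budget idea, here is a rough sketch. It is not the code in vllm-project/vllm#43047. The `Cfg` class is a stand-in for `triton.Config` so the snippet runs without a GPU. The names `estimate_shmem_bytes` and `make_shmem_pruner` are hypothetical. The byte estimate (one A tile plus one B tile per pipeline stage, for a tiled matmul with `BLOCK_M`/`BLOCK_N`/`BLOCK_K` meta-parameters) is an assumption chosen for illustration. Only the 101 KiB budget comes from the post.

```python
from dataclasses import dataclass

# Per-block opt-in shared memory on SM_120, as quoted in the post.
SM120_SHMEM_BYTES = 101 * 1024


@dataclass
class Cfg:
    """Stand-in for triton.Config: only the fields this sketch reads."""
    kwargs: dict
    num_stages: int = 2


def estimate_shmem_bytes(cfg, elem_bytes=2):
    """Assumed estimate for a tiled matmul: one A tile + one B tile per stage."""
    m = cfg.kwargs["BLOCK_M"]
    n = cfg.kwargs["BLOCK_N"]
    k = cfg.kwargs["BLOCK_K"]
    return cfg.num_stages * (m * k + k * n) * elem_bytes


def make_shmem_pruner(budget_bytes, elem_bytes=2):
    """Build a callable shaped like Triton's early_config_prune hook."""
    def early_config_prune(configs, named_args, **kwargs):
        kept = [c for c in configs
                if estimate_shmem_bytes(c, elem_bytes) <= budget_bytes]
        # Never return an empty list: fall back to the smallest config.
        return kept or [min(configs,
                            key=lambda c: estimate_shmem_bytes(c, elem_bytes))]
    return early_config_prune


configs = [
    Cfg({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=4),
    Cfg({"BLOCK_M": 128, "BLOCK_N": 128, "BLOCK_K": 64}, num_stages=3),
    Cfg({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_stages=4),
]
kept = make_shmem_pruner(SM120_SHMEM_BYTES)(configs, {})
for c in kept:
    print(c.kwargs, c.num_stages, estimate_shmem_bytes(c))
```

With real Triton, a pruner like this would be passed as `prune_configs_by={"early_config_prune": make_shmem_pruner(budget)}` to `triton.autotune`. On a device with a larger budget every config passes the filter, so the candidate list is unchanged there.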
liked a model 3 months ago
Sehyo/Qwen3.5-122B-A10B-NVFP4
updated a model 5 months ago
Doradus-AI/RnJ-1-Instruct-FP8
Doradus-AI's activity
liked a model 3 months ago
Sehyo/Qwen3.5-122B-A10B-NVFP4 • Image-Text-to-Text • 71B • Updated Mar 2 • 201k • 65

liked 2 models 5 months ago
Doradus-AI/RnJ-1-Instruct-FP8 • Text Generation • 9B • Updated Dec 7, 2025 • 322k • 4
Doradus-AI/Hermes-4.3-36B-FP8 • Text Generation • 36B • Updated Dec 7, 2025 • 315 • 3

liked a model 6 months ago
Doradus-AI/MiroThinker-v1.0-30B-FP8 • Text Generation • 31B • Updated Dec 5, 2025 • 20 • 4