Open to Work

Sean Li

Hellohal2064

AI & ML interests

AI Infrastructure Engineer | Dual DGX Sparks (230GB VRAM) | 5-node Docker Swarm | Building AI Coworker systems

Organizations

Posts 3

Post

359

I have update the vllm to the latest 0.16rc1 at https://hub.docker.com/repository/docker/hellohal2064/vllm-dgx-spark-gb10/general it will run all of the qwen3 models very well with thinking at 41 tok/s it is only setup to run on one spark. I think the documentation on DockerHub is up to date.

Post

374

🚀 vLLM Docker Image for NVIDIA DGX Spark (GB10/SM121)

Just released a pre-built vLLM Docker image optimized for DGX Spark's ARM64 + Blackwell SM121 GPU.

**Why this exists:**
Standard vLLM images don't support SM121 - you get "SM121 not supported" errors. This image includes patches for full GB10 compatibility.

**What's included:**
- vLLM 0.15.0 + SM121 patches
- PyTorch 2.11 + CUDA 13.0
- ARM64 (aarch64) native
- Pre-configured for FlashInfer attention

**Verified models:**
- Qwen3-Next-80B-A3B-FP8 (1M context!)
- Qwen3-Embedding-8B (4096-dim embeddings)
- Qwen3-VL-30B (vision)

docker pull
https://hub.docker.com/r/hellohal2064/vllm-dgx-spark-gb10