Update docs/deploy_guidance.md

#2
by Mingke977 - opened
Files changed (1)
  1. docs/deploy_guidance.md +16 -3
docs/deploy_guidance.md CHANGED
@@ -8,17 +8,30 @@
 ## vLLM Deployment
 
 Here is an example to serve this model on an H200 single node with TP8 via vLLM:
+
+1. Pull the Docker image.
 ```bash
-vllm serve ${MODEL_PATH} --tp 8 --trust-remote-code \
---tool-call-parser qwen3_coder --enable-auto-tool-choice \
---speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
+docker pull jdopensource/joyai-llm-vllm:v0.13.0-joyai_llm_flash
 ```
+2. Launch the JoyAI-LLM Flash model with dense MTP.
+```bash
+vllm serve ${MODEL_PATH} --tp 8 --trust-remote-code \
+--tool-call-parser qwen3_coder --enable-auto-tool-choice \
+--speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
+```
 **Key notes**
 - `--tool-call-parser qwen3_coder`: Required for enabling tool calling
 
 ## SGLang Deployment
 
 Similarly, here is an example to run with TP8 on an H200 single node via SGLang:
+
+1. Pull the Docker image.
+```bash
+docker pull jdopensource/joyai-llm-sglang:v0.5.8-joyai_llm_flash
+```
+2. Launch the JoyAI-LLM Flash model with dense MTP.
+
 ```bash
 python3 -m sglang.launch_server --model-path ${MODEL_PATH} --tp-size 8 --trust-remote-code \
 --tool-call-parser qwen3_coder \
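
The `--speculative-config` flag in the vLLM command takes a JSON string (bash's `$'...'` quoting passes it through as a single argument), and a malformed value is an easy way to fail at launch. A minimal sanity-check sketch, mirroring the keys used in the command above:

```python
import json

# The exact string passed to --speculative-config in the vllm serve command.
spec_config = '{"method": "mtp", "num_speculative_tokens": 3}'

# If this raises, the config would also be rejected at server launch.
parsed = json.loads(spec_config)

print(parsed["method"])                  # mtp
print(parsed["num_speculative_tokens"])  # 3
```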
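
Both vLLM and SGLang expose an OpenAI-compatible chat API, which is what `--tool-call-parser qwen3_coder` (and `--enable-auto-tool-choice` on vLLM) enables tool calling for. A sketch of a request body in the OpenAI-style `tools` format — the model name, the `get_weather` tool, and its schema are all hypothetical placeholders, not part of this deployment:

```python
import json

payload = {
    "model": "joyai-llm-flash",  # placeholder; use the name your server reports
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# POST this body to the server's /v1/chat/completions endpoint; with the
# tool-call parser enabled, tool invocations come back as structured
# tool_calls rather than raw text.
print(json.dumps(payload, indent=2))
```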