Add vLLM Usage section (vLLM-Omni day-0 support)
#2
by shunyang90 - opened
README.md
CHANGED
|
@@ -21,3 +21,25 @@ Most large models today are **turn-based**: they answer only when you ask. But m
|
|
| 21 |
The decision of *when to act* is **learned inside the model** (from second-by-second time-aligned data + RL), not bolted on by an external turn-detector or polling loop. Vision is the first-class driver; speech (ASR/TTS) is treated as pluggable I/O.
|
| 22 |
|
| 23 |
To our knowledge, this is the **first open, vision-driven interaction model** released together with its training recipe, data, and a complete deployable system.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
The decision of *when to act* is **learned inside the model** (from second-by-second time-aligned data + RL), not bolted on by an external turn-detector or polling loop. Vision is the first-class driver; speech (ASR/TTS) is treated as pluggable I/O.
|
| 22 |
|
| 23 |
To our knowledge, this is the **first open, vision-driven interaction model** released together with its training recipe, data, and a complete deployable system.
|
| 24 |
+
|
| 25 |
+
---
|
| 26 |
+
## vLLM Usage
|
| 27 |
+
|
| 28 |
+
[vLLM-Omni](https://github.com/vllm-project/vllm-omni) provides **day-0 support** for JoyAI-VL-Interaction! The model is a standard Qwen3-VL VLM served by a plain `vllm serve`; vLLM-Omni adds the real-time interaction layer on top — the per-second **speak / silence / delegate** orchestration, 3-tier summary memory, and pluggable ASR / TTS / delegation. For installation and full setup, see the [vLLM-Omni recipe](https://github.com/vllm-project/vllm-omni/blob/main/recipes/JD/JoyAI-VL-Interaction.md).
|
| 29 |
+
|
| 30 |
+
### Online Serving
|
| 31 |
+
|
| 32 |
+
```bash
|
| 33 |
+
# git clone https://github.com/vllm-project/vllm-omni.git
|
| 34 |
+
|
| 35 |
+
# 1. Serve the model (plain `vllm serve`, NOT --omni — it is vanilla Qwen3-VL)
|
| 36 |
+
vllm serve jdopensource/JoyAI-VL-Interaction-Preview \
|
| 37 |
+
--served-model-name JoyAI-VL-Interaction-Preview --port 8061 \
|
| 38 |
+
--max-model-len 131072 --enable-prefix-caching --limit-mm-per-prompt '{"image":256,"video":1}'
|
| 39 |
+
|
| 40 |
+
# 2. Start the interaction orchestrator (OpenAI-compatible, :8070)
|
| 41 |
+
python -m vllm_omni.experimental.fullduplex.joyvl.serving.server --port 8070 \
|
| 42 |
+
--main-backend-url http://127.0.0.1:8061/v1 --main-model JoyAI-VL-Interaction-Preview
|
| 43 |
+
```
|
| 44 |
+
|
| 45 |
+
For the full browser demo — live webcam / RTSP input, voice (ASR/TTS), and the per-tick decision stream — run JD's official WebUI (`services/webui`) in front of the orchestrator; see the [vLLM-Omni recipe](https://github.com/vllm-project/vllm-omni/blob/main/recipes/JD/JoyAI-VL-Interaction.md) for the steps.
|