Instructions for using ByteDance-Seed/Seed-OSS-36B-Instruct with libraries and local apps.

Libraries

How to use ByteDance-Seed/Seed-OSS-36B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteDance-Seed/Seed-OSS-36B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")
model = AutoModelForCausalLM.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ByteDance-Seed/Seed-OSS-36B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ByteDance-Seed/Seed-OSS-36B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
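
The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_chat_request` and `ask` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above is running locally.
BASE_URL = "http://localhost:8000/v1"
MODEL = "ByteDance-Seed/Seed-OSS-36B-Instruct"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Requires a running server:
# print(ask("What is the capital of France?"))
```

Building the request separately from sending it makes it easy to inspect the URL and JSON body before a server is available.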

Use Docker

docker model run hf.co/ByteDance-Seed/Seed-OSS-36B-Instruct

SGLang

How to use ByteDance-Seed/Seed-OSS-36B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/Seed-OSS-36B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance-Seed/Seed-OSS-36B-Instruct" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use ByteDance-Seed/Seed-OSS-36B-Instruct with Docker Model Runner:
```
docker model run hf.co/ByteDance-Seed/Seed-OSS-36B-Instruct
```

vLLM error: operator _C::marlin_qqq_gemm does not exist

by HourseCircle - opened Aug 21, 2025

Discussion

HourseCircle

Aug 21, 2025

python3 -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8000 \
    --enable-auto-tool-choice \
    --tool-call-parser seed_oss \
    --trust-remote-code \
    --model ByteDance-Seed/Seed-OSS-36B-Instruct \
    --chat-template ./chat_template.jinja \
    --served-model-name seed_oss

INFO 08-21 02:46:36 [__init__.py:241] Automatically detected platform cuda.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/entrypoints/openai/api_server.py", line 43, in <module>
    from vllm.engine.async_llm_engine import AsyncLLMEngine  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/engine/async_llm_engine.py", line 18, in <module>
    from vllm.engine.llm_engine import LLMEngine
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/engine/llm_engine.py", line 30, in <module>
    from vllm.executor.executor_base import ExecutorBase
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/executor/executor_base.py", line 18, in <module>
    from vllm.model_executor.layers.sampler import SamplerOutput
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/model_executor/layers/sampler.py", line 16, in <module>
    from vllm.model_executor.layers.utils import apply_penalties
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/model_executor/layers/utils.py", line 8, in <module>
    from vllm import _custom_ops as ops
  File "/home/ubuntu/workspace/eva/tmp/vllm/vllm/_custom_ops.py", line 440, in <module>
    @register_fake("_C::marlin_qqq_gemm")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/nfs/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/library.py", line 1023, in register
    use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
  File "/scratch/nfs/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/library.py", line 214, in _register_fake
    handle = entry.fake_impl.register(func_to_register, source)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/nfs/anaconda3/envs/vllm/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 31, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator _C::marlin_qqq_gemm does not exist
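
The traceback shows `_custom_ops.py` trying to register a fake implementation for `_C::marlin_qqq_gemm`, an operator that the loaded compiled extension does not provide. One plausible reading is that the Python sources and the compiled binaries come from different builds. As a first diagnostic step, it may help to record which package versions are actually installed. The sketch below uses only the standard library; the helper name `installed_versions` and the default package list are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages=("vllm", "torch", "transformers")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print a short report that can be pasted into a bug report.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

This only reports installed distribution versions; it does not inspect which operators the compiled extension registers.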

tclf90

Aug 21, 2025

I made it work by removing the VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL variable:

VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/FoolPlayer/vllm.git@seed-oss
pip install git+https://github.com/Fazziekey/transformers.git@seed-oss
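
Before re-running the install commands above, it can be worth confirming that the variable is really absent from the environment. The following is a small sketch under that assumption; `CONFLICTING_VARS`, `find_conflicting_vars`, and `cleaned_env` are hypothetical names introduced here for illustration.

```python
import os

# The variable that the reply above reports removing.
CONFLICTING_VARS = ("VLLM_TEST_USE_PRECOMPILED_NIGHTLY_WHEEL",)


def find_conflicting_vars(environ=None):
    """Return the names from CONFLICTING_VARS that are set in the environment."""
    environ = os.environ if environ is None else environ
    return [name for name in CONFLICTING_VARS if name in environ]


def cleaned_env(environ=None):
    """Return a copy of the environment without the conflicting variables."""
    environ = os.environ if environ is None else environ
    return {key: value for key, value in environ.items() if key not in CONFLICTING_VARS}


for name in find_conflicting_vars():
    print(f"{name} is set; unset it before installing")
```

The dictionary returned by `cleaned_env()` can be passed as the `env` argument of `subprocess.run` when launching `pip` from a script.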

qby10

Aug 21, 2025

Thanks. I encountered the same problem, and your solution worked.

yo37

Aug 25, 2025

The official vLLM repo has approved our MR. Please use the newest vLLM commit, as introduced here.
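
Once the server from the original post starts, the `--enable-auto-tool-choice` and `--tool-call-parser seed_oss` flags can be exercised with a chat completions request that declares a tool. The sketch below only builds the JSON body in the OpenAI-compatible format. The model name `seed_oss` comes from `--served-model-name` in the original command; the `get_weather` tool and the helper name `build_tool_call_payload` are hypothetical and exist only for illustration.

```python
import json


def build_tool_call_payload(question, model="seed_oss"):
    """Build a chat completions body that offers one tool to the model."""
    return {
        "model": model,  # matches --served-model-name in the original post
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool, defined here only as an example.
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        # Let the model decide whether to call the tool.
        "tool_choice": "auto",
    }


print(json.dumps(build_tool_call_payload("What is the weather in Paris?"), indent=2))
```

The printed JSON can be sent as the body of a POST request to `/v1/chat/completions` on the port given in the original command (8000).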
