mikasenghaas committed on Sep 9, 2025

Commit 2a902a6 (unverified) · 1 Parent(s): 2fd24d8

Add test scripts

Files changed (2)

test_template.py +311 -0
test_tokenization.py +26 -0

test_template.py ADDED

	@@ -0,0 +1,311 @@

+# /// script
+# requires-python = ">=3.12"
+# dependencies = ["transformers", "jinja2"]
+# ///
+from transformers import AutoTokenizer
+def print_section(title, messages, tokenizers, **tokenizer_kwargs):
+    """Helper function to print formatted sections"""
+    print(f"\n{'=' * 60}")
+    print(f"{title}")
+    print(f"{'=' * 60}")
+    print(f"\n{messages=}\n")
+    for tokenizer_name, tokenizer in tokenizers.items():
+        print(f"\n{tokenizer_name=}\n")
+        content = tokenizer.apply_chat_template(
+            messages, tokenize=False, **tokenizer_kwargs
+        )
+        print(content)
+# Initialize the local tokenizer and the Qwen3-Coder reference tokenizer
+local_tokenizer = AutoTokenizer.from_pretrained(".")
+qwen3_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-30B-A3B-Instruct")
+tokenizers = {"Local": local_tokenizer, "Qwen3-Coder": qwen3_tokenizer}
+# Only user message
+print_section(
+    "User message only",
+    [{"role": "user", "content": "What is the capital of France?"}],
+    tokenizers,
+)
+# User message with generation prompt
+print_section(
+    "User message with generation prompt",
+    [{"role": "user", "content": "What is the capital of France?"}],
+    tokenizers,
+    add_generation_prompt=True,
+)
+# User message with custom system message
+print_section(
+    "Custom system message",
+    [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the capital of France?"},
+    ],
+    tokenizers,
+)
+# Single-turn with assistant response (no think)
+print_section(
+    "Single-turn with assistant response (no think)",
+    [
+        {"role": "user", "content": "What is the capital of France?"},
+        {"role": "assistant", "content": "The capital of France is Paris."},
+    ],
+    tokenizers,
+)
+# Single-turn with think embedded in content
+print_section(
+    "Single-turn with think embedded in content",
+    [
+        {"role": "user", "content": "What is the capital of France?"},
+        {
+            "role": "assistant",
+            "content": "<think>The user is asking about geography. France is a country in Europe, and its capital city is Paris. This is a straightforward factual question.</think>\nThe capital of France is Paris.",
+        },
+    ],
+    tokenizers,
+)
+# Single-turn with reasoning_content field
+print_section(
+    "Single-turn with reasoning_content field",
+    [
+        {"role": "user", "content": "What is the capital of France?"},
+        {
+            "role": "assistant",
+            "content": "The capital of France is Paris.",
+            "reasoning_content": "The user is asking about geography. France is a country in Europe, and its capital city is Paris.",
+        },
+    ],
+    tokenizers,
+)
+# Single-turn with both an embedded think section and a reasoning_content field
+print_section(
+    "Single-turn with think section and reasoning_content field",
+    [
+        {"role": "user", "content": "What is the capital of France?"},
+        {
+            "role": "assistant",
+            "content": "<think>The user is asking about geography. France is a country in Europe, and its capital city is Paris. This is a straightforward factual question.</think>\nThe capital of France is Paris.",
+            "reasoning_content": "The user is asking about geography. France is a country in Europe, and its capital city is Paris. This is a straightforward factual question.",
+        },
+    ],
+    tokenizers,
+)
+# Multi-turn and assistant response with think sections (embedded in content)
+print_section(
+    "Multi-turn with think embedded in content",
+    [
+        {"role": "user", "content": "What is the capital of France?"},
+        {
+            "role": "assistant",
+            "content": "<think>This is a basic geography question.</think>\nThe capital of France is Paris.",
+        },
+        {"role": "user", "content": "What about Germany?"},
+        {
+            "role": "assistant",
+            "content": "<think>Another geography question. Germany's capital is Berlin.</think>\nThe capital of Germany is Berlin.",
+        },
+    ],
+    tokenizers,
+)
+# Multi-turn and assistant response with reasoning in the reasoning_content field
+print_section(
+    "Multi-turn with reasoning_content field",
+    [
+        {"role": "user", "content": "What is the capital of France?"},
+        {
+            "role": "assistant",
+            "reasoning_content": "The user is asking about geography. France is a country in Europe, and its capital city is Paris.",
+            "content": "The capital of France is Paris.",
+        },
+        {"role": "user", "content": "What about Germany?"},
+        {
+            "role": "assistant",
+            "reasoning_content": "Another geography question. Germany's capital is Berlin.",
+            "content": "The capital of Germany is Berlin.",
+        },
+    ],
+    tokenizers,
+)
+# Assistant with only think section, no visible content
+print_section(
+    "Assistant with only think section",
+    [
+        {
+            "role": "user",
+            "content": "Think about this problem but don't respond yet.",
+        },
+        {
+            "role": "assistant",
+            "content": "<think>The user wants me to think about something but not provide a response yet. I should just show my thinking process without any visible output.</think>",
+        },
+    ],
+    tokenizers,
+)
+# Assistant with unfinished think section
+print_section(
+    "Assistant with unfinished think section",
+    [
+        {
+            "role": "user",
+            "content": "Think about this problem but don't respond yet.",
+        },
+        {
+            "role": "assistant",
+            "content": "<think>The user wants me to think about something but not provide a response yet. I should just",
+        },
+    ],
+    tokenizers,
+)
+# Empty think section embedded in content
+print_section(
+    "Empty think content",
+    [
+        {"role": "user", "content": "Say hello"},
+        {"role": "assistant", "content": "<think></think>Hello! How can I help you today?"},
+    ],
+    tokenizers,
+)
+# Empty reasoning_content field, with a custom system message
+print_section(
+    "Empty reasoning content",
+    [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Say hello"},
+        {
+            "role": "assistant",
+            "content": "Hello! How can I help you today?",
+            "reasoning_content": "",
+        },
+    ],
+    tokenizers,
+)
+# ============================================================================
+# Tool use scenario (single tool call)
+# ============================================================================
+tool_example = [
+    {"role": "user", "content": "What's the weather like in Paris?"},
+    {
+        "role": "assistant",
+        "content": "I'll check the weather in Paris for you.",
+        "reasoning_content": "I should use the get_weather tool for this.",
+        "tool_calls": [
+            {
+                "name": "get_weather",
+                "arguments": {"location": "Paris, France", "units": "celsius"},
+            }
+        ],
+    },
+    {
+        "role": "tool",
+        "content": "Current weather in Paris: 18°C, partly cloudy with light winds.",
+    },
+    {
+        "role": "assistant",
+        "content": "<think>The weather API returned current conditions for Paris. I should provide this information to the user in a clear format.</think>\nThe current weather in Paris is 18°C with partly cloudy skies and light winds. It's a pleasant day!",
+    },
+]
+# Define tools for this example
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_weather",
+            "description": "Get current weather information for a location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and country",
+                    },
+                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+print_section(
+    "Single-turn tool use with weather",
+    tool_example,
+    tokenizers,
+    tools=tools,
+)
+# ============================================================================
+# Tool use scenario (multiple tool calls in one response)
+# ============================================================================
+multi_tool_example = [
+    {
+        "role": "user",
+        "content": "I need to calculate 15 * 23 and also get the current time.",
+    },
+    {
+        "role": "assistant",
+        "content": "<think>The user wants two things: a calculation and the current time. I'll use two tools to get this information.</think>\nI'll help you with both the calculation and getting the current time.",
+        "tool_calls": [
+            {"name": "calculate", "arguments": {"expression": "15 * 23"}},
+            {"name": "get_current_time", "arguments": {}},
+        ],
+    },
+    {"role": "tool", "content": "345"},
+    {"role": "tool", "content": "2024-01-15T14:30:22Z"},
+    {
+        "role": "assistant",
+        "content": "Perfect! Here are your results:\n- 15 × 23 = 345\n- Current time: 2:30 PM UTC on January 15, 2024",
+    },
+]
+multi_tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "calculate",
+            "description": "Perform mathematical calculations",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "expression": {
+                        "type": "string",
+                        "description": "Mathematical expression to evaluate",
+                    }
+                },
+                "required": ["expression"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "get_current_time",
+            "description": "Get the current date and time",
+            "parameters": {"type": "object", "properties": {}},
+        },
+    },
+]
+print_section(
+    "Single-turn with multiple tool calls",
+    multi_tool_example,
+    tokenizers,
+    tools=multi_tools,
+)
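
test_template.py prints the local and Qwen3-Coder renderings one after the other, so any mismatch has to be spotted by eye. As a minimal sketch, assuming one would rather see only the differing lines, the two rendered strings could be passed through the standard-library `difflib`. The helper name `render_diff` and the stand-in strings are hypothetical and are not part of this commit; in practice the inputs would be the strings returned by `apply_chat_template(..., tokenize=False)`.

```python
import difflib


def render_diff(local_text: str, reference_text: str) -> list[str]:
    """Return unified-diff lines between two rendered chat templates.

    Hypothetical helper: an empty list means the renderings are identical.
    """
    return list(
        difflib.unified_diff(
            reference_text.splitlines(),
            local_text.splitlines(),
            fromfile="reference",
            tofile="local",
            lineterm="",
        )
    )


# Stand-in strings (made up for illustration, not real template output).
reference = "[user] Hi"
local = "[user] Hi\n[assistant]"

for line in render_diff(local, reference):
    print(line)
```

Lines prefixed with `+` appear only in the local rendering, and lines prefixed with `-` appear only in the reference rendering.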

test_tokenization.py ADDED

	@@ -0,0 +1,26 @@

+# /// script
+# requires-python = ">=3.12"
+# dependencies = ["transformers", "jinja2"]
+# ///
+from transformers import AutoTokenizer
+# Initialize the local tokenizer and the Qwen3-Coder reference tokenizer
+local_tokenizer = AutoTokenizer.from_pretrained(".")
+qwen3_tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Coder-30B-A3B-Instruct")
+# User message with custom system message
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What is the capital of France?"},
+]
+print("Local")
+print(local_tokenizer.apply_chat_template(messages, tokenize=False))
+print(local_tokenizer.apply_chat_template(messages, tokenize=True))
+print("\n\nQwen3-Coder")
+print(qwen3_tokenizer.apply_chat_template(messages, tokenize=False))
+print(qwen3_tokenizer.apply_chat_template(messages, tokenize=True))
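
test_tokenization.py prints the token ID lists for both tokenizers but leaves the comparison to the reader. A small sketch of how the two lists might be compared, assuming each is a plain list of integers: the helper `first_divergence` is hypothetical and not part of this commit, and the ID values below are invented for illustration.

```python
from typing import Optional


def first_divergence(ids_a: list[int], ids_b: list[int]) -> Optional[int]:
    """Return the first index where two token ID lists differ.

    Returns None when the lists are identical. If one list is a strict
    prefix of the other, the length of the shorter list is returned.
    """
    for index, (a, b) in enumerate(zip(ids_a, ids_b)):
        if a != b:
            return index
    if len(ids_a) != len(ids_b):
        return min(len(ids_a), len(ids_b))
    return None


# Invented ID lists standing in for apply_chat_template(..., tokenize=True) output.
local_ids = [101, 7, 42, 9]
reference_ids = [101, 7, 43, 9]

position = first_divergence(local_ids, reference_ids)
if position is None:
    print("Token IDs match")
else:
    print(f"Token IDs diverge at index {position}")
```

Decoding the tokens around the reported index with each tokenizer would then show which part of the template or vocabulary differs.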