Instructions to use internlm/Intern-S2-Preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use internlm/Intern-S2-Preview with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="internlm/Intern-S2-Preview", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained("internlm/Intern-S2-Preview", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use internlm/Intern-S2-Preview with vLLM:
Install from pip and serve the model:

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "internlm/Intern-S2-Preview"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

(A Python client sketch for these OpenAI-compatible endpoints appears after this list.)

Use Docker:

```bash
docker model run hf.co/internlm/Intern-S2-Preview
```
- SGLang
How to use internlm/Intern-S2-Preview with SGLang:
Install from pip and serve the model:

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "internlm/Intern-S2-Preview" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images:

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "internlm/Intern-S2-Preview" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use internlm/Intern-S2-Preview with Docker Model Runner:
```bash
docker model run hf.co/internlm/Intern-S2-Preview
```
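Both the vLLM and SGLang servers above expose OpenAI-compatible APIs, so the official `openai` Python client works as well. Below is a minimal sketch against the vLLM server from above (an assumption: `pip install openai`; for the SGLang server, swap the port to 30000):

```python
# Minimal sketch: query the OpenAI-compatible vLLM server started above.
# Assumes `pip install openai` and the server running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internlm/Intern-S2-Preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```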
update readme

Changed files:
- .gitattributes (+1, -0)
- README.md (+109, -3)
- deployment_guide.md (+116, -0)
- figs/efficiency.jpg (+2, -2)
- figs/performance.png (+3, -0)
.gitattributes
CHANGED

```diff
@@ -36,3 +36,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
 figs/efficiency.jpg filter=lfs diff=lfs merge=lfs -text
 figs/title.png filter=lfs diff=lfs merge=lfs -text
+figs/performance.png filter=lfs diff=lfs merge=lfs -text
```
README.md
CHANGED

The Introduction now names the base model:

```diff
@@ -24,7 +24,7 @@ pipeline_tag: image-text-to-text
 
 ## Introduction
 
-We introduce **Intern-S2-Preview**, an efficient 35B scientific multimodal foundation model. Beyond conventional parameter and data scaling, Intern-S2-Preview explores **task scaling**: increasing the difficulty, diversity, and coverage of scientific tasks to further unlock model capabilities.
+We introduce **Intern-S2-Preview**, an efficient 35B scientific multimodal foundation model continued pre-trained from Qwen3.5. Beyond conventional parameter and data scaling, Intern-S2-Preview explores **task scaling**: increasing the difficulty, diversity, and coverage of scientific tasks to further unlock model capabilities.
 
 By extending professional scientific tasks into a full-chain training pipeline from pre-training to reinforcement learning, Intern-S2-Preview achieves performance comparable to the trillion-scale Intern-S1-Pro on multiple core professional scientific tasks, while using only 35B parameters. At the same time, it maintains strong general reasoning, multimodal understanding, coding, and agent capabilities.
```

In the Performance section, the comparison figure is replaced by the new `figs/performance.png` and the evaluation note gains inference-length details:

```diff
@@ -45,12 +45,12 @@ By extending professional scientific tasks into a full-chain training pipeline f
 We evaluate the Intern-S2-Preview on various benchmarks, including general datasets and scientific datasets. We report the performance comparison with the recent VLMs and LLMs below.
 
+
 
 > **Note**: <u>Underline</u> means the best performance among open-sourced models, **Bold** indicates the best performance among all models.
 
-We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models.
+We use the [OpenCompass](https://github.com/open-compass/OpenCompass/) and [VLMEvalKit](https://github.com/open-compass/vlmevalkit) to evaluate all models. For text reasoning benchmarks, Intern-S2-Preview is evaluated with a maximum inference length of 128K tokens, while for multimodal benchmarks, it is evaluated with a maximum inference length of 64K tokens.
 
 ## Quick Start
```

At the end of the Quick Start section (`@@ -288,3 +288,109 @@`), after the note "We do not recommend disabling thinking mode for agentic tasks.", the following **Agent Integration** section is appended:
## Agent Integration

Intern-S2-Preview can be plugged into agent frameworks in two ways: connecting to a **self-hosted deployment**, or calling the **official InternLM API**. Below we cover both, with examples for agent frameworks (OpenClaw, Hermes, etc.) and for Claude Code.

### 1. Self-hosted Deployment (LMDeploy as an example)

First, serve the model with LMDeploy following the [Model Deployment Guide](./deployment_guide.md). The example below assumes the server is running at `http://0.0.0.0:23333`.

#### Connecting Agent Frameworks

Most agent frameworks (OpenClaw, Hermes, etc.) accept an OpenAI-compatible endpoint. Point them at the LMDeploy server base URL `http://0.0.0.0:23333/v1`.

You can check the connection with the following command:

```bash
curl http://0.0.0.0:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer EMPTY" \
  -d '{
    "model": "internlm/Intern-S2-Preview",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.8,
    "top_p": 0.95
  }'
```

Or you can configure your agent framework with the following environment variables:

```bash
export OPENAI_API_KEY=EMPTY
export OPENAI_BASE_URL=http://0.0.0.0:23333/v1
export OPENAI_MODEL=internlm/Intern-S2-Preview
```

Remember to launch LMDeploy with `--tool-call-parser interns2-preview` so tool calls are parsed correctly.
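To see why the parser matters, here is a minimal sketch of a tool-call round trip through the OpenAI-compatible endpoint, assuming `pip install openai`; the `get_weather` tool is a hypothetical example for illustration, not something shipped with the model:

```python
# Minimal sketch of a tool-call request against the LMDeploy server above.
# Assumes `pip install openai`; the `get_weather` tool is hypothetical.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="internlm/Intern-S2-Preview",
    messages=[{"role": "user", "content": "What's the weather in Shanghai?"}],
    tools=tools,
)

# With the tool-call parser enabled, parsed calls appear as structured
# entries in message.tool_calls rather than raw text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

With the parser enabled, the agent framework receives structured `tool_calls` it can execute directly.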
#### Connecting Claude Code

LMDeploy exposes an Anthropic-compatible `/v1/messages` endpoint that Claude Code can talk to directly. Add the following to `~/.claude/settings.json`:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:23333",
    "ANTHROPIC_AUTH_TOKEN": "dummy",
    "ANTHROPIC_MODEL": "internlm/Intern-S2-Preview",
    "ANTHROPIC_CUSTOM_MODEL_OPTION": "internlm/Intern-S2-Preview"
  }
}
```

For a full walkthrough (curl verification, model routing, troubleshooting), see [LMDeploy × Claude Code](https://lmdeploy.readthedocs.io/en/latest/intergration/claude_code.html).

### 2. Official Intern API

If you do not want to self-host, you can use the official Intern API. Register at [internlm.intern-ai.org.cn](https://internlm.intern-ai.org.cn/) and create an API token (`sk-xxxxxxxx`).

#### Connecting Agent Frameworks

The service is OpenAI-compatible, so any agent framework works. Set the base URL to `https://chat.intern-ai.org.cn/api/v1` and the model name to `intern-s2-preview` in the CLI or config file.

You can check the connection with the following command:

```bash
curl https://chat.intern-ai.org.cn/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxxxxxxx" \
  -d '{
    "model": "intern-s2-preview",
    "messages": [
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.8,
    "top_p": 0.95
  }'
```

Refer to the [Intern API documentation](https://internlm.intern-ai.org.cn/api/document?lang=en) for the current endpoint, available model names, rate limits, and advanced parameters.
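For scripted access, the same endpoint also works with the `openai` Python client. A minimal sketch mirroring the curl call above (assumes `pip install openai`; replace `sk-xxxxxxxx` with your token):

```python
# Minimal sketch: call the official Intern API with the openai client.
# Replace sk-xxxxxxxx with the token created above.
from openai import OpenAI

client = OpenAI(
    base_url="https://chat.intern-ai.org.cn/api/v1",
    api_key="sk-xxxxxxxx",
)

response = client.chat.completions.create(
    model="intern-s2-preview",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.8,
    top_p=0.95,
)
print(response.choices[0].message.content)
```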
#### Connecting Claude Code

Claude Code can route to the official Intern API by pointing `ANTHROPIC_BASE_URL` at the Intern Anthropic-compatible gateway:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://chat.staging.intern-ai.org.cn",
    "ANTHROPIC_AUTH_TOKEN": "your-api-token",
    "ANTHROPIC_MODEL": "intern-s2-preview",
    "ANTHROPIC_SMALL_FAST_MODEL": "intern-s2-preview"
  }
}
```

Then start Claude Code with the following command:

```bash
claude --model intern-s2-preview
```

For step-by-step setup, see [Intern API × Claude Code Integration](https://internlm.intern-ai.org.cn/api/document?lang=en).
deployment_guide.md
ADDED

# Intern-S2-Preview Deployment Guide

The Intern-S2-Preview release is a 35B-A3B model stored in bfloat16 weight format. This guide provides deployment examples for the following configurations:

- MTP speculative decoding (Recommended)
- Basic serving without MTP
- Long-context inference with YaRN RoPE configuration

> NOTE: The commands below are reference configurations. Inference frameworks are under active development, so use the latest framework documentation and your local validation results when tuning production deployments.
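Whichever engine you pick below, you can confirm the server came up by listing models on its OpenAI-compatible endpoint. A minimal sketch, assuming `pip install openai` and each engine's default port (23333 for the LMDeploy commands below, 8000 for vLLM, 30000 for the SGLang examples):

```python
# Quick sanity check once a server is running.
# Adjust the port to your engine: 23333 (LMDeploy), 8000 (vLLM), 30000 (SGLang).
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:23333/v1", api_key="EMPTY")
for model in client.models.list():
    print(model.id)  # should include the served Intern-S2-Preview model
```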
## LMDeploy

Use the latest LMDeploy (>=0.13.0) with Intern-S2-Preview support.

- Serving With MTP (Recommended)

```bash
lmdeploy serve api_server \
  internlm/Intern-S2-Preview \
  --trust-remote-code \
  --backend pytorch \
  --tp 2 \
  --reasoning-parser default \
  --tool-call-parser interns2-preview \
  --speculative-algorithm qwen3_5_mtp \
  --speculative-num-draft-tokens 4 \
  --max-batch-size 256
```

- Basic Serving Without MTP

```bash
lmdeploy serve api_server \
  internlm/Intern-S2-Preview \
  --trust-remote-code \
  --backend pytorch \
  --tp 2 \
  --reasoning-parser default \
  --tool-call-parser interns2-preview
```

- Long-Context Serving

For long-context inference, configure both `--session-len` and YaRN RoPE parameters. The following example uses a 512k context length; with `factor` 4.0 applied to the 262144-token original window, the scaled RoPE covers roughly 1M positions, comfortably above the 512k session length:

```bash
lmdeploy serve api_server \
  internlm/Intern-S2-Preview \
  --trust-remote-code \
  --tp 2 \
  --backend pytorch \
  --reasoning-parser default \
  --tool-call-parser interns2-preview \
  --session-len 512000 \
  --max-batch-size 64 \
  --hf-overrides '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11, 11, 10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}'
```
## vLLM

Use the latest vLLM Docker image or source build with Intern-S2-Preview support.

- Serving With MTP (Recommended)

```bash
vllm serve internlm/Intern-S2-Preview \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --speculative-config '{"method":"mtp","num_speculative_tokens":4}'
```

- Basic Serving Without MTP

```bash
vllm serve internlm/Intern-S2-Preview \
  --trust-remote-code \
  --tensor-parallel-size 2 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

## SGLang

Use the latest SGLang Docker image or source build with Intern-S2-Preview support.

- Serving With MTP (Recommended)

```bash
SGLANG_ENABLE_SPEC_V2=1 \
python3 -m sglang.launch_server \
  --model-path internlm/Intern-S2-Preview \
  --trust-remote-code \
  --tp-size 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder \
  --mamba-scheduler-strategy extra_buffer \
  --speculative-algo 'NEXTN' \
  --speculative-eagle-topk 1 \
  --speculative-num-steps 3 \
  --speculative-num-draft-tokens 4
```

- Basic Serving Without MTP

```bash
python3 -m sglang.launch_server \
  --model-path internlm/Intern-S2-Preview \
  --trust-remote-code \
  --tp-size 2 \
  --reasoning-parser qwen3 \
  --tool-call-parser qwen3_coder
```
figs/efficiency.jpg
CHANGED (Git LFS file)

figs/performance.png
ADDED (Git LFS file)