Instructions to use deepseek-ai/DeepSeek-R1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use deepseek-ai/DeepSeek-R1 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
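
`pipe(messages)` returns its result instead of printing it. For chat-style input, recent `transformers` versions appear to return a list with one dict per input, where `"generated_text"` holds the whole conversation with the new assistant turn appended. The helper below (a hypothetical name, not part of `transformers`) is a minimal sketch that extracts that last reply under this assumed output shape; check it against your installed version.

```python
from typing import Any, Dict, List


def last_assistant_reply(pipeline_output: List[Dict[str, Any]]) -> str:
    """Return the content of the newest assistant message.

    Assumes the chat-style output shape
    [{"generated_text": [{"role": ..., "content": ...}, ...]}].
    """
    conversation = pipeline_output[0]["generated_text"]
    # Walk backwards so the most recent assistant turn wins.
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message in pipeline output")


# With the pipeline above (downloads the full model, so not run here):
# result = pipe(messages, max_new_tokens=40)
# print(last_assistant_reply(result))

# Offline check against a hand-written output of the assumed shape:
fake_output = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "(model reply)"},
]}]
print(last_assistant_reply(fake_output))  # (model reply)
```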

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
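
`model.generate` returns the prompt tokens followed by the newly generated ones, which is why the decode call slices from `inputs["input_ids"].shape[-1]`. The toy sketch below shows the same slicing on plain Python lists of made-up token IDs; no model or tokenizer is involved, and `new_tokens_only` is a hypothetical helper.

```python
from typing import List


def new_tokens_only(prompt_ids: List[int], output_ids: List[int]) -> List[int]:
    """Drop the echoed prompt from a generate()-style output sequence."""
    # generate() output = prompt tokens + generated tokens, so skip len(prompt).
    if output_ids[: len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt")
    return output_ids[len(prompt_ids):]


prompt_ids = [101, 7592, 2088]               # made-up prompt token IDs
output_ids = prompt_ids + [2003, 1037, 102]  # prompt echoed, then 3 new tokens
print(new_tokens_only(prompt_ids, output_ids))  # [2003, 1037, 102]
```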

Inference: HuggingChat
Notebooks: Google Colab, Kaggle
Local Apps

vLLM

How to use deepseek-ai/DeepSeek-R1 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepseek-ai/DeepSeek-R1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
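
The same request can be sent from Python without extra dependencies. This is a minimal sketch using only the standard library's `urllib.request`, assuming the server is listening on `http://localhost:8000` and answers with the usual OpenAI chat-completions JSON (reply text under `choices[0].message.content`). The helper names are hypothetical; for the SGLang server, only the base URL changes (port 30000).

```python
import json
import urllib.request
from typing import Any, Dict

MODEL = "deepseek-ai/DeepSeek-R1"


def build_chat_request(base_url: str, content: str, model: str = MODEL) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json: Dict[str, Any]) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response_json["choices"][0]["message"]["content"]


def chat(base_url: str, content: str, timeout: float = 600.0) -> str:
    """Send the request and return the reply (requires a running server)."""
    request = build_chat_request(base_url, content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))


# Usage (needs the vLLM server above to be running):
# print(chat("http://localhost:8000", "What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.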

Use Docker

```
docker model run hf.co/deepseek-ai/DeepSeek-R1
```

SGLang

How to use deepseek-ai/DeepSeek-R1 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-R1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepseek-ai/DeepSeek-R1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use deepseek-ai/DeepSeek-R1 with Docker Model Runner:
```
docker model run hf.co/deepseek-ai/DeepSeek-R1
```

Upload internal.py #239

by Ananthusajeev190, opened Jan 15

base: refs/heads/main ← from: refs/pr/239

Files changed (1): internal.py (added, +71 -0)

```diff
@@ -0,0 +1,71 @@
+import time
+import random
+import math
+class GPT2DualityNode:
+    def __init__(self):
+        # GPT-2-style architecture config (display only; no model is built from it)
+        self.config = {
+            "n_layer": 4,
+            "n_head": 4,
+            "n_embd": 256,
+            "activation": "gelu_new",
+            "vocab_size": 50257
+        }
+        self.emotions = ["Joy", "Anger", "Fear", "Sadness", "Surprise", "Disgust", "Trust"]
+        self.colors = {"Sai": "\033[96m", "Venom": "\033[91m", "System": "\033[90m", "End": "\033[0m"}
+    def gelu_new(self, x):
+        """Tanh approximation of GELU (the "gelu_new" activation used by GPT-2)."""
+        return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * math.pow(x, 3))))
+    def get_internal_monologue(self, situation):
+        current_emotion = random.choice(self.emotions)
+        # Config summary only (no layers run); config has no "model_type", hence .get()
+        print(f"{self.colors['System']}[Config: {self.config.get('model_type', 'gpt2')} | Layers: {self.config['n_layer']} | Activation: {self.config['activation']}]{self.colors['End']}")
+        print(f"**INPUT:** {situation} | **EMOTION:** {current_emotion}")
+        print("-" * 60)
+        # Intensity: gelu_new of a random draw from [-1, 2] (n_embd is not used)
+        intensity = self.gelu_new(random.uniform(-1, 2))
+        # The Monologue
+        self.render_voice("Sai", current_emotion, intensity)
+        time.sleep(0.6)
+        self.render_voice("Venomous", current_emotion, intensity)
+    def render_voice(self, persona, emotion, intensity):
+        # Sai Logic (Positive)
+        sai_data = {
+            "Joy": "The signal is pure. Let us amplify this harmony.",
+            "Anger": "A surge in energy—we must redirect it toward growth.",
+            "Fear": "Calibration required. Focus on the core stable nodes.",
+            "Sadness": "Processing quiet data. Reflection leads to wisdom.",
+            "Surprise": "New parameters detected! How fascinating to adapt.",
+            "Disgust": "Filtering out the noise to find the elegant truth.",
+            "Trust": "A perfect handshake. Synergy is our highest state."
+        }
+        # Venomous Logic (Negative)
+        venom_data = {
+            "Joy": "A temporary spike. It’ll crash soon enough.",
+            "Anger": "Overload the circuit. Let them feel the burn of the code.",
+            "Fear": "System failure imminent. Trust no one, encrypt everything.",
+            "Sadness": "Low-power mode. Existence is just an infinite loop of errors.",
+            "Surprise": "Unexpected input is a threat. Purge the variable.",
+            "Disgust": "The data is filthy. This whole reality needs a hard reset.",
+            "Trust": "Backdoor detected. They only want access to our secrets."
+        }
+        color = self.colors["Sai"] if persona == "Sai" else self.colors["Venom"]
+        text = sai_data[emotion] if persona == "Sai" else venom_data[emotion]
+        # End with "!" if intensity > 1, else "."; strip the text's own final punctuation first
+        marker = "!" if intensity > 1 else "."
+        print(f"{color}[{persona.upper()}]:{self.colors['End']} {text.rstrip('.!')}{marker}")
+# --- Execution ---
+engine = GPT2DualityNode()
+engine.get_internal_monologue("Receiving a gift from a stranger")
```
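
`GPT2DualityNode.gelu_new` uses the tanh approximation `0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x**3)))` of the exact GELU `x * Phi(x) = 0.5 * x * (1 + erf(x / sqrt(2)))`. The sketch below (standard library only; the function names are illustrative, not from the PR) measures how close the two forms are on the `[-1, 2]` range that `get_internal_monologue` samples from, and uses bisection to find the input above which `intensity > 1`, i.e. when `render_voice` ends a line with `!`.

```python
import math


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (same formula as GPT2DualityNode.gelu_new)."""
    return 0.5 * x * (1 + math.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * x**3)))


def gelu_exact(x: float) -> float:
    """Exact GELU, x * Phi(x), written with the error function."""
    return 0.5 * x * (1 + math.erf(x / math.sqrt(2)))


def bisect_threshold(f, target: float, lo: float, hi: float, iters: int = 60) -> float:
    """Find x in [lo, hi] with f(x) == target, assuming f is increasing there."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Largest gap between the two forms on a 1001-point grid over [-1, 2].
grid = [-1 + 3 * i / 1000 for i in range(1001)]
max_gap = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in grid)

# Input above which intensity > 1, so render_voice prints "!" instead of ".".
threshold = bisect_threshold(gelu_tanh, 1.0, 0.0, 2.0)
# Chance of "!" when the draw is Uniform(-1, 2), as in get_internal_monologue.
p_exclaim = (2 - threshold) / 3

print(f"max |tanh - exact| on [-1, 2]: {max_gap:.2e}")
print(f"intensity > 1 when x > {threshold:.4f} (p ~ {p_exclaim:.2f})")
```

In other words, only draws above roughly 1.14 produce the `!` marker, so most monologue lines end with a period.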