---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---

# Docker Model Runner

**Anthropic API compatible** with **interleaved thinking** support.

## Hardware

- **CPU Basic**: 2 vCPU · 16 GB RAM

## Quick Start

```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```

## Interleaved Thinking

Enable thinking to get reasoning steps interleaved with responses:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
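
When a response mixes thinking and text blocks, it can be handy to separate them before logging or display. Below is a minimal sketch of such a helper; the `split_blocks` name is our own, and for illustration it operates on plain dicts shaped like the SDK's content blocks (SDK objects expose the same fields as attributes):

```python
def split_blocks(content):
    """Separate a message's content blocks into thinking and text lists."""
    thinking, text = [], []
    for block in content:
        # SDK objects expose .type; dicts use ["type"] — handle both.
        get = (lambda b, k: b[k]) if isinstance(block, dict) else getattr
        kind = get(block, "type")
        if kind == "thinking":
            thinking.append(get(block, "thinking"))
        elif kind == "text":
            text.append(get(block, "text"))
    return thinking, text

# Example with dict-shaped blocks:
content = [
    {"type": "thinking", "thinking": "User wants a greeting."},
    {"type": "text", "text": "Hello!"},
]
thinking, text = split_blocks(content)
print(thinking)  # ['User wants a greeting.']
print(text)      # ['Hello!']
```

The same loop works unchanged on `message.content` from the example above, since the helper reads the same `type`/`thinking`/`text` fields.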

## Streaming with Thinking

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == 'content_block_start':
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == 'content_block_delta':
                if hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
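
The event routing above can be factored into a small, testable function. A sketch using `types.SimpleNamespace` stand-ins for the SDK's event objects (the `render_event` name is illustrative, not part of the SDK):

```python
from types import SimpleNamespace

def render_event(event):
    """Return the printable fragment for a stream event, or None."""
    if getattr(event, "type", None) == "content_block_start":
        return f"\n[{event.content_block.type}] "
    if getattr(event, "type", None) == "content_block_delta":
        delta = event.delta
        if hasattr(delta, "thinking"):
            return delta.thinking
        if hasattr(delta, "text"):
            return delta.text
    return None

# Simulated event sequence mirroring the stream loop above:
events = [
    SimpleNamespace(type="content_block_start",
                    content_block=SimpleNamespace(type="thinking")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(thinking="Considering...")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="Hello!")),
]
out = "".join(f for e in events if (f := render_event(e)) is not None)
# out == "\n[thinking] Considering...Hello!"
```

In the real stream loop you would call `print(render_event(event) or "", end="")` instead of the nested conditionals.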

## Multi-Turn with Thinking History

**Important**: In multi-turn conversations, append the complete model response (including thinking blocks) to maintain reasoning chain continuity.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append full response (including thinking) to history
messages.append({
    "role": "assistant",
    "content": response.content  # Includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})

response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)
```
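
A common mistake here is appending only the text blocks and dropping the thinking blocks. A tiny helper makes the correct append step explicit; the `append_assistant_turn` name is our own, shown with dict-shaped blocks for illustration:

```python
def append_assistant_turn(messages, response_content):
    """Append the assistant's full content (thinking + text) to history."""
    messages.append({"role": "assistant", "content": list(response_content)})
    return messages

history = [{"role": "user", "content": "What is 2+2?"}]
content = [
    {"type": "thinking", "thinking": "2 + 2 = 4."},
    {"type": "text", "text": "4"},
]
append_assistant_turn(history, content)
print(len(history))                      # 2
print(history[1]["content"][0]["type"])  # thinking
```

With the SDK you would pass `response.content` as `response_content`, exactly as in the example above.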

## Supported Models

| Model | Description |
|-------|-------------|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |

## API Compatibility

### Parameters

| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ✅ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
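
Per the table, `temperature` must fall in (0.0, 1.0], and `top_k` and `stop_sequences` are silently ignored by the server. A client-side pre-flight check along these lines can surface both cases early; this helper is our own convenience, not part of the SDK:

```python
# Parameters this server ignores, per the compatibility table above.
IGNORED_PARAMS = {"top_k", "stop_sequences"}

def check_params(params):
    """Raise on an out-of-range temperature; return ignored params present."""
    temp = params.get("temperature")
    # The table gives the half-open range (0.0, 1.0]: 0.0 excluded, 1.0 included.
    if temp is not None and not (0.0 < temp <= 1.0):
        raise ValueError(f"temperature must be in (0.0, 1.0], got {temp}")
    return [p for p in params if p in IGNORED_PARAMS]

print(check_params({"temperature": 0.7, "top_k": 40}))  # ['top_k']
```

Calling it just before `client.messages.create(**params)` turns a silently-dropped `stop_sequences` into a visible warning list.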

### Message Types

| Type | Status |
|------|--------|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/messages` | POST | Anthropic Messages API |
| `/v1/chat/completions` | POST | OpenAI Chat API |
| `/v1/models` | GET | List models |
| `/health` | GET | Health check |
| `/info` | GET | API info |

## cURL Example

```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```