---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---

# Docker Model Runner

**Anthropic API compatible** with **interleaved thinking** support.

## Hardware

- **CPU Basic**: 2 vCPU · 16 GB RAM

## Quick Start

```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```

## Interleaved Thinking

Enable thinking to get reasoning steps interleaved with responses:

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
```
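
When a response mixes thinking and text blocks, it can be handy to separate them before logging or display. Below is a minimal sketch of such a helper; the `split_blocks` name is our own, and for illustration it operates on plain dicts shaped like the SDK's content blocks (SDK objects expose the same fields as attributes):

```python
def split_blocks(content):
    """Separate a message's content blocks into thinking and text lists."""
    thinking, text = [], []
    for block in content:
        # SDK objects expose .type; dicts use ["type"] — handle both.
        get = (lambda b, k: b[k]) if isinstance(block, dict) else getattr
        kind = get(block, "type")
        if kind == "thinking":
            thinking.append(get(block, "thinking"))
        elif kind == "text":
            text.append(get(block, "text"))
    return thinking, text

# Example with dict-shaped blocks:
content = [
    {"type": "thinking", "thinking": "User wants a greeting."},
    {"type": "text", "text": "Hello!"},
]
thinking, text = split_blocks(content)
print(thinking)  # ['User wants a greeting.']
print(text)      # ['Hello!']
```

The same loop works unchanged on `message.content` from the example above, since the helper reads the same `type`/`thinking`/`text` fields.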

## Streaming with Thinking

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, 'type'):
            if event.type == 'content_block_start':
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == 'content_block_delta':
                if hasattr(event.delta, 'thinking'):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, 'text'):
                    print(event.delta.text, end="")
```
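
The event routing above can be factored into a small, testable function. A sketch using `types.SimpleNamespace` stand-ins for the SDK's event objects (the `render_event` name is illustrative, not part of the SDK):

```python
from types import SimpleNamespace

def render_event(event):
    """Return the printable fragment for a stream event, or None."""
    if getattr(event, "type", None) == "content_block_start":
        return f"\n[{event.content_block.type}] "
    if getattr(event, "type", None) == "content_block_delta":
        delta = event.delta
        if hasattr(delta, "thinking"):
            return delta.thinking
        if hasattr(delta, "text"):
            return delta.text
    return None

# Simulated event sequence mirroring the stream loop above:
events = [
    SimpleNamespace(type="content_block_start",
                    content_block=SimpleNamespace(type="thinking")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(thinking="Considering...")),
    SimpleNamespace(type="content_block_delta",
                    delta=SimpleNamespace(text="Hello!")),
]
out = "".join(f for e in events if (f := render_event(e)) is not None)
# out == "\n[thinking] Considering...Hello!"
```

In the real stream loop you would call `print(render_event(event) or "", end="")` instead of the nested conditionals.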

## Multi-Turn with Thinking History

**Important**: In multi-turn conversations, append the complete model response (including thinking blocks) to maintain reasoning chain continuity.

```python
import anthropic

client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append full response (including thinking) to history
messages.append({
    "role": "assistant",
    "content": response.content  # Includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})

response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)
```
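
A common mistake here is appending only the text blocks and dropping the thinking blocks. A tiny helper makes the correct append step explicit; the `append_assistant_turn` name is our own, shown with dict-shaped blocks for illustration:

```python
def append_assistant_turn(messages, response_content):
    """Append the assistant's full content (thinking + text) to history."""
    messages.append({"role": "assistant", "content": list(response_content)})
    return messages

history = [{"role": "user", "content": "What is 2+2?"}]
content = [
    {"type": "thinking", "thinking": "2 + 2 = 4."},
    {"type": "text", "text": "4"},
]
append_assistant_turn(history, content)
print(len(history))                      # 2
print(history[1]["content"][0]["type"])  # thinking
```

With the SDK you would pass `response.content` as `response_content`, exactly as in the example above.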

## Supported Models

| Model | Description |
|-------|-------------|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |

## API Compatibility

### Parameters

| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ✅ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
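
Per the table, `temperature` must fall in (0.0, 1.0], and `top_k` and `stop_sequences` are silently ignored by the server. A client-side pre-flight check along these lines can surface both cases early; this helper is our own convenience, not part of the SDK:

```python
# Parameters this server ignores, per the compatibility table above.
IGNORED_PARAMS = {"top_k", "stop_sequences"}

def check_params(params):
    """Raise on an out-of-range temperature; return ignored params present."""
    temp = params.get("temperature")
    # The table gives the half-open range (0.0, 1.0]: 0.0 excluded, 1.0 included.
    if temp is not None and not (0.0 < temp <= 1.0):
        raise ValueError(f"temperature must be in (0.0, 1.0], got {temp}")
    return [p for p in params if p in IGNORED_PARAMS]

print(check_params({"temperature": 0.7, "top_k": 40}))  # ['top_k']
```

Calling it just before `client.messages.create(**params)` turns a silently-dropped `stop_sequences` into a visible warning list.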

### Message Types

| Type | Status |
|------|--------|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |

## Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/messages` | POST | Anthropic Messages API |
| `/v1/chat/completions` | POST | OpenAI Chat API |
| `/v1/models` | GET | List models |
| `/health` | GET | Health check |
| `/info` | GET | API info |

## cURL Example

```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```