Instructions for using Trina-QwQ/wt-copilot with libraries and local apps. The sections below give the setup commands for each tool.

Libraries

How to use Trina-QwQ/wt-copilot with llama-cpp-python:

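# !pip install llama-cpp-python

from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
	repo_id="Trina-QwQ/wt-copilot",
	filename="wtc_q4.gguf",
)

# Send a single user message; the call returns an OpenAI-style response dict
response = llm.create_chat_completion(
	messages=[
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

# Print only the generated text
print(response["choices"][0]["message"]["content"])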

Local Apps

llama.cpp

How to use Trina-QwQ/wt-copilot with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Trina-QwQ/wt-copilot
# Run inference directly in the terminal:
llama-cli -hf Trina-QwQ/wt-copilot

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Trina-QwQ/wt-copilot
# Run inference directly in the terminal:
llama-cli -hf Trina-QwQ/wt-copilot

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Trina-QwQ/wt-copilot
# Run inference directly in the terminal:
./llama-cli -hf Trina-QwQ/wt-copilot

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Trina-QwQ/wt-copilot
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Trina-QwQ/wt-copilot

Use Docker

docker model run hf.co/Trina-QwQ/wt-copilot

vLLM

How to use Trina-QwQ/wt-copilot with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Trina-QwQ/wt-copilot"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Trina-QwQ/wt-copilot",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
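
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server from the previous step is listening on http://localhost:8000. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="Trina-QwQ/wt-copilot",
                       base_url="http://localhost:8000/v1"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply text. Pass a different `base_url` to reach a server on another port.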

Use Docker

docker model run hf.co/Trina-QwQ/wt-copilot

Ollama
How to use Trina-QwQ/wt-copilot with Ollama:
```
ollama run hf.co/Trina-QwQ/wt-copilot
```

Unsloth Studio

How to use Trina-QwQ/wt-copilot with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Trina-QwQ/wt-copilot to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Trina-QwQ/wt-copilot to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Trina-QwQ/wt-copilot to start chatting

Pi

How to use Trina-QwQ/wt-copilot with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Trina-QwQ/wt-copilot

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Trina-QwQ/wt-copilot"
        }
      ]
    }
  }
}
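
If ~/.pi/agent/models.json already holds other providers, the entry can be merged in rather than pasted over them. This is a rough sketch that assumes the file is plain JSON with a top-level "providers" object, as in the snippet above. The function names `add_llama_cpp_provider` and `update_models_file` are hypothetical helpers, not part of Pi.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration shown above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "Trina-QwQ/wt-copilot"}],
}


def add_llama_cpp_provider(config):
    """Return a copy of config with the llama-cpp provider added or replaced."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers["llama-cpp"] = LLAMA_CPP_PROVIDER
    merged["providers"] = providers
    return merged


def update_models_file(path):
    """Read the JSON file at path if it exists, merge the provider, write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(add_llama_cpp_provider(config), indent=2),
        encoding="utf-8",
    )
```

A call such as `update_models_file(Path.home() / ".pi" / "agent" / "models.json")` would then keep any providers that are already configured.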

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Trina-QwQ/wt-copilot with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Trina-QwQ/wt-copilot

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Trina-QwQ/wt-copilot

Run Hermes

hermes

Docker Model Runner
How to use Trina-QwQ/wt-copilot with Docker Model Runner:
```
docker model run hf.co/Trina-QwQ/wt-copilot
```

Lemonade

How to use Trina-QwQ/wt-copilot with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Trina-QwQ/wt-copilot

Run and chat with the model

lemonade run user.wt-copilot-{{QUANT_TAG}}

List all available models

lemonade list

WT-Copilot

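[Important!] This model contains third-party text material that was injected through knowledge embedding rather than through training, for example Wearable Technology (可穿戴科技) and other adult (H) erotic fiction collected from Pixiv. The model's license, The Unlicense, applies only to the SFT portion of the model itself. For the base model (Qwen3), the embedded text material, and generated content that derives from them, copyright belongs to the original authors and their license terms must be followed.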

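[Important!] This model is not a chat model. Although it uses the Qwen3-4B chat template, it is designed to support only single-turn, non-reasoning conversations with no system prompt. The recommended message format is "续写/改写/扩写：（文本样例或大纲）", that is, "Continue/Rewrite/Expand: (sample text or outline)". Any other use is off-label, and problems arising from it will be ignored.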
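
A minimal sketch of the recommended message format, assuming the task prefixes are sent verbatim in Chinese as quoted above. The helper name `build_messages` and the English task keys are illustrative and not part of the model's tooling.

```python
# Task prefixes quoted in the model card: continue / rewrite / expand.
TASK_PREFIXES = {
    "continue": "续写",
    "rewrite": "改写",
    "expand": "扩写",
}


def build_messages(task, text):
    """Return a single-turn message list with no system prompt."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    # Full-width colon, matching the recommended format.
    return [{"role": "user", "content": f"{TASK_PREFIXES[task]}：{text}"}]
```

The result can be passed as `messages` to `llm.create_chat_completion(...)` from the llama-cpp-python example. Build a fresh list for every request instead of appending to a history, since the model is single-turn.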

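[Important!] This model may generate NSFW content or other content that does not conform to the Core Socialist Values. Guide the model appropriately and apply suitable input and output moderation, for example by integrating llama-guard. Unmoderated output, or moderation that fails, may have serious consequences, including but not limited to data loss, reputational damage, and personal injury, for which the user bears sole responsibility.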
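
One way to wire in such moderation is to place generation between an input check and an output check. The sketch below is hypothetical: `generate`, `is_allowed_input`, and `is_allowed_output` are placeholder callables that you would back with your own model call and a classifier such as llama-guard. None of these names come from a real library.

```python
class ModerationError(Exception):
    """Raised when a prompt or a completion fails moderation."""


def moderated_generate(prompt, generate, is_allowed_input, is_allowed_output):
    """Run generate(prompt) only if the prompt and its output both pass checks."""
    if not is_allowed_input(prompt):
        raise ModerationError("prompt rejected by input check")
    output = generate(prompt)
    if not is_allowed_output(output):
        # Fail closed: never return text that the output check rejected.
        raise ModerationError("completion rejected by output check")
    return output
```

Raising an exception instead of returning an empty string makes a moderation failure hard to ignore in calling code.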

WT-Copilot is an inline-completion and translation model for creative writing.
You can run inference with llama.cpp or any other GGUF-compatible client, for example: https://github.com/ggml-org/llama.cpp/releases/

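A GT-580 with 8 GB of VRAM (although the model will most likely occupy only about 1 GB) produces output at roughly 5 tokens per second, which is enough for fluent writing. In principle an integrated GPU can run the model, but CPU-only inference (for example on Intel KF-series processors, which lack integrated graphics) is best avoided.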

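WT-Copilot's training pipeline uses in-domain web novels as its core data source. We first scraped a large volume of continuous narrative text from publicly accessible web novels. This text is characterized by chapter-level structure, strongly stylized description, and dense emotional expression. Because raw web-novel data is noisy, it goes through systematic preprocessing and cleaning before it enters the training pipeline: leftover web-page content, advertising text, irrelevant symbols, and obviously corrupted passages are removed, and paragraph and punctuation structure is normalized so that the prose is continuous and readable. The cleaned data is then split into training samples suited to inline completion, not into dialogue or question-answer format.

Once the data is prepared, the model is pretrained on this domain data to strengthen its modeling of long-text continuation, tonal stability, and consistency of literary style. During training we found that web-novel text in this domain generally lacks explicit instructions or task descriptions, so conventional instruction fine-tuning not only yields limited benefit but can also damage the model's prose ability. WT-Copilot therefore does not adopt an instruction-centered alignment paradigm and follows a style-centered training path instead.

To this end we introduced a method that combines SSFT (Style Supervised Fine-Tuning) with reinforcement learning. A small number of manually annotated high-quality samples are used to train a model that evaluates style and text quality. This evaluator scores and filters the main model's outputs on a large, heterogeneous, and noisy set of training samples. In this process the main model does not learn how to follow instructions. It learns which ways of writing better match the target style, and the reinforcement-learning stage further amplifies those writing traits. The core goal of this approach is to maximize the naturalness, expressiveness, and overall polish of the text, not generality or controllability.

The direct result is that WT-Copilot performs extremely well on creative tasks such as continuation, rewriting, and inline completion, and its generated text is clearly better than that of ordinary general-purpose models in stylistic consistency and literary expressiveness. At the same time, because training barely optimized for reasoning, dialogue, or complex instructions, the model has essentially lost its reasoning and multi-turn dialogue abilities. It is especially weak in instruction-following scenarios such as name replacement and entity alignment, and in some cases it may produce looping text. These behaviors are not defects but the natural outcome of deliberate trade-offs in the training method, and they mean that WT-Copilot is explicitly designed as a non-conversational, non-general-purpose inline-completion model for creative writing.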
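
The data-preparation and filtering steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the URL-based ad filter, the two-paragraph context window, and the fixed score threshold are all invented for the example, and `score` stands in for the style evaluator, which is not published here. The reinforcement-learning stage is not shown.

```python
import re

# Assumption: lines containing URLs are treated as ad or web-page residue.
AD_PATTERN = re.compile(r"(https?://\S+|www\.\S+)")


def clean_text(raw):
    """Drop empty and URL-bearing lines, and collapse runs of whitespace."""
    kept = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line or AD_PATTERN.search(line):
            continue
        kept.append(line)
    return "\n".join(kept)


def make_completion_samples(text, context_paragraphs=2):
    """Pair each paragraph with the paragraphs before it as (context, target)."""
    paragraphs = text.split("\n")
    samples = []
    for i in range(context_paragraphs, len(paragraphs)):
        context = "\n".join(paragraphs[i - context_paragraphs:i])
        samples.append((context, paragraphs[i]))
    return samples


def filter_by_score(candidates, score, threshold):
    """Keep candidates that the evaluator scores at or above the threshold."""
    return [c for c in candidates if score(c) >= threshold]
```

In this sketch, `filter_by_score` plays the role of the evaluator-driven selection: model outputs would be the candidates, and only those the evaluator rates highly would be kept for further training.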

Downloads last month: 32

Format: GGUF
Model size: 4B params
Architecture: qwen3

Model tree for Trina-QwQ/wt-copilot

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-4B
Quantized (220): this model