
Qwen3-Embedding-0.6B-GraphQL

A fine-tune of Qwen/Qwen3-Embedding-0.6B that maps natural-language questions to GraphQL field coordinates (Type.field). The training signal targets owner-type disambiguation across cross-type field-name collisions — telling Issue.author apart from PullRequest.author, or SlaPolicy.description from the 261 other .description fields in a large schema.

Ships as both SentenceTransformer weights and GGUF builds for llama.cpp / Ollama.


Inference

SentenceTransformers (recommended)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("xthor/Qwen3-Embedding-0.6B-GraphQL")

query = "What's the nightly rate for this room?"
# coordinates of Type.field pairs
coords = [
    "Room.priceCents",
    "RoomUpgradeOffer.priceCents",
    "Ticket.priceCents",
]

q = model.encode(query, prompt_name="query")
c = model.encode(coords, prompt_name="document")
scores = (q @ c.T).tolist()

for coord, score in sorted(zip(coords, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {coord}")
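The snippet above scores a handful of candidates; against the full corpus you would encode every coordinate once and take the top-k by dot product. A minimal sketch with synthetic unit vectors standing in for the `model.encode` outputs (1024 dimensions used for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for model.encode(query, prompt_name="query") and
# model.encode(corpus, prompt_name="document"), L2-normalized
q = rng.normal(size=1024)
q /= np.linalg.norm(q)
corpus = rng.normal(size=(28893, 1024))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

scores = corpus @ q                        # cosine similarity (unit vectors)
k = 10
top_k = np.argpartition(-scores, k)[:k]    # unordered top-k in O(n)
top_k = top_k[np.argsort(-scores[top_k])]  # sort only those k by score

print(top_k.tolist())
```

`argpartition` avoids sorting all 28,893 scores when only the top handful matter.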

Two prompts are wired into the model and must be used for best results:

  • prompt_name="query" — natural-language questions
  • prompt_name="document" — GraphQL coordinate descriptions in the corpus

Ollama

# pull one quantization (Q8_0 is a good default — near-lossless, ~650 MB)
hf download xthor/Qwen3-Embedding-0.6B-GraphQL model-q8_0.gguf --local-dir .

cat > Modelfile <<'EOF'
FROM ./model-q8_0.gguf
EOF
ollama create qwen3-graphql-embedder -f Modelfile

# OpenAI-compatible embeddings endpoint
curl -s http://localhost:11434/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-graphql-embedder","input":"What is the nightly rate for this room?"}' \
  | jq '.data[0].embedding'
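To rank candidates through the Ollama endpoint, fetch one embedding per input and compare client-side. A stdlib-only sketch, with short stand-in lists where the `data[0].embedding` arrays from two API calls would go (the helper normalizes, since unit-length output isn't guaranteed):

```python
import math

def cosine(a, b):
    """Cosine similarity; normalizes in case the endpoint
    returns non-unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# stand-ins for two /v1/embeddings responses
q_emb = [0.1, 0.3, -0.2]
d_emb = [0.2, 0.6, -0.4]
print(round(cosine(q_emb, d_emb), 3))  # → 1.0 (parallel vectors)
```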

llama.cpp

hf download xthor/Qwen3-Embedding-0.6B-GraphQL model-q8_0.gguf --local-dir .

./llama-server -m model-q8_0.gguf --embedding --port 8080
# POST http://localhost:8080/embedding   { "content": "..." }

Available GGUF quantizations

file               size     use case
model-f16.gguf     ~1.2 GB  reference quality; parity with safetensors
model-q8_0.gguf    ~650 MB  near-lossless; recommended default
model-q4_k_m.gguf  ~400 MB  small footprint; accepts a minor quality trade-off

Results

223 held-out test queries · 28,893-coordinate corpus · 30% real SDLs (GitHub GHES, Saleor, Shopify, AniList) never seen in training.

metric         baseline  tuned (3 epochs)  lift
exact_match@1  0.090     0.229             +0.139 (+155%)
recall@3       0.130     0.318             +0.188
recall@5       0.161     0.345             +0.184 (+114%)
recall@10      0.215     0.435             +0.220 (+102%)
mrr@10         0.121     0.285             +0.164
ndcg@10        0.143     0.320             +0.177
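For reference, these retrieval metrics are computable from ranked lists in a few lines. A sketch over a toy run (two queries, one relevant coordinate each; the coordinate names are illustrative):

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of queries whose relevant item appears in the top k."""
    hits = sum(1 for r, rel in zip(ranked, relevant) if rel in r[:k])
    return hits / len(ranked)

def mrr_at_k(ranked, relevant, k):
    """Mean reciprocal rank; contributes zero when the item is outside top k."""
    total = 0.0
    for r, rel in zip(ranked, relevant):
        if rel in r[:k]:
            total += 1.0 / (r.index(rel) + 1)
    return total / len(ranked)

ranked = [
    ["Room.priceCents", "Ticket.priceCents"],           # hit at rank 1
    ["Incident.description", "SlaPolicy.description"],  # hit at rank 2
]
relevant = ["Room.priceCents", "SlaPolicy.description"]

print(recall_at_k(ranked, relevant, 1))  # → 0.5
print(mrr_at_k(ranked, relevant, 10))    # → 0.75
```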

baseline vs tuned — headline metrics

recall@k across the sweep

Where the lift comes from

Direct questions ("has my package shipped?", "what's my total?") are already handled well by the base model. The gains come from indirect questions where the user names a concept rather than a field — those require owner-type reasoning, and that's where the base model falls behind.

Example: rank 101 → 1

"I need to understand what commitments we have regarding support response times. Where can I find that info?"

Correct target: SlaPolicy.description. The schema has 262 .description fields (on Incident, Issue, Resolution, SatisfactionSurvey, …). The task is picking the right owner — not the right field name.

                                          base   tuned
rank in full corpus (18,396 coordinates)  101    1
rank among 262 .description siblings      12     1
cosine(query, target)                     0.428  0.383
cosine(query, base top-1 distractor)      0.484  0.303
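The collision structure the model has to resolve can be made explicit by grouping corpus coordinates on the field half of Type.field. A sketch over a toy corpus:

```python
from collections import defaultdict

corpus = [
    "SlaPolicy.description", "Incident.description",
    "Issue.description", "Room.priceCents", "Ticket.priceCents",
]

siblings = defaultdict(list)
for coord in corpus:
    owner, field = coord.split(".", 1)
    siblings[field].append(owner)

# fields shared by more than one owner type are the ambiguous cases
collisions = {f: owners for f, owners in siblings.items() if len(owners) > 1}
print(collisions)
# → {'description': ['SlaPolicy', 'Incident', 'Issue'],
#    'priceCents': ['Room', 'Ticket']}
```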

SlaPolicy sibling cosines

The base model ranks SatisfactionSurvey.description and Incident.description above the target. The fine-tune demotes them — every wrong owner drops to 0.15–0.22 while the target becomes the top hit.

SlaPolicy ranking ladder

Example: rank 5 → 1

"What's the nightly rate for this room?"

Correct target: Room.priceCents. Six other .priceCents fields exist (upgrade offers, extensions, tickets).

                                      base                     tuned
rank in full corpus                   5                        1
rank among 7 .priceCents siblings     3                        1
cosine(query, target)                 0.51                     0.61
cosine(query, base top-1 distractor)  0.55 (RoomUpgradeOffer)  0.43
margin to runner-up                   -0.04 (target loses)     +0.12

Room sibling cosines

Even on a natural, direct question the base model picks the wrong owner (it ranks RoomUpgradeOffer.priceCents first). The fine-tune reverses the ordering and opens a clear margin.

Room ranking ladder

Known trade-off

same_owner_wrong_field_rate@1 rose from 0.063 to 0.103. The model picks the right owner type more often but occasionally lands on the wrong field within the correct type. The training signal is tuned for owner disambiguation; within-owner field disambiguation isn't rewarded. The next iteration adds competition sets that share an owner and differ by field.

metric deltas


Training

run       epochs  batch  lr    loss
qwen3     2       64     5e-5  cached_mnrl
qwen3-e3  3       64     5e-5  cached_mnrl

Both: --max-seq-length 256, 4 hard negatives per anchor, bf16, full fine-tune (no LoRA), single H100. Published checkpoint: qwen3-e3.
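cached_mnrl refers to sentence-transformers' CachedMultipleNegativesRankingLoss: cross-entropy over in-batch (plus hard-negative) similarities, with gradient caching so large batches fit in memory. The core objective, sketched in numpy on synthetic embeddings (the caching machinery itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, scale = 8, 16, 20.0  # scale acts as an inverse temperature

# synthetic anchor (query) and positive (coordinate) embeddings, unit norm
a = rng.normal(size=(batch, dim))
p = rng.normal(size=(batch, dim))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p /= np.linalg.norm(p, axis=1, keepdims=True)

# each anchor's positive sits on the diagonal; every other in-batch
# positive (and any appended hard negatives) acts as a negative
sims = scale * (a @ p.T)                       # (batch, batch) similarities
log_softmax = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_softmax))          # cross-entropy, target = i
print(float(loss))
```

The 4 hard negatives per anchor would be appended as extra columns of `p`, widening the softmax without adding anchors.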

Dataset

split   rows
train   4,788
val     94
test    223
corpus  28,893

Built from 7,626 raw seed pairs via world-leakage, per-row strict-leakage, and family-level semantic-dedup filters. The strict-leakage filter is aggressive on real-SDL queries, which is why val/test shrink to ~20% of raw.
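The filter implementations aren't published here; as an illustration, a minimal cosine-threshold dedup of the kind a semantic-dedup pass performs (the greedy strategy and the 0.95 threshold are assumptions, not the pipeline's actual values):

```python
import numpy as np

def dedup(embeddings, threshold=0.95):
    """Greedy dedup: keep a row only if it stays below `threshold`
    cosine similarity to every row already kept."""
    kept = []
    for i, e in enumerate(embeddings):
        e = e / np.linalg.norm(e)
        if all(float(e @ embeddings[j]) / np.linalg.norm(embeddings[j]) < threshold
               for j in kept):
            kept.append(i)
    return kept

rows = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(dedup(rows))  # → [0, 2]: row 1 is a near-duplicate of row 0
```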

