
Qwen3-Embedding-0.6B-GraphQL

A fine-tune of Qwen/Qwen3-Embedding-0.6B that maps natural-language questions to GraphQL field coordinates (Type.field). The training signal targets owner-type disambiguation across cross-type field-name collisions — telling Issue.author apart from PullRequest.author, or SlaPolicy.description from the 261 other .description fields in a large schema.

Ships as both SentenceTransformer weights and GGUF builds for llama.cpp / Ollama.


Inference

SentenceTransformers (recommended)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("xthor/Qwen3-Embedding-0.6B-GraphQL")

query = "What's the nightly rate for this room?"
# coordinates of Type.field pairs
coords = [
    "Room.priceCents",
    "RoomUpgradeOffer.priceCents",
    "Ticket.priceCents",
]

q = model.encode(query, prompt_name="query")
c = model.encode(coords, prompt_name="document")
scores = (q @ c.T).tolist()

for coord, score in sorted(zip(coords, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {coord}")
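The snippet above scores a handful of candidates; against the full corpus you would encode every coordinate once and take the top-k by dot product. A minimal sketch with synthetic unit vectors standing in for the `model.encode` outputs (1024 dimensions used for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-ins for model.encode(query, prompt_name="query") and
# model.encode(corpus, prompt_name="document"), L2-normalized
q = rng.normal(size=1024)
q /= np.linalg.norm(q)
corpus = rng.normal(size=(28893, 1024))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

scores = corpus @ q                        # cosine similarity (unit vectors)
k = 10
top_k = np.argpartition(-scores, k)[:k]    # unordered top-k in O(n)
top_k = top_k[np.argsort(-scores[top_k])]  # sort only those k by score

print(top_k.tolist())
```

`argpartition` avoids sorting all 28,893 scores when only the top handful matter.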

Two prompts are wired into the model and must be used for best results:

  • prompt_name="query" — natural-language questions
  • prompt_name="document" — GraphQL coordinate descriptions in the corpus

Ollama

# pull one quantization (Q8_0 is a good default — near-lossless, ~650 MB)
hf download xthor/Qwen3-Embedding-0.6B-GraphQL model-q8_0.gguf --local-dir .

cat > Modelfile <<'EOF'
FROM ./model-q8_0.gguf
EOF
ollama create qwen3-graphql-embedder -f Modelfile

# OpenAI-compatible embeddings endpoint
curl -s http://localhost:11434/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-graphql-embedder","input":"What is the nightly rate for this room?"}' \
  | jq '.data[0].embedding'
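To rank candidates through the Ollama endpoint, fetch one embedding per input and compare client-side. A stdlib-only sketch, with short stand-in lists where the `data[0].embedding` arrays from two API calls would go (the helper normalizes, since unit-length output isn't guaranteed):

```python
import math

def cosine(a, b):
    """Cosine similarity; normalizes in case the endpoint
    returns non-unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# stand-ins for two /v1/embeddings responses
q_emb = [0.1, 0.3, -0.2]
d_emb = [0.2, 0.6, -0.4]
print(round(cosine(q_emb, d_emb), 3))  # → 1.0 (parallel vectors)
```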

llama.cpp

hf download xthor/Qwen3-Embedding-0.6B-GraphQL model-q8_0.gguf --local-dir .

./llama-server -m model-q8_0.gguf --embedding --port 8080
# POST http://localhost:8080/embedding   { "content": "..." }

Available GGUF quantizations

file               size     use case
model-f16.gguf     ~1.2 GB  reference quality; parity with safetensors
model-q8_0.gguf    ~650 MB  near-lossless; recommended default
model-q4_k_m.gguf  ~400 MB  small footprint; accepts a minor quality trade-off

Results

223 held-out test queries · 28,893-coordinate corpus · 30% real SDLs (GitHub GHES, Saleor, Shopify, AniList) never seen in training.

metric         baseline  tuned (3 epochs)  lift
exact_match@1  0.090     0.229             +0.139 (+155%)
recall@3       0.130     0.318             +0.188
recall@5       0.161     0.345             +0.184 (+114%)
recall@10      0.215     0.435             +0.220 (+102%)
mrr@10         0.121     0.285             +0.164
ndcg@10        0.143     0.320             +0.177
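For reference, these retrieval metrics are computable from ranked lists in a few lines. A sketch over a toy run (two queries, one relevant coordinate each; the coordinate names are illustrative):

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of queries whose relevant item appears in the top k."""
    hits = sum(1 for r, rel in zip(ranked, relevant) if rel in r[:k])
    return hits / len(ranked)

def mrr_at_k(ranked, relevant, k):
    """Mean reciprocal rank; contributes zero when the item is outside top k."""
    total = 0.0
    for r, rel in zip(ranked, relevant):
        if rel in r[:k]:
            total += 1.0 / (r.index(rel) + 1)
    return total / len(ranked)

ranked = [
    ["Room.priceCents", "Ticket.priceCents"],           # hit at rank 1
    ["Incident.description", "SlaPolicy.description"],  # hit at rank 2
]
relevant = ["Room.priceCents", "SlaPolicy.description"]

print(recall_at_k(ranked, relevant, 1))  # → 0.5
print(mrr_at_k(ranked, relevant, 10))    # → 0.75
```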

baseline vs tuned — headline metrics

recall@k across the sweep

Where the lift comes from

Direct questions ("has my package shipped?", "what's my total?") are already handled well by the base model. The gains come from indirect questions where the user names a concept rather than a field — those require owner-type reasoning, and that's where the base model falls behind.

Example: rank 101 → 1

"I need to understand what commitments we have regarding support response times. Where can I find that info?"

Correct target: SlaPolicy.description. The schema has 262 .description fields (on Incident, Issue, Resolution, SatisfactionSurvey, …). The task is picking the right owner — not the right field name.

                                          base   tuned
rank in full corpus (18,396 coordinates)  101    1
rank among 262 .description siblings      12     1
cosine(query, target)                     0.428  0.383
cosine(query, base top-1 distractor)      0.484  0.303
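The collision structure the model has to resolve can be made explicit by grouping corpus coordinates on the field half of Type.field. A sketch over a toy corpus:

```python
from collections import defaultdict

corpus = [
    "SlaPolicy.description", "Incident.description",
    "Issue.description", "Room.priceCents", "Ticket.priceCents",
]

siblings = defaultdict(list)
for coord in corpus:
    owner, field = coord.split(".", 1)
    siblings[field].append(owner)

# fields shared by more than one owner type are the ambiguous cases
collisions = {f: owners for f, owners in siblings.items() if len(owners) > 1}
print(collisions)
# → {'description': ['SlaPolicy', 'Incident', 'Issue'],
#    'priceCents': ['Room', 'Ticket']}
```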

SlaPolicy sibling cosines

The base model ranks SatisfactionSurvey.description and Incident.description above the target. The fine-tune demotes them — every wrong owner drops to 0.15–0.22 while the target becomes the top hit.

SlaPolicy ranking ladder

Example: rank 5 → 1

"What's the nightly rate for this room?"

Correct target: Room.priceCents. Six other .priceCents fields exist (upgrade offers, extensions, tickets).

                                      base                     tuned
rank in full corpus                   5                        1
rank among 7 .priceCents siblings     3                        1
cosine(query, target)                 0.51                     0.61
cosine(query, base top-1 distractor)  0.55 (RoomUpgradeOffer)  0.43
margin to runner-up                   -0.04 (target loses)     +0.12

Room sibling cosines

Even on a natural, direct question the base model picks the wrong owner (it ranks RoomUpgradeOffer.priceCents first). The fine-tune reverses the ordering and opens a clear margin.

Room ranking ladder

Known trade-off

same_owner_wrong_field_rate@1 rose from 0.063 to 0.103. The model picks the right owner type more often but occasionally lands on the wrong field within the correct type. The training signal is tuned for owner disambiguation; within-owner field disambiguation isn't rewarded. The next iteration adds competition sets that share an owner and differ by field.

metric deltas


Training

run       epochs  batch  lr    loss
qwen3     2       64     5e-5  cached_mnrl
qwen3-e3  3       64     5e-5  cached_mnrl

Both: --max-seq-length 256, 4 hard negatives per anchor, bf16, full fine-tune (no LoRA), single H100. Published checkpoint: qwen3-e3.
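cached_mnrl refers to sentence-transformers' CachedMultipleNegativesRankingLoss: cross-entropy over in-batch (plus hard-negative) similarities, with gradient caching so large batches fit in memory. The core objective, sketched in numpy on synthetic embeddings (the caching machinery itself is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, scale = 8, 16, 20.0  # scale acts as an inverse temperature

# synthetic anchor (query) and positive (coordinate) embeddings, unit norm
a = rng.normal(size=(batch, dim))
p = rng.normal(size=(batch, dim))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p /= np.linalg.norm(p, axis=1, keepdims=True)

# each anchor's positive sits on the diagonal; every other in-batch
# positive (and any appended hard negatives) acts as a negative
sims = scale * (a @ p.T)                       # (batch, batch) similarities
log_softmax = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_softmax))          # cross-entropy, target = i
print(float(loss))
```

The 4 hard negatives per anchor would be appended as extra columns of `p`, widening the softmax without adding anchors.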

Dataset

split   rows
train   4,788
val     94
test    223
corpus  28,893

Built from 7,626 raw seed pairs via world-leakage, per-row strict-leakage, and family-level semantic-dedup filters. The strict-leakage filter is aggressive on real-SDL queries, which is why val/test shrink to ~20% of raw.
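The filter implementations aren't published here; as an illustration, a minimal cosine-threshold dedup of the kind a semantic-dedup pass performs (the greedy strategy and the 0.95 threshold are assumptions, not the pipeline's actual values):

```python
import numpy as np

def dedup(embeddings, threshold=0.95):
    """Greedy dedup: keep a row only if it stays below `threshold`
    cosine similarity to every row already kept."""
    kept = []
    for i, e in enumerate(embeddings):
        e = e / np.linalg.norm(e)
        if all(float(e @ embeddings[j]) / np.linalg.norm(embeddings[j]) < threshold
               for j in kept):
            kept.append(i)
    return kept

rows = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(dedup(rows))  # → [0, 2]: row 1 is a near-duplicate of row 0
```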

