# Qwen3-Embedding-0.6B-GraphQL
A fine-tune of Qwen/Qwen3-Embedding-0.6B that maps natural-language questions to GraphQL field coordinates (Type.field). The training signal targets owner-type disambiguation across cross-type field-name collisions — telling Issue.author apart from PullRequest.author, or SlaPolicy.description from the 261 other .description fields in a large schema.
Ships as both SentenceTransformer weights and GGUF builds for llama.cpp / Ollama.
## Inference
### SentenceTransformers (recommended)
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("xthor/Qwen3-Embedding-0.6B-GraphQL")

query = "What's the nightly rate for this room?"
# coordinates of Type.field pairs
coords = [
    "Room.priceCents",
    "RoomUpgradeOffer.priceCents",
    "Ticket.priceCents",
]

q = model.encode(query, prompt_name="query")
c = model.encode(coords, prompt_name="document")
scores = (q @ c.T).tolist()

for coord, score in sorted(zip(coords, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {coord}")
```
Two prompts are wired into the model and must be used for best results:

- `prompt_name="query"` — natural-language questions
- `prompt_name="document"` — GraphQL coordinate descriptions in the corpus
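For context, the `document` side of the corpus is flattened `Type.field` coordinates. A naive sketch of enumerating them from an SDL string (a real pipeline would use a proper GraphQL parser; the schema below is invented for illustration):

```python
import re

def extract_coordinates(sdl: str) -> list[str]:
    """Naively enumerate Type.field coordinates from a GraphQL SDL string."""
    coords = []
    # match object type definitions and their field blocks
    for type_name, body in re.findall(r"type\s+(\w+)\s*\{([^}]*)\}", sdl):
        for line in body.splitlines():
            # field name, optional argument list, then the colon before the type
            m = re.match(r"\s*(\w+)\s*(?:\([^)]*\))?\s*:", line)
            if m:
                coords.append(f"{type_name}.{m.group(1)}")
    return coords

sdl = """
type Room {
  priceCents: Int!
  description: String
}
type Ticket {
  priceCents: Int!
}
"""
print(extract_coordinates(sdl))
# → ['Room.priceCents', 'Room.description', 'Ticket.priceCents']
```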
### Ollama
```bash
# pull one quantization (Q8_0 is a good default — near-lossless, ~650 MB)
hf download xthor/Qwen3-Embedding-0.6B-GraphQL model-q8_0.gguf --local-dir .

cat > Modelfile <<'EOF'
FROM ./model-q8_0.gguf
EOF
ollama create qwen3-graphql-embedder -f Modelfile

# OpenAI-compatible embeddings endpoint
curl -s http://localhost:11434/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-graphql-embedder","input":"What is the nightly rate for this room?"}' \
  | jq '.data[0].embedding'
```
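However the embeddings are produced (SentenceTransformers, Ollama, or llama.cpp), retrieval itself is just a cosine-similarity ranking over the coordinate corpus. A minimal pure-Python sketch, with toy 3-dimensional vectors standing in for the model's real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query_emb, corpus):
    """corpus: {coordinate: embedding}. Returns (score, coordinate) best-first."""
    scored = [(cosine(query_emb, emb), coord) for coord, emb in corpus.items()]
    return sorted(scored, reverse=True)

# toy vectors in place of real embeddings
corpus = {
    "Room.priceCents": [0.9, 0.1, 0.0],
    "Ticket.priceCents": [0.1, 0.9, 0.0],
}
q = [0.8, 0.2, 0.0]
print(rank(q, corpus)[0][1])  # → Room.priceCents
```

Since the model's embeddings come back normalized, a plain dot product (as in the SentenceTransformers snippet above) gives the same ordering.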
### llama.cpp
```bash
hf download xthor/Qwen3-Embedding-0.6B-GraphQL model-q8_0.gguf --local-dir .
./llama-server -m model-q8_0.gguf --embedding --port 8080

# POST http://localhost:8080/embedding  { "content": "..." }
```
### Available GGUF quantizations
| file | size | use case |
|---|---|---|
| `model-f16.gguf` | ~1.2 GB | reference quality, parity with safetensors |
| `model-q8_0.gguf` | ~650 MB | near-lossless; recommended default |
| `model-q4_k_m.gguf` | ~400 MB | small footprint; accepts a minor quality trade-off |
## Results
223 held-out test queries · 28,893-coordinate corpus · 30% real SDLs (GitHub GHES, Saleor, Shopify, AniList) never seen in training.
| metric | baseline | tuned (3 epochs) | lift |
|---|---|---|---|
| exact_match@1 | 0.090 | 0.229 | +0.139 (+155%) |
| recall@3 | 0.130 | 0.318 | +0.188 |
| recall@5 | 0.161 | 0.345 | +0.184 (+114%) |
| recall@10 | 0.215 | 0.435 | +0.220 (+102%) |
| mrr@10 | 0.121 | 0.285 | +0.164 |
| ndcg@10 | 0.143 | 0.320 | +0.177 |
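The retrieval metrics above follow their standard definitions. For reference, a minimal sketch of recall@k and mrr@10 computed from the 1-based rank of the gold coordinate per query (toy ranks, not the real eval data):

```python
def recall_at_k(ranks, k):
    """ranks: 1-based rank of the gold Type.field per query."""
    return sum(r <= k for r in ranks) / len(ranks)

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, counting 0 for any gold ranked below k."""
    return sum(1.0 / r for r in ranks if r <= k) / len(ranks)

ranks = [1, 5, 101, 2]  # toy ranks of the correct coordinate per query
print(recall_at_k(ranks, 5))  # → 0.75
print(mrr_at_k(ranks))        # → (1 + 1/5 + 0 + 1/2) / 4 = 0.425
```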
### Where the lift comes from
Direct questions ("has my package shipped?", "what's my total?") are already handled well by the base model. The gains come from indirect questions where the user names a concept rather than a field — those require owner-type reasoning, and that's where the base model falls behind.
### Example: rank 101 → 1
"I need to understand what commitments we have regarding support response times. Where can I find that info?"
Correct target: SlaPolicy.description. The schema has 262 .description fields (on Incident, Issue, Resolution, SatisfactionSurvey, …). The task is picking the right owner — not the right field name.
| | base | tuned |
|---|---|---|
| rank in full corpus (18,396 coordinates) | 101 | 1 |
| rank among 262 `.description` siblings | 12 | 1 |
| cosine(query, target) | 0.428 | 0.383 |
| cosine(query, base top-1 distractor) | 0.484 | 0.303 |
The base model ranks SatisfactionSurvey.description and Incident.description above the target. The fine-tune demotes them — every wrong owner drops to 0.15–0.22 while the target becomes the top hit.
### Example: rank 5 → 1
"What's the nightly rate for this room?"
Correct target: Room.priceCents. Six other .priceCents fields exist (upgrade offers, extensions, tickets).
| | base | tuned |
|---|---|---|
| rank in full corpus | 5 | 1 |
| rank among 7 `.priceCents` siblings | 3 | 1 |
| cosine(query, target) | 0.51 | 0.61 |
| cosine(query, base top-1 distractor) | 0.55 (RoomUpgradeOffer) | 0.43 |
| margin to runner-up | −0.04 (target loses) | +0.12 |
Even on a natural, direct question the base model picks the wrong owner (it ranks RoomUpgradeOffer.priceCents first). The fine-tune reverses the ordering and opens a clear margin.
## Known trade-off
`same_owner_wrong_field_rate@1` rose from 0.063 to 0.103. The model picks the right owner type more often but occasionally lands on the wrong field within the correct type. The signal is tuned for owner disambiguation; within-owner field disambiguation isn't rewarded. The next iteration adds competition sets that share an owner and differ by field.
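A sketch of how such a metric can be computed from (gold, predicted) top-1 pairs; this is my reading of the metric name, not the project's actual eval code:

```python
def same_owner_wrong_field_rate(pairs):
    """pairs: (gold, predicted) Type.field coordinates, one per query.

    Counts predictions that hit the correct owner type but the wrong field.
    """
    hits = 0
    for gold, pred in pairs:
        g_owner, g_field = gold.split(".")
        p_owner, p_field = pred.split(".")
        if g_owner == p_owner and g_field != p_field:
            hits += 1
    return hits / len(pairs)

pairs = [
    ("Room.priceCents", "Room.priceCents"),    # exact match
    ("Room.priceCents", "Room.description"),   # right owner, wrong field
    ("Room.priceCents", "Ticket.priceCents"),  # wrong owner
]
print(same_owner_wrong_field_rate(pairs))  # → 0.333...
```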
## Training
| run | epochs | batch | lr | loss |
|---|---|---|---|---|
| `qwen3` | 2 | 64 | 5e-5 | cached_mnrl |
| `qwen3-e3` | 3 | 64 | 5e-5 | cached_mnrl |
Both runs: `--max-seq-length 256`, 4 hard negatives per anchor, bf16, full fine-tune (no LoRA), single H100. Published checkpoint: `qwen3-e3`.
## Dataset
| split | rows |
|---|---|
| train | 4,788 |
| val | 94 |
| test | 223 |
| corpus | 28,893 |
Built from 7,626 raw seed pairs via world-leakage, per-row strict-leakage, and family-level semantic-dedup filters. The strict-leakage filter is aggressive on real-SDL queries, which is why val/test shrink to ~20% of raw.
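As a rough illustration of what a family-level dedup pass does (a token-overlap stand-in for the real semantic filter; the grouping key and threshold here are assumptions, not the project's actual pipeline):

```python
def dedup_family(rows, threshold=0.8):
    """rows: (query, coordinate) pairs. Keep one query per near-duplicate
    cluster within each field-name family (e.g. all *.priceCents rows)."""
    kept = []
    seen_by_family = {}  # field name -> token sets of already-kept queries
    for query, coord in rows:
        family = coord.split(".")[1]
        tokens = set(query.lower().split())
        prior = seen_by_family.setdefault(family, [])
        # Jaccard overlap against kept queries in the same family
        if any(len(tokens & p) / len(tokens | p) >= threshold for p in prior):
            continue  # near-duplicate: drop
        prior.append(tokens)
        kept.append((query, coord))
    return kept

rows = [
    ("what is the nightly rate", "Room.priceCents"),
    ("what is the nightly rate", "RoomUpgradeOffer.priceCents"),  # dup query
    ("how much does a ticket cost", "Ticket.priceCents"),
]
print(len(dedup_family(rows)))  # → 2
```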
## Citation
- Base model: Qwen3-Embedding-0.6B
- Training data: GitHub-Repo-train-data
- License: Apache 2.0 (inherited from the base)
### llama-cpp-python

The GGUF builds also load directly through llama-cpp-python (pass `embedding=True` to get the embeddings API):

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="xthor/Qwen3-Embedding-0.6B-GraphQL",
    filename="model-q8_0.gguf",
    embedding=True,
)
emb = llm.create_embedding("What's the nightly rate for this room?")
```