QMD Query Expansion 4B (GGUF)

GGUF conversion of the QMD Query Expansion model for use with Ollama, llama.cpp, and LM Studio.

Model Details

  • Base Model: Qwen/Qwen3-4B
  • SFT Adapter: tobil/qmd-query-expansion-4B-sft
  • GRPO Adapter: tobil/qmd-query-expansion-4B-grpo
  • Task: Query expansion for hybrid search (lex/vec/hyde format)

Available Quantizations

File                                 Quant    Description
qmd-query-expansion-4B-f16.gguf      F16      Full precision
qmd-query-expansion-4B-q8_0.gguf     Q8_0     8-bit
qmd-query-expansion-4B-q5_k_m.gguf   Q5_K_M   5-bit medium
qmd-query-expansion-4B-q4_k_m.gguf   Q4_K_M   4-bit medium (recommended)

Usage

With Ollama

# Download
huggingface-cli download tobil/qmd-query-expansion-4B-gguf qmd-query-expansion-4B-q4_k_m.gguf --local-dir .

# Create Modelfile
echo 'FROM ./qmd-query-expansion-4B-q4_k_m.gguf' > Modelfile

# Create and run
ollama create qmd-expand-4b -f Modelfile
ollama run qmd-expand-4b
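
With llama.cpp

A minimal sketch for running the same file directly with llama.cpp's llama-cli (flag names follow recent llama.cpp builds; -no-cnv disables interactive chat mode so the raw Qwen3 prompt, described under Prompt Format below, is sent as-is):

# Run one completion against the downloaded GGUF
./llama-cli -m qmd-query-expansion-4B-q4_k_m.gguf -no-cnv \
  -p "<|im_start|>user\n/no_think Expand this search query: your query here<|im_end|>\n<|im_start|>assistant\n" \
  -n 256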

Prompt Format

Use the Qwen3 chat format with /no_think to disable thinking mode:

<|im_start|>user
/no_think Expand this search query: your query here<|im_end|>
<|im_start|>assistant
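
For programmatic use, a sketch of the same prompt sent through Ollama's HTTP API (assumes the qmd-expand-4b model created above and Ollama's default port 11434; "raw": true bypasses any Modelfile template so the chat markers are passed through verbatim):

# Request a single non-streamed completion from the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "qmd-expand-4b",
  "raw": true,
  "stream": false,
  "prompt": "<|im_start|>user\n/no_think Expand this search query: your query here<|im_end|>\n<|im_start|>assistant\n"
}'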

Expected Output

lex: keyword variation 1
lex: keyword variation 2
vec: natural language reformulation
hyde: Hypothetical document passage answering the query.
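
A minimal sketch for splitting that output into per-backend query lists for a hybrid search pipeline (output.txt is a hypothetical file holding the model's response):

# lex -> keyword/BM25 queries, vec -> embedding queries, hyde -> hypothetical document
grep '^lex:'  output.txt | sed 's/^lex: *//'
grep '^vec:'  output.txt | sed 's/^vec: *//'
grep '^hyde:' output.txt | sed 's/^hyde: *//'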

License

Apache 2.0 (inherited from Qwen3)
