"Not all quantized models perform well." Serving frameworks differ in their target hardware: Ollama typically serves on NVIDIA GPUs, while llama.cpp targets CPU inference using AVX and AMX instruction-set extensions.
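Since llama.cpp's CPU speed depends on which SIMD extensions the processor advertises, a quick check before choosing a backend can be sketched like this (a minimal, Linux-only sketch that reads `/proc/cpuinfo`; the flag names `avx`, `avx2`, `avx512f`, `amx_tile` are the standard kernel names, and the helper `cpu_flags` is my own, not part of any library):

```python
from pathlib import Path

def cpu_flags() -> set:
    """Return the CPU feature flags reported by the Linux kernel,
    or an empty set when /proc/cpuinfo is unavailable (non-Linux)."""
    info = Path("/proc/cpuinfo")
    if not info.exists():
        return set()
    for line in info.read_text().splitlines():
        if line.startswith("flags"):
            # "flags : fpu vme ... avx avx2 ..." -> split off the value part
            return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
# Report which extensions relevant to llama.cpp CPU builds are present
print({f: f in flags for f in ("avx", "avx2", "avx512f", "amx_tile")})
```

On machines without AVX, llama.cpp falls back to slower scalar paths, which is one reason the same quantized model can perform very differently across hosts.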
- unsloth/GLM-4.5-Air-GGUF (Text Generation, 110B, 10.9k downloads, 158 likes)
- unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF (31B, 45.9k downloads, 141 likes)
- unsloth/DeepSeek-V3-0324-GGUF-UD (Text Generation, 671B, 1.35k downloads, 21 likes)
- unsloth/cogito-v2-preview-llama-109B-MoE-GGUF (108B, 803 downloads, 10 likes)
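To fetch one of the GGUF files from these repos directly, Hugging Face serves raw files at the `/{repo}/resolve/{revision}/{path}` URL pattern. A minimal sketch (the helper `gguf_url` and the `Q4_K_M` quant filename are assumptions for illustration; check each repo's file list for the quants actually published):

```python
def gguf_url(repo: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file hosted on the Hugging Face Hub."""
    return f"https://huggingface.co/{repo}/resolve/{revision}/{filename}"

# Hypothetical quant filename -- verify against the repo's "Files" tab
url = gguf_url("unsloth/GLM-4.5-Air-GGUF", "GLM-4.5-Air-Q4_K_M.gguf")
print(url)
```

The same repo IDs can also be passed to the serving tools themselves (e.g. `ollama run hf.co/<repo>`), which pull the GGUF without constructing URLs by hand.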