GGUFs working well on ik/llama.cpp!

#9 opened by ubergarm

I released some GGUFs for both mainline llama.cpp and ik_llama.cpp, and the model seems to be performing well in my limited opencode testing, even below 4bpw:

https://huggingface.co/ubergarm/MiniMax-M2.5-GGUF

(perplexity comparison plot)
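If it helps anyone getting started, here's a rough sketch of pulling one of the quants with huggingface-cli; the `--include` pattern and local directory below are just guesses at the file naming, so check the repo's file listing for the exact filenames first:

```bash
# Sketch only: the IQ4 pattern is a hypothetical filename filter,
# not the actual file names in the repo.
huggingface-cli download ubergarm/MiniMax-M2.5-GGUF \
  --include "*IQ4*" \
  --local-dir ./MiniMax-M2.5-GGUF
```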

Nice-sized model and congrats on the release! Happy Lunar New Year!

Thank you!

I put some speed benchmarks for full offload on 96GB VRAM and 128k context here: https://www.reddit.com/r/LocalLLaMA/comments/1r40o83/comment/o58rg7k/

This model is a bit heavy on context memory usage, so quantize the KV cache with -ctk q8_0 -ctv q8_0, or use the more advanced options available in ik_llama.cpp.
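For reference, a minimal sketch of what that looks like when serving the GGUF; the model filename, context size, and layer count here are placeholders, not the exact command from the post. Mainline llama.cpp generally needs flash attention enabled to quantize the V cache (recent builds take "-fa on", older builds use bare "-fa"):

```bash
# Serve with the K and V caches quantized to q8_0 to reduce context memory.
# Filename and -c/-ngl values are illustrative placeholders.
./llama-server \
  -m ./MiniMax-M2.5-IQ4_XS.gguf \
  -c 131072 \
  -ngl 99 \
  -fa on \
  -ctk q8_0 -ctv q8_0
```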
