GGUFs working well on ik/llama.cpp!
#9 by ubergarm - opened
I released some GGUFs for both mainline llama.cpp and ik_llama.cpp, and the model seems to be performing well in my limited opencode testing, even below 4 bpw:
https://huggingface.co/ubergarm/MiniMax-M2.5-GGUF
Nice-sized model, and congrats on the release! Happy Lunar New Year!
Thank you!
I put some speed benchmarks for full offload on 96GB VRAM and 128k context here: https://www.reddit.com/r/LocalLLaMA/comments/1r40o83/comment/o58rg7k/
This model is a bit heavy on context memory usage, so quantize the KV cache with `-ctk q8_0 -ctv q8_0`, or use the more advanced options available in ik_llama.cpp; a minimal example launch is sketched below.
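For illustration, here's a rough llama-server launch sketch with the q8_0-quantized KV cache. The model filename, context size, and GPU offload value are placeholders I made up for the example, so adjust them to your own files and hardware:

```bash
# Hypothetical example: serve a MiniMax-M2.5 GGUF with the KV cache quantized to q8_0.
# The model filename is a placeholder, not necessarily one of the released quants.
./llama-server \
  -m MiniMax-M2.5-IQ4_XS.gguf \
  -c 131072 \
  -ngl 99 \
  -ctk q8_0 \
  -ctv q8_0
```

Quantizing both the K and V caches to q8_0 roughly halves KV memory versus the default f16, which adds up quickly at 128k context.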
