GGUFs working well on ik/llama.cpp!

#9 opened by ubergarm

I released some GGUFs for both mainline llama.cpp and ik_llama.cpp, and the model seems to be performing well in my limited opencode testing, even below 4bpw:

https://huggingface.co/ubergarm/MiniMax-M2.5-GGUF

(perplexity comparison plot)
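If it helps anyone getting started, here's a rough sketch of pulling one of the quants with huggingface-cli; the `--include` pattern and local directory below are just guesses at the file naming, so check the repo's file listing for the exact filenames first:

```bash
# Sketch only: the IQ4 pattern is a hypothetical filename filter,
# not the actual file names in the repo.
huggingface-cli download ubergarm/MiniMax-M2.5-GGUF \
  --include "*IQ4*" \
  --local-dir ./MiniMax-M2.5-GGUF
```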

Nice-sized model and congrats on the release! Happy Lunar New Year!

Thank you!

I put some speed benchmarks for full offload on 96GB VRAM and 128k context here: https://www.reddit.com/r/LocalLLaMA/comments/1r40o83/comment/o58rg7k/

This model is a bit heavy on context memory usage, so quantize the KV cache with -ctk q8_0 -ctv q8_0, or use the more advanced options available in ik_llama.cpp.
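For reference, a minimal sketch of what that looks like when serving the GGUF; the model filename, context size, and layer count here are placeholders, not the exact command from the post. Mainline llama.cpp generally needs flash attention enabled to quantize the V cache (recent builds take "-fa on", older builds use bare "-fa"):

```bash
# Serve with the K and V caches quantized to q8_0 to reduce context memory.
# Filename and -c/-ngl values are illustrative placeholders.
./llama-server \
  -m ./MiniMax-M2.5-IQ4_XS.gguf \
  -c 131072 \
  -ngl 99 \
  -fa on \
  -ctk q8_0 -ctv q8_0
```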
