support for llama.cpp

#2
by Simon716 - opened

Hi, would you add support for llama.cpp?

Working on it, see https://github.com/ggml-org/llama.cpp/pull/20567

So this PR was closed without merge.
What now?

Opened an issue; maybe engaging with it can help drive the process forward: https://github.com/ggml-org/llama.cpp/issues/20683

My fork works, at least for Vulkan and CPU. I will try to keep it up to date, and maybe release binaries once I've wrapped my head around the CI system.

It would help to know which OS and GGML runtime you're using, and whether you run llama.cpp standalone or inside LM Studio, LocalAI, etc.

Funnily enough, I just noticed the containers seem to get built automatically, so you might already be able to use this version: https://github.com/QaDeS/llama.cpp/pkgs/container/llama.cpp/versions
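For anyone who wants to try one of those container images, pulling and running it could look roughly like this. This is a sketch: the image tag, model path, and port are assumptions, so check the versions page for the actual tags (note that registry names are lowercased by Docker):

```shell
# Pull an automatically built image from the fork's GitHub Container Registry
# (the ":server" tag is an assumption; see the linked versions page for real tags)
docker pull ghcr.io/qades/llama.cpp:server

# Run the server with a locally stored GGUF model mounted into the container
# (model file name and port are illustrative)
docker run --rm -p 8080:8080 -v ./models:/models \
  ghcr.io/qades/llama.cpp:server \
  -m /models/Yuan3.0-Flash-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```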

Creating a release now nevertheless; I will post here once it's available. Currently uploading the f16 and Q4_K_M GGUFs, so they will be available under https://huggingface.co/mkit/Yuan3.0-Flash-GGUF soon. Hit me up if you need additional quantizations.
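Once the upload finishes, fetching and running the Q4_K_M quant with a llama.cpp build might look like this. The exact GGUF file name is a guess based on the repo name, so verify it against the repo's file list first:

```shell
# Download the Q4_K_M quant from the Hugging Face repo
# (file name is an assumption; check the repo before downloading)
huggingface-cli download mkit/Yuan3.0-Flash-GGUF \
  Yuan3.0-Flash-Q4_K_M.gguf --local-dir ./models

# Run a quick interactive test with the downloaded model
# (llama-cli must come from a build that includes the fork's changes)
./llama-cli -m ./models/Yuan3.0-Flash-Q4_K_M.gguf -p "Hello" -n 64
```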

That's great, thanks. Q4_K_M should be fine.

Done :) Feel free to try it out and let me know if something is missing or not working properly.
