support for llama.cpp
Hi, would you add support for llama.cpp?
Working on it, see https://github.com/ggml-org/llama.cpp/pull/20567
So this PR was closed without merge.
What now?
Opened an issue; maybe engaging with it can help drive the process forward: https://github.com/ggml-org/llama.cpp/issues/20683
My fork works, at least for Vulkan and CPU. I'll try to keep it up to date, and maybe release binaries once I've wrapped my head around the CI system.
It would help to know which OS and GGML runtime you're using, and whether you run llama.cpp standalone or inside LM-Studio, LocalAI, etc.
Funnily enough, I just noticed the containers seem to get built automatically, so you might already be able to use this version: https://github.com/QaDeS/llama.cpp/pkgs/container/llama.cpp/versions
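If you want to try the container, something like the following should work. Note this is a sketch: the GHCR image path is inferred from the package URL above, and the `server` tag and the mount/flag choices are assumptions borrowed from upstream llama.cpp's Docker usage, so check the versions page for what is actually published.

```shell
# Pull the auto-built image from GHCR (path inferred from the package URL above;
# the tag is an assumption -- check the versions page for actual tags)
docker pull ghcr.io/qades/llama.cpp:server

# Run the server with a local models directory mounted in
# (flags mirror upstream llama.cpp's server container usage)
docker run -v ./models:/models -p 8080:8080 ghcr.io/qades/llama.cpp:server \
  -m /models/model.gguf --host 0.0.0.0 --port 8080
```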
Creating a release now nevertheless, will post here once it's available. Currently uploading the f16 and Q4_K_M GGUFs, so they should show up at https://huggingface.co/mkit/Yuan3.0-Flash-GGUF soon. Hit me up if you need additional quantizations.
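Once the upload finishes, fetching and running the model could look like this. The exact GGUF filename here is an assumption (check the repo's file listing); the `huggingface-cli download` and `llama-cli` invocations follow their standard documented usage.

```shell
# Download the Q4_K_M GGUF from the Hugging Face repo mentioned above
# (filename is an assumption -- check the repo's file listing for the real name)
huggingface-cli download mkit/Yuan3.0-Flash-GGUF \
  Yuan3.0-Flash-Q4_K_M.gguf --local-dir ./models

# Run it with the forked llama.cpp build (standard llama-cli flags:
# -m model path, -p prompt, -n max tokens to generate)
./llama-cli -m ./models/Yuan3.0-Flash-Q4_K_M.gguf -p "Hello" -n 64
```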
That's great, thanks. Q4_K_M should be fine.
Done :) Feel free to try it out and let me know if something is missing or not working properly.