support for llama.cpp

#2
by Simon716 - opened

Hi, would you add support for llama.cpp?

Working on it, see https://github.com/ggml-org/llama.cpp/pull/20567

So this PR was closed without merge.
What now?

Opened an issue; maybe engaging with it can help drive the process forward: https://github.com/ggml-org/llama.cpp/issues/20683

My fork works, at least for Vulkan and CPU. I will try to keep it up to date, and maybe release binaries once I've wrapped my head around the CI system.

It would help to know which OS and GGML runtime you're using, and whether you run llama.cpp standalone or inside LM Studio, LocalAI, etc.

Funnily enough, I just noticed the containers seem to get built automatically, so you might already be able to use this version: https://github.com/QaDeS/llama.cpp/pkgs/container/llama.cpp/versions
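For anyone who wants to try one of those container images, pulling and running it could look roughly like this. This is a sketch: the image tag, model path, and port are assumptions, so check the versions page for the actual tags (note that registry names are lowercased by Docker):

```shell
# Pull an automatically built image from the fork's GitHub Container Registry
# (the ":server" tag is an assumption; see the linked versions page for real tags)
docker pull ghcr.io/qades/llama.cpp:server

# Run the server with a locally stored GGUF model mounted into the container
# (model file name and port are illustrative)
docker run --rm -p 8080:8080 -v ./models:/models \
  ghcr.io/qades/llama.cpp:server \
  -m /models/Yuan3.0-Flash-Q4_K_M.gguf --host 0.0.0.0 --port 8080
```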

Creating a release now nevertheless; I will post here once it's available. Currently uploading the f16 and Q4_K_M GGUFs, so they will be available under https://huggingface.co/mkit/Yuan3.0-Flash-GGUF soon. Hit me up if you need additional quantizations.
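Once the upload finishes, fetching and running the Q4_K_M quant with a llama.cpp build might look like this. The exact GGUF file name is a guess based on the repo name, so verify it against the repo's file list first:

```shell
# Download the Q4_K_M quant from the Hugging Face repo
# (file name is an assumption; check the repo before downloading)
huggingface-cli download mkit/Yuan3.0-Flash-GGUF \
  Yuan3.0-Flash-Q4_K_M.gguf --local-dir ./models

# Run a quick interactive test with the downloaded model
# (llama-cli must come from a build that includes the fork's changes)
./llama-cli -m ./models/Yuan3.0-Flash-Q4_K_M.gguf -p "Hello" -n 64
```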

That's great, thanks. Q4_K_M should be fine.

Done :) Feel free to try it out and let me know if something is missing or not working properly.
