Use with the llama-cpp-python library
# !pip install llama-cpp-python  (or install the prebuilt CUDA wheel from the Install section below)

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="trajis-tech/llama-cpp-python-trajis-tech-nonavx512-cuda",
	filename="{{GGUF_FILE}}",
)
output = llm(
	"Once upon a time,",  # prompt
	max_tokens=512,  # generate up to 512 new tokens
	echo=True  # include the prompt in the returned text
)
print(output)
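Since this wheel is a CUDA build, you will normally want to offload model layers to the GPU. A minimal sketch, with assumptions: `n_gpu_layers` and `n_ctx` values are illustrative, and `{{GGUF_FILE}}` is the placeholder from the snippet above and must be replaced with a real `.gguf` filename before calling the function.

```python
from importlib import util


def generate(prompt: str = "Once upon a time,") -> str:
    """Load the GGUF model with full GPU offload and generate text."""
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="trajis-tech/llama-cpp-python-trajis-tech-nonavx512-cuda",
        filename="{{GGUF_FILE}}",  # placeholder: substitute the actual .gguf file
        n_gpu_layers=-1,  # -1 offloads every layer to the GPU (CUDA build)
        n_ctx=2048,       # context window; lower it if VRAM is tight
    )
    output = llm(prompt, max_tokens=512)
    return output["choices"][0]["text"]


# Call generate() once llama-cpp-python is installed and a real
# GGUF filename has been filled in; it is only defined here.
```

`Llama.from_pretrained` forwards extra keyword arguments (such as `n_gpu_layers`) to the `Llama` constructor, so the same options work whether you load from the Hub or from a local file.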

llama-cpp-python (Windows CUDA build)

Prebuilt wheel for:

  • llama_cpp_python 0.3.16
  • Windows x64
  • Python 3.12 (cp312)
  • CUDA enabled
  • AVX512 disabled
  • Supports NVIDIA 10 / 20 / 30 / 40 / 50 series GPUs
  • Trajis SmartSRT 1.0.0

Install

Direct install:

pip install "https://huggingface.co/trajis-tech/llama-cpp-python-trajis-tech-nonavx512-cuda/resolve/main/llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl"

Or download manually and install:

pip install llama_cpp_python-0.3.16-cp312-cp312-win_amd64.whl
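After installing, a quick sanity check confirms the wheel's version and that GPU offload support was compiled in. This sketch guards the import so it also runs cleanly in an environment where the wheel is not yet installed:

```python
from importlib import util

# Check whether the wheel is importable in the current environment.
installed = util.find_spec("llama_cpp") is not None

if installed:
    import llama_cpp

    print("llama-cpp-python", llama_cpp.__version__)  # expect 0.3.16
    # True on a working CUDA build, False on a CPU-only build
    print("GPU offload:", llama_cpp.llama_supports_gpu_offload())
else:
    print("llama-cpp-python is not installed in this environment")
```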


Uninstall

pip uninstall llama-cpp-python


Requirements

  • Windows 64-bit
  • Python 3.12
  • NVIDIA GPU
  • CUDA Toolkit installed
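The requirements above can be checked from the standard library alone; the `nvidia-smi` probe is only a rough proxy for a working NVIDIA driver, not the CUDA Toolkit itself:

```python
import platform
import shutil
import struct
import sys

# Requirement checks for the cp312 / win_amd64 CUDA wheel.
is_windows = platform.system() == "Windows"
is_64bit = struct.calcsize("P") * 8 == 64  # pointer size in bits
is_py312 = sys.version_info[:2] == (3, 12)
has_nvidia_smi = shutil.which("nvidia-smi") is not None  # driver probe

for name, ok in [
    ("Windows", is_windows),
    ("64-bit Python", is_64bit),
    ("Python 3.12", is_py312),
    ("nvidia-smi on PATH", has_nvidia_smi),
]:
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```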