Will this work on an RTX 3080 10 GB?
I am new to running AI models on local machines.
I have the 10 GB model of the RTX 3080.
I see the .bin files add up to around 30 GB.
Will I still be able to use this model?
With CPU offloading in the textgen webui it should work fine. If you quantize it to 4-bit, you should be able to fit the whole thing on your GPU. The model is so big because it's saved in 32-bit; for inference it will be run in 16-bit at most.
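To make the size argument concrete, here is a rough back-of-the-envelope sketch of the weight footprint at each precision. The parameter count of about 7 billion is an assumption, back-computed from the roughly 30 GB of 32-bit .bin files mentioned in the question; the helper name weight_gb is made up for illustration. It counts weights only, so real usage will be higher once activations and the generation cache are added.

```python
# Bytes needed to store a single weight at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

# Assumption: ~7 billion parameters, inferred from the ~30 GB fp32 checkpoint.
N_PARAMS = 7e9

# VRAM on the 10 GB RTX 3080 from the question.
VRAM_GB = 10.0


def weight_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    size = weight_gb(N_PARAMS, nbytes)
    # Weights-only comparison: leaves no headroom for activations or cache.
    verdict = "under" if size < VRAM_GB else "over"
    print(f"{name:>9}: {size:5.1f} GB of weights ({verdict} {VRAM_GB:.0f} GB)")
```

Under these assumptions only the 8-bit and 4-bit rows come in under 10 GB, which is why the 16-bit model needs CPU offloading on this card.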
I tested load_in_8bit=True, but it seems to produce only nonsense. It would be great if we could figure out how to do int8 quantization on this; it would make things even faster.
But it will fit on a 10 GB 3080 once you use that flag. Memory consumption is currently around 14 GB with fp16/bf16, and int8 cuts that in half.
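For reference, a minimal sketch of what loading with that flag can look like in Transformers. It assumes the bitsandbytes and accelerate packages are installed and a CUDA GPU is available, and it passes the 8-bit setting through BitsAndBytesConfig rather than as a bare load_in_8bit=True keyword. The function name load_int8 is hypothetical, and this does not address the nonsense output reported above; it only shows the loading step.

```python
MODEL_ID = "maicomputer/alpaca-native"


def load_int8(model_id: str = MODEL_ID):
    """Load the tokenizer and the model with 8-bit weights (needs a CUDA GPU)."""
    # Imported lazily so the helper can be defined without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Config-object form of the load_in_8bit=True flag from the post above.
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        # Let accelerate decide where each layer is placed.
        device_map="auto",
    )
    return tokenizer, model


# Example usage (downloads the full checkpoint, so it is left commented out):
# tokenizer, model = load_int8()
# print(f"{model.get_memory_footprint() / 1e9:.1f} GB")
```

Printing get_memory_footprint() after loading is one way to check whether the result is close to half of the 14 GB fp16/bf16 figure.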