
Will this work on an RTX 3080 10gb?

by TheAIGuyz - opened Mar 22, 2023

Discussion

TheAIGuyz

Mar 22, 2023

I am new to running AI on local machines.

I have the 10 GB model of the RTX 3080.

I see the .bin files add up to around 30 GB.

Will I still be able to use this model?

HDiffusion

Mar 23, 2023

With CPU offloading in the textgen webui it should work fine. If you quantize it to 4-bit you should be able to fit the whole thing on your GPU. The model is so big because it's saved in 32-bit; it will only be run in 16-bit at most for inference.
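A rough back-of-the-envelope check of those sizes, as a sketch only. It assumes about 7 billion parameters (an assumption, roughly consistent with the ~30 GB of 32-bit files mentioned above) and counts weights only; activations, the KV cache, and framework overhead need additional memory on top.

```python
# Weights-only memory estimate at different precisions.
# ASSUMED_PARAMS is an assumption inferred from the file sizes in this thread,
# not a figure stated by the model authors.
ASSUMED_PARAMS = 7e9
VRAM_GB = 10.0  # RTX 3080 10 GB

BYTES_PER_PARAM = {
    "fp32": 4.0,       # how the checkpoint is stored
    "fp16/bf16": 2.0,  # typical inference precision
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params, bytes_per_param):
    """Return the size of the weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    gb = weight_memory_gb(ASSUMED_PARAMS, nbytes)
    verdict = "fits" if gb < VRAM_GB else "needs offloading"
    print(f"{name:>9}: {gb:5.1f} GB of weights -> {verdict} on a {VRAM_GB:.0f} GB card")
```

Under these assumptions only the 8-bit and 4-bit variants fit entirely in 10 GB, which matches the advice to either offload to CPU or quantize.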

kz919

Apr 10, 2023

edited Apr 10, 2023

I tested load_in_8bit=True, but it seems to produce only nonsense. It would be great if we could figure out how to do int8 quantization on this; it would make things even faster.
It will fit on a 10 GB 3080 once you use that flag, though. The current memory consumption is around 14 GB with fp16/bf16, and with int8 it will be cut in half.
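For reference, a minimal sketch of how 8-bit loading is usually requested in Transformers. It assumes the `bitsandbytes` and `accelerate` packages are installed and a CUDA GPU is available; `load_alpaca_8bit` is a hypothetical helper name, and `BitsAndBytesConfig(load_in_8bit=True)` is the configuration-object form of the `load_in_8bit=True` flag. It only shows how the flag is passed and does not address the garbled output reported above.

```python
def load_alpaca_8bit(model_id="maicomputer/alpaca-native"):
    """Load the tokenizer and an 8-bit quantized copy of the model.

    Hypothetical helper for illustration. Imports are deferred so that
    defining the function does not require the GPU libraries.
    """
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # Ask bitsandbytes to quantize the linear layers to int8 at load time.
    quant_config = BitsAndBytesConfig(load_in_8bit=True)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on the available devices
    )
    return tokenizer, model


# Example usage (downloads the full checkpoint, so it is left commented out):
# tokenizer, model = load_alpaca_8bit()
# inputs = tokenizer("Once upon a time,", return_tensors="pt").to(model.device)
# output_ids = model.generate(**inputs, max_new_tokens=32)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```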
