QuantLM 830M 4 bit
This is the 4-bit QuantLM 830M model with its weights unpacked to FP16, so it is compatible with standard FP16 GEMMs. After unpacking, QuantLM has the same architecture as LLaMA.
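As a rough illustration of what unpacking to FP16 means for storage, the sketch below estimates weight memory at 16 bits versus 4 bits per parameter. It assumes about 830 million parameters (taken from the model name, not a measured count) and ignores quantization metadata such as scales, as well as any layers that may be stored at a different precision. `weight_bytes` is a hypothetical helper, not part of any library.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Back-of-the-envelope weight storage in bytes (weights only)."""
    return num_params * bits_per_param // 8


# Assumption: ~830M parameters, as suggested by the model name.
NUM_PARAMS = 830_000_000

fp16_bytes = weight_bytes(NUM_PARAMS, 16)   # unpacked FP16 weights
packed_bytes = weight_bytes(NUM_PARAMS, 4)  # idealized 4-bit packing

print(f"FP16 unpacked: {fp16_bytes / 1e9:.2f} GB")
print(f"4-bit packed:  {packed_bytes / 1e9:.3f} GB")
print(f"ratio:         {fp16_bytes / packed_bytes:.0f}x")
```

Under these assumptions the unpacked weights take about four times the space of an ideal 4-bit packing. In exchange, the checkpoint loads as an ordinary FP16 model without custom kernels.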
import torch
import transformers as tf

model_name = "SpectraSuite/QuantLM_830M_4bit_Unpacked"

# Please adjust the temperature, repetition penalty, top_k, top_p and other sampling parameters according to your needs.
pipeline = tf.pipeline("text-generation", model=model_name, model_kwargs={"torch_dtype": torch.float16}, device_map="auto")
# These are base (pretrained) LLMs that are not instruction or chat tuned. You may need to adjust your prompt accordingly.
pipeline("Once upon a time")
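The sampling parameters mentioned in the comment above can be passed directly when calling the pipeline. The following is a minimal sketch, not a recommended configuration: the default values are illustrative assumptions, and `sampling_kwargs` and `generate` are hypothetical helper names introduced here.

```python
MODEL_NAME = "SpectraSuite/QuantLM_830M_4bit_Unpacked"


def sampling_kwargs(temperature=0.7, top_k=50, top_p=0.95,
                    repetition_penalty=1.1, max_new_tokens=64):
    """Collect generation settings; the defaults are illustrative only."""
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    return {
        "do_sample": True,  # enable sampling so the settings below take effect
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
        "max_new_tokens": max_new_tokens,
    }


def generate(prompt, **overrides):
    """Load the model in FP16 and sample a continuation of `prompt`."""
    # Imported lazily so the helper above can be used without these packages.
    import torch
    import transformers

    pipe = transformers.pipeline(
        "text-generation",
        model=MODEL_NAME,
        model_kwargs={"torch_dtype": torch.float16},
        device_map="auto",
    )
    outputs = pipe(prompt, **sampling_kwargs(**overrides))
    return outputs[0]["generated_text"]


# Example (downloads the model on first use):
# print(generate("Once upon a time", temperature=0.5))
```

Because this is a base model, the prompt should read as the beginning of a text to be continued rather than as an instruction.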
- License: Apache 2.0
- We use our GitHub repo for communication, including queries related to the HF repo. Feel free to open an issue at https://github.com/NolanoOrg/SpectraSuite