Instructions for using codellama/CodeLlama-70b-hf with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use codellama/CodeLlama-70b-hf with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="codellama/CodeLlama-70b-hf")
```

```python
# Load model directly (Code Llama is a causal, decoder-only language model)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-70b-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-70b-hf")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use codellama/CodeLlama-70b-hf with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "codellama/CodeLlama-70b-hf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "codellama/CodeLlama-70b-hf",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```shell
docker model run hf.co/codellama/CodeLlama-70b-hf
```
- SGLang
How to use codellama/CodeLlama-70b-hf with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "codellama/CodeLlama-70b-hf" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "codellama/CodeLlama-70b-hf",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "codellama/CodeLlama-70b-hf" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "codellama/CodeLlama-70b-hf",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use codellama/CodeLlama-70b-hf with Docker Model Runner:
```shell
docker model run hf.co/codellama/CodeLlama-70b-hf
```
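Both the vLLM server (port 8000 in the commands above) and the SGLang server (port 30000) expose the same OpenAI-compatible `/v1/completions` endpoint that the curl examples call. The following is a minimal standard-library sketch of the same request from Python, assuming a server is already running locally; the helper names `build_completion_request` and `complete` are hypothetical, and the optional `stop` field is the standard OpenAI-style list of stop strings.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512,
                             temperature=0.5, stop=None):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if stop:
        # Optional list of strings at which the server should stop generating.
        payload["stop"] = stop
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, timeout=600, **kwargs):
    """Send the request and return the generated text of the first choice."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses put the text under choices[0]["text"].
    return body["choices"][0]["text"]
```

For example, `complete("http://localhost:8000", "codellama/CodeLlama-70b-hf", "def fibonacci(n):")` would target the vLLM server, and swapping the base URL to `http://localhost:30000` would target the SGLang server instead.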
Provide prompt examples
Please provide some prompt examples, the expected prompt formatting, and the stop tokens.
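One way to approach this, sketched under stated assumptions: CodeLlama-70b-hf is the base (non-Instruct) checkpoint, so it has no chat template and a prompt is normally just the beginning of the code you want continued, for example a function signature plus a docstring. Generation ends when the model emits its EOS token (`tokenizer.eos_token`, `</s>` in the Llama 2 tokenizer family), but a base model often keeps writing extra functions first, so clients commonly add their own stop strings. The stop strings and helper names below are hypothetical heuristics for illustration, not tokens defined by the model.

```python
# Hypothetical stop strings: cut the completion when the model starts a new
# top-level definition or script section. These are heuristics, not model tokens.
STOP_SEQUENCES = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(")


def build_completion_prompt(signature: str, docstring: str) -> str:
    """Return a plain-code prompt: a function header plus docstring to be continued."""
    return f'{signature}\n    """{docstring}"""\n'


def truncate_at_stop(text: str, stops=STOP_SEQUENCES) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


prompt = build_completion_prompt(
    "def fibonacci(n: int) -> int:", "Return the n-th Fibonacci number."
)
```

OpenAI-compatible completion servers such as vLLM also accept a `stop` list in the request body, which applies the same cut on the server instead of in post-processing.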
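Please help me write a Python program that reads Excel files. The Excel files are in a certain directory, and its subdirectories also contain .xlsx files. Please use openpyxl.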
@Eric1104 Use pandas, langchain.document_loaders.unstructured, or LlamaIndex. There are plenty of options available.
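For the openpyxl route specifically, a minimal sketch might walk the directory tree with `pathlib` and read each workbook in read-only mode. The helper names `find_xlsx_files` and `read_all_rows` are hypothetical; the pattern `*.xlsx` is case-sensitive on Linux, so files named `.XLSX` would be missed; and openpyxl is imported inside the reader so the file search works even where openpyxl is not installed.

```python
from pathlib import Path


def find_xlsx_files(root):
    """Return every .xlsx file under root, including all subdirectories, sorted."""
    return sorted(
        p for p in Path(root).rglob("*.xlsx")
        # Skip the "~$" lock files Excel creates while a workbook is open.
        if p.is_file() and not p.name.startswith("~$")
    )


def read_all_rows(root):
    """Yield (file path, sheet name, row values) for every row of every sheet."""
    from openpyxl import load_workbook  # imported lazily; requires openpyxl

    for path in find_xlsx_files(root):
        # data_only=True returns cached formula results instead of formula strings.
        wb = load_workbook(path, read_only=True, data_only=True)
        try:
            for ws in wb.worksheets:
                for row in ws.iter_rows(values_only=True):
                    yield path, ws.title, row
        finally:
            wb.close()  # read-only workbooks keep the file open until closed
```

Usage: `for path, sheet, row in read_all_rows("data"): print(path, sheet, row)`. Read-only mode streams rows instead of loading each whole workbook into memory, which helps when the directory holds many large files.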
The code output by Llama-based models screws up Python indentation so badly that the code neither runs nor can be fixed by auto-formatters. Only a manual fix makes it work again. Has anyone else noticed this?
Take a look at this simple Python code it generated yesterday. The lines after "def:", "except:", and the last "if:" are indented by only one space character. Also, "if:" and "elif:" have different margins. All of this makes the code buggy and unfixable. There are cases with 1, 2, 3, and 4 spaces!
```python
import sys,base64
def main():
    try:
        if len(sys.argv)>=3:
            opcode = str(sys.argv[1]) #operation code
            data = str(sys.argv[2]) #data
            if opcode == "enc":
                encoded_string = base64.b64encode(bytes(data,"utf8"))
                result = f"Encoded String:\n{encoded_string}"
            elif opcode == "dec":
                decoded_string = base64.b64decode(str(data))
                result = f"Decoded String:\n {decoded_string}"
            else:
                raise Exception("Invalid Operation Code")
    except IndexError:
        print('Please provide two arguments')
    except ValueError:
        print('Please enter valid input')
    except Exception as e:
        print(f'An error occurred: {e}')
if __name__== '__main__':
    main()
```
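Auto-formatters such as Black need code that already parses, so a first step with generated code is to check whether it compiles at all and which indentation widths it mixes. A minimal standard-library sketch, with hypothetical helper names; `compile` only parses the source and never executes it.

```python
def indentation_widths(source: str) -> list[int]:
    """Return the distinct leading-space counts used by non-blank lines."""
    widths = set()
    for line in source.splitlines():
        if line.strip():
            widths.add(len(line) - len(line.lstrip(" ")))
    return sorted(widths)


def check_generated_code(source: str):
    """Return None if source compiles, else a short description of the first error."""
    try:
        compile(source, "<generated>", "exec")  # parse only, nothing is run
    except SyntaxError as exc:  # IndentationError and TabError are subclasses
        return f"{type(exc).__name__} on line {exc.lineno}: {exc.msg}"
    return None


bad = "def main():\n  x = 1\n y = 2\n"
print(indentation_widths(bad))
print(check_generated_code(bad))
```

A mix of widths such as 1, 2, 3, and 4 spaces does not by itself make code invalid, but a line that dedents to a width no enclosing block uses does, which is exactly the kind of error reported above.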