Initial GGML model commit
README.md
CHANGED
@@ -57,10 +57,12 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference (deprecated)](https://huggingface.co/TheBloke/CodeLlama-7B-Python-GGML)
 * [Meta's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/CodeLlama-7B-Python-fp16)
 
-## Prompt template:
+## Prompt template: CodeLlama
 
 ```
-
+[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
+{prompt}
+[/INST]
 ```
 
 <!-- compatibility_ggml start -->
@@ -157,7 +159,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 
 **Special thanks to**: Aemon Algiz.
 
-**Patreon special mentions**:
+**Patreon special mentions**: Kacper Wikieł, knownsqashed, Leonard Tan, Asp the Wyvern, Daniel P. Andersen, Luke Pendergrass, Stanislav Ovsiannikov, RoA, Dave, Ai Maven, Kalila, Will Dee, Imad Khwaja, Nitin Borwankar, Joseph William Delisle, Tony Hughes, Cory Kujawski, Rishabh Srivastava, Russ Johnson, Stephen Murray, Lone Striker, Johann-Peter Hartmann, Elle, J, Deep Realms, SuperWojo, Raven Klaugh, Sebastain Graf, ReadyPlayerEmma, Alps Aficionado, Mano Prime, Derek Yates, Gabriel Puliatti, Mesiah Bishop, Magnesian, Sean Connelly, biorpg, Iucharbius, Olakabola, Fen Risland, Space Cruiser, theTransient, Illia Dulskyi, Thomas Belote, Spencer Kim, Pieter, John Detwiler, Fred von Graf, Michael Davis, Swaroop Kallakuri, subjectnull, Clay Pascal, Subspace Studios, Chris Smitley, Enrico Ros, usrbinkat, Steven Wood, alfie_i, David Ziegler, Willem Michiel, Matthew Berman, Andrey, Pyrater, Jeffrey Morgan, vamX, LangChain4j, Luke @flexchar, Trenton Dambrowitz, Pierre Kircher, Alex, Sam, James Bentley, Edmond Seymore, Eugene Pentland, Pedro Madruga, Rainer Wilmers, Dan Guido, Nathan LeClaire, Spiking Neurons AB, Talal Aujan, zynix, Artur Olbinski, Michael Levine, 阿明, K, John Villwock, Nikolai Manek, Femi Adebogun, senxiiz, Deo Leter, NimbleBox.ai, Viktor Bowallius, Geoffrey Montalvo, Mandus, Ajan Kanaga, ya boyyy, Jonathan Leane, webtim, Brandon Frisco, danny, Alexandros Triantafyllidis, Gabriel Tamborski, Randy H, terasurfer, Vadim, Junyu Yang, Vitor Caleffi, Chadd, transmissions 11
 
 
 Thank you to all my generous patrons and donaters!
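The CodeLlama prompt template added in the first hunk can be filled in programmatically before the text is passed to an inference backend. The following is a minimal sketch, not part of the commit: the names `CODELLAMA_TEMPLATE`, `FENCE`, and `build_prompt` are hypothetical, and the line breaks are assumed to match the three template lines shown in the diff.

```python
# Minimal sketch: fill the prompt template from the README change.
# All names here are illustrative assumptions, not part of the repository.

# Three backticks, built at runtime so this snippet stays fence-safe.
FENCE = "`" * 3

# Template text copied from the added README lines; {prompt} is the placeholder.
CODELLAMA_TEMPLATE = (
    "[INST] Write code to solve the following coding problem that obeys "
    "the constraints and passes the example test cases. "
    "Please wrap your code answer using " + FENCE + ":\n"
    "{prompt}\n"
    "[/INST]"
)


def build_prompt(problem: str) -> str:
    """Substitute the problem statement for the {prompt} placeholder."""
    # str.replace is used instead of str.format so that braces inside the
    # problem text (for example dict literals) are left untouched.
    return CODELLAMA_TEMPLATE.replace("{prompt}", problem.strip())


if __name__ == "__main__":
    print(build_prompt("Return the sum of a list of integers."))
```

The resulting string is what would be supplied as the prompt; how it is tokenised or wrapped further depends on the inference tool and is not covered by this commit.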