`CpmTokenizer` is different from the original CPM-1 tokenizer in GitHub
#1
by ShaneTian - opened
transformers.CpmTokenizer is based on transformers.XLNetTokenizer, but the original CPM-1 tokenizer is not.
I found in fine-tuning:
- The original tokenizer always adds an `eod_token = <eod>` at the end of the sentence, see here.
- `transformers.CpmTokenizer` always adds `sep_token = <sep>` and `cls_token = <cls>` at the end of the sentence, see here.
I am confused.
For LM fine-tuning, how should the input data be prepared?
- `[token_id_1, token_id_2, ..., eod_token_id]`, where `eod_token_id` is the id of the `<eod>` token in `transformers.CpmTokenizer`
- `[token_id_1, token_id_2, ..., eos_token_id]`, where `eos_token_id` is the id of the `</s>` token in `transformers.CpmTokenizer`
- `[token_id_1, token_id_2, ..., eos_token_id]`, where `eos_token_id` is the id of the `<|endoftext|>` token in `transformers.GPT2Tokenizer`
- `[token_id_1, token_id_2, ..., sep_token_id, cls_token_id]`, i.e. just call `CpmTokenizer`
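To make the first and last options above concrete, here is a minimal sketch of the two layouts. The helper names `with_eod` and `with_sep_cls` and all ids are hypothetical placeholders, not part of `transformers`, and the thread does not confirm which layout is the right one for fine-tuning.

```python
from typing import List, Sequence


def with_eod(token_ids: Sequence[int], eod_token_id: int) -> List[int]:
    """First option: append a single end-of-document id after the text ids."""
    return list(token_ids) + [eod_token_id]


def with_sep_cls(
    token_ids: Sequence[int], sep_token_id: int, cls_token_id: int
) -> List[int]:
    """Last option: the XLNet-style layout, text ids followed by <sep> then <cls>."""
    return list(token_ids) + [sep_token_id, cls_token_id]


# Placeholder ids for illustration only; real ids come from the tokenizer vocabulary.
text_ids = [11, 12, 13]
eod_layout = with_eod(text_ids, eod_token_id=7)
sep_cls_layout = with_sep_cls(text_ids, sep_token_id=4, cls_token_id=3)

print(eod_layout)      # [11, 12, 13, 7]
print(sep_cls_layout)  # [11, 12, 13, 4, 3]
```

With a real tokenizer, the raw ids would presumably come from `tokenizer.encode(text, add_special_tokens=False)` and the end id from `tokenizer.convert_tokens_to_ids("<eod>")`, assuming `<eod>` is present in the vocabulary.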
Wow, so sorry for the very late reply! You are right, we should probably correct the `build_inputs_with_special_tokens` function, which is used to format the inputs when you set `add_special_tokens=True`.
If you are using a fast tokenizer, you can also change the template processor.
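As a rough illustration of the template-processor route, the sketch below builds a toy tokenizer with the `tokenizers` library and attaches a `TemplateProcessing` post-processor that appends `<eod>`. The vocabulary and ids are made up for the example and are not the CPM vocabulary. For a real fast tokenizer, the assumption is that the same processor could be assigned to `tokenizer.backend_tokenizer.post_processor`, with the `<eod>` id looked up from the actual vocabulary.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.processors import TemplateProcessing

# Toy vocabulary (hypothetical); a real tokenizer would supply its own ids.
vocab = {"<unk>": 0, "<eod>": 1, "hello": 2, "world": 3}

toy_tokenizer = Tokenizer(WordLevel(vocab, unk_token="<unk>"))
toy_tokenizer.pre_tokenizer = Whitespace()

# Append <eod> after every single sequence instead of <sep> <cls>.
toy_tokenizer.post_processor = TemplateProcessing(
    single="$A <eod>",
    special_tokens=[("<eod>", vocab["<eod>"])],
)

encoding = toy_tokenizer.encode("hello world")
ids = encoding.ids
tokens = encoding.tokens
print(tokens)
print(ids)
```

The template string is the only part that encodes the layout, so switching back to the `<sep> <cls>` form would only require changing `single` and the `special_tokens` list.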
Thanks
ShaneTian changed discussion status to closed