TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'tools'

#22

by sankexin - opened Sep 23, 2024

Discussion

sankexin

Sep 23, 2024

File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:576, in PreTrainedTokenizerFast._encode_plus(self, text, text_pair, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
    554 def _encode_plus(
    555     self,
    556     text: Union[TextInput, PreTokenizedInput],
   (...)
    573     **kwargs,
    574 ) -> BatchEncoding:
    575     batched_input = [(text, text_pair)] if text_pair else [text]
--> 576     batched_output = self._batch_encode_plus(
    577         batched_input,
    578         is_split_into_words=is_split_into_words,
    579         add_special_tokens=add_special_tokens,
    580         padding_strategy=padding_strategy,
    581         truncation_strategy=truncation_strategy,
    582         max_length=max_length,
    583         stride=stride,
    584         pad_to_multiple_of=pad_to_multiple_of,
    585         return_tensors=return_tensors,
    586         return_token_type_ids=return_token_type_ids,
    587         return_attention_mask=return_attention_mask,
    588         return_overflowing_tokens=return_overflowing_tokens,
    589         return_special_tokens_mask=return_special_tokens_mask,
    590         return_offsets_mapping=return_offsets_mapping,
    591         return_length=return_length,
    592         verbose=verbose,
    593         **kwargs,
    594     )
    596     # Return tensor is None, then we can remove the leading batch axis
    597     # Overflowing tokens are returned as a batch of output so we keep them in this case
    598     if return_tensors is None and not return_overflowing_tokens:

TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'tools'
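
For readers unfamiliar with this kind of failure, here is a minimal sketch of the mechanism the traceback suggests: `_encode_plus` forwards its `**kwargs` to `_batch_encode_plus`, so a keyword the inner method does not accept only fails there. The functions `batch_encode`, `encode`, and `try_encode` are hypothetical stand-ins written for illustration, not the transformers implementation, and the sketch does not show where `tools` was first passed in.

```python
def batch_encode(texts, padding=False):
    """Stand-in for a method with a fixed keyword signature (no **kwargs)."""
    return [text.split() for text in texts]


def encode(text, **kwargs):
    """Stand-in for a wrapper that forwards every extra keyword unchanged."""
    return batch_encode([text], **kwargs)[0]


def try_encode(text, **kwargs):
    """Return the tokens, or the TypeError message if a keyword is rejected."""
    try:
        return encode(text, **kwargs)
    except TypeError as exc:
        return str(exc)


# Accepted call: no unknown keywords reach batch_encode.
print(try_encode("Who are you?"))

# Rejected call: 'tools' passes through encode untouched and fails one level down.
print(try_encode("Who are you?", tools=[]))
```

The error names the innermost function, which is why the message in this thread points at `_batch_encode_plus` rather than at the call that supplied `tools`.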

jinjingsysu

Sep 23, 2024

I am seeing the same problem.

jinjingsysu

Sep 23, 2024

Installing the newer transformers package (transformers-4.44.2-py3-none-any.whl) should resolve the problem. Hope this helps.

sankexin

Sep 23, 2024

Installing the newer transformers package (transformers-4.44.2-py3-none-any.whl) should resolve the problem. Hope this helps.

I am still getting this error, and my Python version is 3.10.

pip show transformers
Name: transformers
Version: 4.44.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/conda/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: lmdeploy, peft, vllm
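
`pip show` reports what pip sees, which may differ from what a running notebook kernel or interpreter actually loads. As a rough sketch, the check below reads the installed version from inside the interpreter itself. The 4.44.2 threshold is only the version reported as working in this thread, not a confirmed minimum, and the helper names `version_tuple`, `meets_minimum`, and `installed_version` are made up for this example.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn the leading numeric part of a version string into a 3-tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group().split(".")][:3]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed, minimum="4.44.2"):
    """Compare release numbers only; suffixes such as '.dev0' are ignored."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package="transformers"):
    """Return the version this interpreter sees, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("transformers is not installed in this interpreter")
elif meets_minimum(found):
    print(f"transformers {found} is at or above 4.44.2")
else:
    print(f"transformers {found} is older than 4.44.2")
```

If this prints an older version than `pip show` does, the interpreter is probably using a different environment than the one pip installed into.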

sankexin

Sep 26, 2024

edited Sep 26, 2024

Good! After reconfiguring the environment with transformers 4.44.2, it runs now.

linglingdan

Oct 8, 2024

Please pull the code again; the error may be due to a code update.

neoz changed discussion status to closed Oct 15, 2024
