TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'tools'
File /opt/conda/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:576, in PreTrainedTokenizerFast._encode_plus(self, text, text_pair, add_special_tokens, padding_strategy, truncation_strategy, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
554 def _encode_plus(
555 self,
556 text: Union[TextInput, PreTokenizedInput],
(...)
573 **kwargs,
574 ) -> BatchEncoding:
575 batched_input = [(text, text_pair)] if text_pair else [text]
--> 576 batched_output = self._batch_encode_plus(
577 batched_input,
578 is_split_into_words=is_split_into_words,
579 add_special_tokens=add_special_tokens,
580 padding_strategy=padding_strategy,
581 truncation_strategy=truncation_strategy,
582 max_length=max_length,
583 stride=stride,
584 pad_to_multiple_of=pad_to_multiple_of,
585 return_tensors=return_tensors,
586 return_token_type_ids=return_token_type_ids,
587 return_attention_mask=return_attention_mask,
588 return_overflowing_tokens=return_overflowing_tokens,
589 return_special_tokens_mask=return_special_tokens_mask,
590 return_offsets_mapping=return_offsets_mapping,
591 return_length=return_length,
592 verbose=verbose,
593 **kwargs,
594 )
596 # Return tensor is None, then we can remove the leading batch axis
597 # Overflowing tokens are returned as a batch of output so we keep them in this case
598 if return_tensors is None and not return_overflowing_tokens:
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'tools'
I am hitting the same problem.
Upgrading to the newer transformers release, transformers-4.44.2-py3-none-any.whl, should fix the problem. Hope this helps.
I am still getting this error. My Python version is 3.10.
pip show transformers
Name: transformers
Version: 4.44.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/conda/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: lmdeploy, peft, vllm
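Note that `pip show` reports the copy installed for whichever `pip` is on the PATH; a notebook kernel that was started before the upgrade, or one that runs in a different environment, may still import an older copy. The following is a minimal sketch for checking the version from inside the running interpreter. The names `MIN_VERSION`, `parse_version`, `is_new_enough`, and `loaded_transformers_version` are hypothetical helpers made up for illustration, and the 4.44.2 threshold is simply the version suggested in this thread, not an officially documented minimum.

```python
MIN_VERSION = (4, 44, 2)  # assumption: the version suggested in this thread


def parse_version(text):
    """Turn '4.44.2' or '4.45.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def is_new_enough(installed, minimum=MIN_VERSION):
    """Return True if the version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def loaded_transformers_version():
    """Return the version of the transformers module this interpreter imports, or None."""
    try:
        import transformers
    except ImportError:
        return None
    return transformers.__version__


if __name__ == "__main__":
    version = loaded_transformers_version()
    if version is None:
        print("transformers cannot be imported in this environment")
    elif is_new_enough(version):
        print(f"transformers {version} is loaded")
    else:
        print(f"transformers {version} is loaded; it is older than 4.44.2")
```

If the printed version is older than the one `pip show` reports, restarting the kernel or reinstalling into the environment the kernel actually uses is worth trying before anything else.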
Good! After rebuilding the environment with transformers 4.44.2, it runs now.
Please pull the code again; the error may be caused by a code update.