Can we run ZymCTRL inference on multiple GPUs?
Thank you, Prof. Noelia Ferruz, for your excellent work!
I tried Example 1 on a V100 32 GB GPU and it took a very long time: about 204 minutes for the default enzyme, nitrilase (EC 3.5.5.1). Since my team has an NVIDIA DGX machine with 4x V100 32 GB, we wonder whether your Example 1 script could be modified to use all 4 GPUs and speed up inference. We also tried Example 1 on a single RTX 6000 Ada (48 GB) GPU, and it still took 44 minutes. It seems torch.nn.parallel.DistributedDataParallel should do the job, but when I tried the following:
model = GPT2LMHeadModel.from_pretrained('/my/path/to/zymCTRL').to(device)
if torch.cuda.device_count() > 1:
    print(f"Use {torch.cuda.device_count()} GPUs")
    model = torch.nn.parallel.DistributedDataParallel(model)
I got this error: "RuntimeError: Default process group has not been initialized, please make sure to call init_process_group"
guruace
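A note on the error above: DistributedDataParallel expects torch.distributed.init_process_group to have been called in every participating process, and it exists mainly to synchronize gradients during training. For sampling, a simpler route may be to run one independent model copy per GPU and give each a share of the sequences. The following is a minimal sketch under that assumption, not the authors' script: the helper names (split_counts, generate_on_gpu, launch), the output file names, the seed, and the sampling settings are all hypothetical placeholders.

```python
from typing import List


def split_counts(total: int, n_workers: int) -> List[int]:
    """Divide `total` sequences as evenly as possible among `n_workers`."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(total, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def generate_on_gpu(rank: int, counts: List[int], model_path: str,
                    ec_label: str, batch_size: int = 20) -> None:
    """Worker for one GPU: loads its own model copy and samples its share."""
    # Imported here so split_counts stays usable without PyTorch installed.
    import torch
    from transformers import AutoTokenizer, GPT2LMHeadModel

    # Assumed seed: a different value per rank so GPUs do not sample identically.
    torch.manual_seed(1234 + rank)
    device = torch.device(f"cuda:{rank}")
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = GPT2LMHeadModel.from_pretrained(model_path).to(device).eval()
    input_ids = tokenizer.encode(ec_label, return_tensors="pt").to(device)

    remaining = counts[rank]
    with open(f"sequences_gpu{rank}.txt", "w") as out, torch.no_grad():
        while remaining > 0:
            n = min(batch_size, remaining)
            # Placeholder sampling settings; reuse the ones from your own script.
            outputs = model.generate(input_ids, do_sample=True, top_k=9,
                                     max_length=1024, num_return_sequences=n)
            for ids in outputs:
                out.write(tokenizer.decode(ids, skip_special_tokens=True) + "\n")
            remaining -= n


def launch(total: int, model_path: str, ec_label: str) -> None:
    """Start one independent process per visible GPU."""
    import torch
    import torch.multiprocessing as mp

    n_gpus = torch.cuda.device_count()
    counts = split_counts(total, n_gpus)
    # mp.spawn calls generate_on_gpu(rank, *args) for rank in 0..n_gpus-1.
    mp.spawn(generate_on_gpu, args=(counts, model_path, ec_label), nprocs=n_gpus)
```

A call such as launch(2000, '/my/path/to/zymCTRL', '3.5.5.1') would need to sit under an `if __name__ == "__main__":` guard, because spawned processes re-import the main module. Since the workers never communicate, no process group is required.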
Hi guruace,
How many sequences were you generating during that time? With your GPUs, I'd expect more than 2000 sequences to be generated in that time (possibly many more).
Certainly the first batch does not take more than 2-5 minutes when I use an A40.
Are you sure the GPU is being used?
Alternatively, although I've never tried it, I think Hugging Face supports inference on multiple GPUs: https://huggingface.co/docs/transformers/perf_infer_gpu_many
Hope this helps,
Noelia
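One way to answer the "is the GPU being used?" question is to poll nvidia-smi while generation runs and to check where the model's weights live. The sketch below assumes nvidia-smi is on the PATH and that every queried field is numeric (some devices report "[N/A]", which this parser does not handle); the function names are illustrative only.

```python
import subprocess
from typing import Dict, List


def parse_gpu_stats(csv_text: str) -> List[Dict[str, int]]:
    """Parse lines such as '0, 97, 20345' into one dict of ints per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (int(field.strip()) for field in line.split(","))
        stats.append({"index": index, "util_percent": util, "memory_used_mib": mem})
    return stats


def query_gpus() -> List[Dict[str, int]]:
    """Ask nvidia-smi for per-GPU utilization and memory in use."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)


def model_device(model):
    """Return the device holding the model's first parameter, e.g. cuda:0."""
    return next(model.parameters()).device
```

If model_device(model) reports a CUDA device and query_gpus() shows high utilization on that index during generation, the GPU is doing the work; near-zero utilization would point to a CPU-bound run.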
Dear Noelia,
Yes, I am quite sure it was using the GPU, but only a single one (which is also clear from your script). I also tested on my MacBook Pro M1 (64 GB), where it presumably ran on the CPU only and took 36 hours to produce just 572 sequences. On the RTX 6000 Ada and the V100 32 GB, 1300 and 1290 sequences were generated, respectively. Given the M1 result, I am confident that the V100 run was using the GPU rather than the CPU alone.
Thank you!
guruace
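To put the reported numbers on a common scale, the throughput of each run can be computed as below. This assumes the sequence counts refer to the same runs as the timings quoted earlier in the thread (204 minutes on the V100, 44 minutes on the RTX 6000 Ada, 36 hours on the M1), which the posts suggest but do not state outright.

```python
# (sequences generated, wall-clock minutes) as reported in the thread.
runs = {
    "V100 32 GB": (1290, 204),
    "RTX 6000 Ada": (1300, 44),
    "MacBook Pro M1 (CPU)": (572, 36 * 60),
}


def sequences_per_minute(n_sequences: int, minutes: float) -> float:
    """Average generation throughput over a whole run."""
    return n_sequences / minutes


rates = {name: sequences_per_minute(n, m) for name, (n, m) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} sequences/min")
```

Under that assumption the V100 run is roughly 24 times faster than the M1 run, which is consistent with the GPU being used, while still being several times slower than the RTX 6000 Ada.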