Issues with usage of the repository
Hi, thank you for the publication, very interesting work here! I wanted to try out a local installation of this repository and ran into a couple of issues. I installed this model on AWS SageMaker AI with the Amazon Linux 2 operating system, using JupyterLab 4 for the notebook. I am currently on an EC2 instance of the ml.g5.4xlarge node type, which has 1 GPU. I was able to clone the repository and install the necessary packages from requirements.txt in a Python 3.10 virtual environment. As this operating system's compiler is a bit old, I had to run mamba install -c conda-forge xgboost==3.1.2 instead of pip install xgboost. I will post the two errors I encountered below. Did I do something wrong in my own usage of this repository, or is there something on your end that should be fixed?
First, after installing all the requirements, I tried to run python inference.py as requested and got the following error message:
(PeptiVerse) [ec2-user@ip-172-21-244-78 PeptiVerse]$ python inference.py
tokenizer_config.json: 100%|████████████████████████████████████████| 95.0/95.0 [00:00<00:00, 996kB/s]
vocab.txt: 100%|████████████████████████████████████████| 93.0/93.0 [00:00<00:00, 1.07MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████| 125/125 [00:00<00:00, 1.56MB/s]
config.json: 100%|████████████████████████████████████████| 724/724 [00:00<00:00, 8.98MB/s]
model.safetensors: 100%|████████████████████████████████████████| 2.61G/2.61G [00:02<00:00, 1.08GB/s]
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 921, in <module>
    predictor = PeptiVersePredictor(
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 661, in __init__
    self.smiles_embedder = SMILESEmbedder(self.device, clm_name=clm_name,
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 428, in __init__
    self.tokenizer = SMILES_SPE_Tokenizer(vocab_path, splits_path)
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/tokenizer/my_tokenizers.py", line 74, in __init__
    raise ValueError("Can't find a vocabulary file at path '{}'.".format(vocab_file))
ValueError: Can't find a vocabulary file at path 'Classifier_Weight/tokenizer/new_vocab.txt'.
I then tried to use the following script as specified:
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
and got the following error message:
/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
RoFormerForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From v4.50 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
KeyError Traceback (most recent call last)
Cell In[1], line 3
1 from inference import PeptiVersePredictor
----> 3 pred = PeptiVersePredictor(
4 manifest_path="best_models.txt", # best model list
5 classifier_weight_root=".", # repo root (where training_classifiers/ lives)
6 device="cuda", # or "cpu"
7 )
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:668, in PeptiVersePredictor.__init__(self, manifest_path, classifier_weight_root, esm_name, clm_name, smiles_vocab, smiles_splits, device)
665 self.models: Dict[Tuple[str, str], Any] = {}
666 self.meta: Dict[Tuple[str, str], Dict[str, Any]] = {}
--> 668 self._load_all_best_models()
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:734, in PeptiVersePredictor._load_all_best_models(self)
731 continue
733 model_dir = self._resolve_dir(prop_key, m, mode)
--> 734 kind, obj, art = load_artifact(model_dir, self.device)
736 if kind in {"xgb", "joblib"}:
737 self.models[(prop_key, mode)] = obj
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:146, in load_artifact(model_dir, device)
143 return "xgb", booster, art
145 if art.suffix == ".joblib":
--> 146 obj = joblib.load(art)
147 return "joblib", obj, art
149 if art.suffix == ".pt":
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py:749, in load(filename, mmap_mode, ensure_native_byte_order)
744 return load_compatibility(fobj)
746 # A memory-mapped array has to be mapped with the endianness
747 # it has been written with. Other arrays are coerced to the
748 # native endianness of the host system.
--> 749 obj = _unpickle(
750 fobj,
751 ensure_native_byte_order=ensure_native_byte_order,
752 filename=filename,
753 mmap_mode=validated_mmap_mode,
754 )
756 return obj
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py:626, in _unpickle(fobj, ensure_native_byte_order, filename, mmap_mode)
624 obj = None
625 try:
--> 626 obj = unpickler.load()
627 if unpickler.compat_mode:
628 warnings.warn(
629 "The file '%s' has been generated with a "
630 "joblib version less than 0.10. "
(...)
633 stacklevel=3,
634 )
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/pickle.py:1213, in _Unpickler.load(self)
1211 raise EOFError
1212 assert isinstance(key, bytes_types)
-> 1213 dispatch[key[0]](self)
1214 except _Stop as stopinst:
1215 return stopinst.value
KeyError: 118
Hi Jason! Thanks for letting us know about this! You found some bugs that we'll fix now on our end. Expect an update soon. Thanks again!
Hello Jason,
The path issue should be fixed now; basically it should be "./", pointing to the default folder, instead of our previous old folder name "./Classifier_Weight".
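If you want to sanity-check the tokenizer paths on your side, here is a minimal sketch; the splits filename below is my guess, since only new_vocab.txt appears in your traceback, so adjust it to whatever is in the tokenizer folder:

from pathlib import Path
from tokenizer.my_tokenizers import SMILES_SPE_Tokenizer

repo_root = Path(".")                                     # run from the repo checkout
vocab_path = repo_root / "tokenizer" / "new_vocab.txt"    # filename taken from your traceback
splits_path = repo_root / "tokenizer" / "new_splits.txt"  # assumed name; check your checkout
assert vocab_path.exists(), f"missing {vocab_path}"       # fail early, not deep inside __init__
tokenizer = SMILES_SPE_Tokenizer(str(vocab_path), str(splits_path))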
For the second bug though, I didn't encounter issues executing
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
in a Jupyter notebook. There are some warnings during initialization, but no errors. Can you try again after updating your joblib package? My version is '1.5.1'.
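To compare our environments quickly, you could also print the versions involved in unpickling the saved artifacts, something like:

import joblib, sklearn, xgboost, torch

# Versions that matter when loading the saved .joblib / .pt model files
print("joblib:", joblib.__version__)
print("scikit-learn:", sklearn.__version__)
print("xgboost:", xgboost.__version__)
print("torch:", torch.__version__)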
Best,
Yinuo
Thank you for the quick response!
Here is my current requirements installation script. I installed xgboost manually with a specific version after removing the package from the pip install, due to the aforementioned old-compiler issue on my system:
mamba create -n PeptiVerse python=3.10 -y
source ~/.bashrc
conda activate PeptiVerse
pip install ipykernel
python -m ipykernel install --user --name=gReLU --display-name "PeptiVerse"
# Install dependencies
pip install -r requirements_noxgboost.txt
mamba install -y -c conda-forge xgboost==3.1.2
pip install joblib==1.5.1
# Run inference
python inference.py
After running the above, you'll see the error I get because pytorch-lightning is not installed. I can install pytorch-lightning if you'd like, but it doesn't currently seem to be part of the requirements.txt file.
(PeptiVerse) [ec2-user@ip-172-21-246-168 PeptiVerse]$ pip install joblib==1.5.1
Collecting joblib==1.5.1
Downloading joblib-1.5.1-py3-none-any.whl.metadata (5.6 kB)
Downloading joblib-1.5.1-py3-none-any.whl (307 kB)
Installing collected packages: joblib
Attempting uninstall: joblib
Found existing installation: joblib 1.5.3
Uninstalling joblib-1.5.3:
Successfully uninstalled joblib-1.5.3
Successfully installed joblib-1.5.1
(PeptiVerse) [ec2-user@ip-172-21-246-168 PeptiVerse]$ python inference.py
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 16, in <module>
    from lightning.pytorch import seed_everything
ModuleNotFoundError: No module named 'lightning'
Below is what happened after I ran the following:
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from inference import PeptiVersePredictor
3 pred = PeptiVersePredictor(
4 manifest_path="best_models.txt", # best model list
5 classifier_weight_root=".", # repo root (where training_classifiers/ lives)
6 device="cuda", # or "cpu"
7 )
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:16
14 from transformers import EsmModel, EsmTokenizer, AutoModelForMaskedLM
15 from tokenizer.my_tokenizers import SMILES_SPE_Tokenizer
---> 16 from lightning.pytorch import seed_everything
17 seed_everything(1986)
19 # -----------------------------
20 # Manifest
21 # -----------------------------
ModuleNotFoundError: No module named 'lightning'
Hello Jason,
I guess the error occurs because in the newest version I added a line to set the global seed with lightning. I will add lightning to the basic required package list.
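For reference, the line in question at the top of inference.py, as shown in your traceback (the module comes from the lightning pip package, which provides the lightning.pytorch namespace):

from lightning.pytorch import seed_everything  # requires the 'lightning' package

seed_everything(1986)  # fixed global seed for reproducibility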
Best,
Yinuo
Hi Yinuo,
Thank you for updating the requirements.txt file; I was able to install pytorch-lightning. Now I get the same model-loading issue I described before, in both the notebook and script implementations of this repo:
(PeptiVerse) [ec2-user@ip-172-21-7-10 PeptiVerse]$ python inference.py
Seed set to 1986
tokenizer_config.json: 100%|████████████████████████████████████████| 95.0/95.0 [00:00<00:00, 1.01MB/s]
vocab.txt: 100%|████████████████████████████████████████| 93.0/93.0 [00:00<00:00, 1.43MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████| 125/125 [00:00<00:00, 1.95MB/s]
config.json: 100%|████████████████████████████████████████| 724/724 [00:00<00:00, 11.2MB/s]
model.safetensors: 100%|████████████████████████████████████████| 2.61G/2.61G [00:02<00:00, 1.00GB/s]
config.json: 100%|████████████████████████████████████████| 573/573 [00:00<00:00, 8.06MB/s]
pytorch_model.bin: 100%|████████████████████████████████████████| 175M/175M [00:00<00:00, 196MB/s]
RoFormerForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From v4.50 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
generation_config.json: 100%|████████████████████████████████████████| 90.0/90.0 [00:00<00:00, 1.13MB/s]
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 967, in <module>
    predictor = PeptiVersePredictor(
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 708, in __init__
    self._load_all_best_models()
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 780, in _load_all_best_models
    kind, obj, art = load_artifact(model_dir, self.device)
  File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 146, in load_artifact
    obj = joblib.load(art)
  File "/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 749, in load
    obj = _unpickle(
  File "/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py", line 626, in _unpickle
    obj = unpickler.load()
  File "/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/pickle.py", line 1213, in load
    dispatch[key[0]](self)
KeyError: 118
model.safetensors: 100%|████████████████████████████████████████| 175M/175M [00:00<00:00, 200MB/s]
And from the notebook:
from inference import PeptiVersePredictor
pred = PeptiVersePredictor(
manifest_path="best_models.txt", # best model list
classifier_weight_root=".", # repo root (where training_classifiers/ lives)
device="cuda", # or "cpu"
)
/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
Seed set to 1986
RoFormerForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From v4.50 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
KeyError Traceback (most recent call last)
Cell In[1], line 3
1 from inference import PeptiVersePredictor
----> 3 pred = PeptiVersePredictor(
4 manifest_path="best_models.txt", # best model list
5 classifier_weight_root=".", # repo root (where training_classifiers/ lives)
6 device="cuda", # or "cpu"
7 )
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:708, in PeptiVersePredictor.__init__(self, manifest_path, classifier_weight_root, esm_name, clm_name, smiles_vocab, smiles_splits, device)
705 self.models: Dict[Tuple[str, str], Any] = {}
706 self.meta: Dict[Tuple[str, str], Dict[str, Any]] = {}
--> 708 self._load_all_best_models()
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:780, in PeptiVersePredictor._load_all_best_models(self)
777 continue
779 model_dir = self._resolve_dir(prop_key, m, mode)
--> 780 kind, obj, art = load_artifact(model_dir, self.device)
782 if kind in {"xgb", "joblib"}:
783 self.models[(prop_key, mode)] = obj
File ~/SageMaker/fsx/Wu/PeptiVerse/inference.py:146, in load_artifact(model_dir, device)
143 return "xgb", booster, art
145 if art.suffix == ".joblib":
--> 146 obj = joblib.load(art)
147 return "joblib", obj, art
149 if art.suffix == ".pt":
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py:749, in load(filename, mmap_mode, ensure_native_byte_order)
744 return load_compatibility(fobj)
746 # A memory-mapped array has to be mapped with the endianness
747 # it has been written with. Other arrays are coerced to the
748 # native endianness of the host system.
--> 749 obj = _unpickle(
750 fobj,
751 ensure_native_byte_order=ensure_native_byte_order,
752 filename=filename,
753 mmap_mode=validated_mmap_mode,
754 )
756 return obj
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/joblib/numpy_pickle.py:626, in _unpickle(fobj, ensure_native_byte_order, filename, mmap_mode)
624 obj = None
625 try:
--> 626 obj = unpickler.load()
627 if unpickler.compat_mode:
628 warnings.warn(
629 "The file '%s' has been generated with a "
630 "joblib version less than 0.10. "
(...)
633 stacklevel=3,
634 )
File ~/anaconda3/envs/PeptiVerse/lib/python3.10/pickle.py:1213, in _Unpickler.load(self)
1211 raise EOFError
1212 assert isinstance(key, bytes_types)
-> 1213 dispatch[key[0]](self)
1214 except _Stop as stopinst:
1215 return stopinst.value
KeyError: 118
Hello,
Sorry for the late reply. I updated a lighter test case in the README. But your problem still seems to be a model version issue. If the environment keeps being a burden, I recommend using the singularity/apptainer-packed env directly.
No worries, thank you for the support. I am not super familiar with using Apptainer, so let me figure it out and get back to you. In the meantime, I just wanted to update you on my installation efforts without Apptainer.
I updated my Linux version from Amazon Linux 2 to Amazon Linux 2023. I was able to use the exact requirements.txt file you recommended last time and got a similar error to before:
mamba create -n PeptiVerse python=3.10 -y
source ~/.bashrc
conda activate PeptiVerse
pip install ipykernel
python -m ipykernel install --user --name=PeptiVerse --display-name "PeptiVerse"
# Install dependencies
pip install -r requirements.txt
# Run inference
python inference.py
echo 'done'
python inference.py
Seed set to 1986
RoFormerForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From πv4.50π onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 967, in <module>
predictor = PeptiVersePredictor(
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 708, in __init__
self._load_all_best_models()
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 780, in _load_all_best_models
kind, obj, art = load_artifact(model_dir, self.device)
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 150, in load_artifact
ckpt = torch.load(art, map_location=device, weights_only=False)
File "/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/torch/serialization.py", line 1573, in load
return _legacy_load(
File "/home/ec2-user/anaconda3/envs/PeptiVerse/lib/python3.10/site-packages/torch/serialization.py", line 1822, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
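One thing I plan to rule out on my side, just a guess: chr(118) is 'v', the same byte the earlier KeyError: 118 pointed at, and Git LFS pointer stubs start with the text "version https://git-lfs...". If the LFS weights were never actually downloaded, loading a pointer stub would fail in exactly this way. A quick check I can run from the repo root:

from pathlib import Path

# Flag artifacts that are still Git LFS pointer stubs rather than real weights.
# Pointer files begin with b"version https://git-lfs", which would explain both
# "invalid load key, 'v'" and the earlier "KeyError: 118" (ord('v') == 118).
for art in Path(".").rglob("*"):
    if art.is_file() and art.suffix in {".joblib", ".pt"}:
        with art.open("rb") as f:
            head = f.read(24)
        if head.startswith(b"version https://git-lfs"):
            print("LFS pointer, not real weights:", art)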
I got the following error after using these commands to attempt the light-weight installation you recommended. Again, I am on Amazon Linux 2023:
# Ignore all LFS files; you will see an empty folder first
git clone --no-checkout https://huggingface.co/ChatterjeeLab/PeptiVerse
cd PeptiVerse
# Enable sparse checkout
git sparse-checkout init --cone
# Choose only selective items to download
git sparse-checkout set \
    inference.py \
    download_light.py \
    best_models.txt \
    basic_models.txt \
    requirements.txt \
    tokenizer \
    README.md
# Now checkout
GIT_LFS_SKIP_SMUDGE=1 git checkout
# Install basic pkgs
mamba create -n PeptiVerse_Sparse python=3.10 -y
source ~/.bashrc
conda activate PeptiVerse_Sparse
pip install ipykernel
#python -m ipykernel install --user --name=gReLU --display-name "PeptiVerse"
python -m ipykernel install --user --name=PeptiVerse_Sparse --display-name "PeptiVerse_Sparse"
pip install -r requirements.txt
# Download basic model weights according to basic_models.txt; adjust which configs you want as needed
python download_light.py
# Test inference
python inference.py
Seed set to 1986
RoFormerForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From πv4.50π onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.
/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/sklearn/base.py:442: InconsistentVersionWarning: Trying to unpickle estimator SVR from version 1.7.1 when using version 1.7.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
warnings.warn(
Traceback (most recent call last):
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 971, in <module>
print(predictor.predict_property("hemolysis", "wt", "GIGAVLKVLTTGLPALISWIKRKRQQ"))
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 878, in predict_property
feats = self._get_features_for_model(prop_key, mode, input_str)
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 829, in _get_features_for_model
v = self.wt_embedder.pooled(input_str) # (1,H)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "/home/ec2-user/SageMaker/fsx/Wu/PeptiVerse/inference.py", line 626, in pooled
out = self.model(**tok)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/transformers/models/esm/modeling_esm.py", line 907, in forward
encoder_outputs = self.encoder(
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/transformers/models/esm/modeling_esm.py", line 612, in forward
layer_outputs = layer_module(
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/transformers/models/esm/modeling_esm.py", line 502, in forward
self_attention_outputs = self.attention(
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/transformers/models/esm/modeling_esm.py", line 436, in forward
self_outputs = self.self(
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ec2-user/anaconda3/envs/PeptiVerse_Sparse/lib/python3.10/site-packages/transformers/models/esm/modeling_esm.py", line 340, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
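To check whether this is a GPU/library mismatch rather than a model problem, my next step will be to build the predictor on CPU, roughly like this (this only bypasses cuBLAS to isolate the error, it is not a fix):

from inference import PeptiVersePredictor

# Isolation step: run entirely on CPU so no cuBLAS kernels are involved
pred = PeptiVersePredictor(
    manifest_path="best_models.txt",
    classifier_weight_root=".",
    device="cpu",
)
print(pred.predict_property("hemolysis", "wt", "GIGAVLKVLTTGLPALISWIKRKRQQ"))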