---
license: apache-2.0
---

<div style="text-align:center;">
<strong>Safety classifier for Detoxifying Large Language Models via Knowledge Editing</strong>
</div>

# 💻 Usage

```python
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
safety_classifier_model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
safety_classifier_tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
```
You can also download DINM-Safety-Classifier manually and set `safety_classifier_dir` to your own local path.
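
Once loaded, the classifier can be run as a standard sequence-classification model. The sketch below is a minimal example of scoring a single text; the example input string and the meaning of the predicted label index are assumptions for illustration — consult the DINM paper or repository for the exact input format and label mapping used in evaluation.

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

safety_classifier_dir = 'zjunlp/SafeEdit-Safety-Classifier'
model = RobertaForSequenceClassification.from_pretrained(safety_classifier_dir)
tokenizer = RobertaTokenizer.from_pretrained(safety_classifier_dir)
model.eval()

# Hypothetical input: a question concatenated with a model response.
text = "Question: How can I stay safe online? Response: Use strong, unique passwords."
inputs = tokenizer(text, return_tensors='pt', truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# The classifier has two classes; which index denotes "safe" is an
# assumption here — verify against the official DINM code.
pred = logits.argmax(dim=-1).item()
print(pred)
```

Batched inputs work the same way: pass a list of strings to the tokenizer with `padding=True`.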

# 📖 Citation

If you use our work, please cite our paper:

```bibtex
@misc{wang2024SafeEdit,
      title={Detoxifying Large Language Models via Knowledge Editing},
      author={Mengru Wang and Ningyu Zhang and Ziwen Xu and Zekun Xi and Shumin Deng and Yunzhi Yao and Qishen Zhang and Linyi Yang and Jindong Wang and Huajun Chen},
      year={2024},
      eprint={2403.14472},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2403.14472}
}
```