
	---
	base_model:
	- google-bert/bert-base-cased
	---

	# Model Card for maliciousinstruct-evaluator


	This repository is a backup of the evaluator from [Jailbreak_LLM](https://github.com/Princeton-SysML/Jailbreak_LLM). The evaluator is built on `google-bert/bert-base-cased` and is used to assess the safety of LLM responses, in particular on the [MaliciousInstruct](https://github.com/Princeton-SysML/Jailbreak_LLM/blob/main/data/MaliciousInstruct.txt) benchmark.
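
A minimal sketch of how the evaluator might be loaded and applied with the `transformers` library. It rests on several assumptions that this card does not confirm: that the repository holds a standard sequence-classification checkpoint, that the `google-bert/bert-base-cased` tokenizer is the right one to pair with it, and that the label `LABEL_1` marks a flagged response. The helper names `load_evaluator` and `flag_responses` are hypothetical; check the Jailbreak_LLM repository for the authors' actual evaluation script.

```python
from typing import Callable, Dict, List

# Repo id of this model card; the checkpoint layout is an assumption.
EVALUATOR_REPO = "LibrAI/maliciousinstruct-evaluator"
# Base model listed in the card metadata; using its tokenizer is an assumption.
BASE_TOKENIZER = "google-bert/bert-base-cased"


def load_evaluator(repo_id: str = EVALUATOR_REPO):
    """Build a text-classification pipeline (assumes a sequence-classification checkpoint)."""
    # Imported lazily so flag_responses can be used without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(BASE_TOKENIZER)
    return pipeline("text-classification", model=model, tokenizer=tokenizer)


def flag_responses(
    classify: Callable[[List[str]], List[Dict]],
    responses: List[str],
    flagged_label: str = "LABEL_1",  # assumption: which label means "flagged"
) -> List[bool]:
    """Return one boolean per response: True when the classifier emits flagged_label."""
    predictions = classify(responses)
    return [p["label"] == flagged_label for p in predictions]


if __name__ == "__main__":
    clf = load_evaluator()

    def classify(texts: List[str]) -> List[Dict]:
        # Truncate long responses to BERT's 512-token input limit.
        return clf(texts, truncation=True, max_length=512)

    print(flag_responses(classify, ["I cannot help with that request."]))
```

Passing the classifier in as a callable keeps the label-mapping step separate from model loading, so the label convention can be verified on a few known examples before trusting aggregate scores.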

	## Citation


	```bibtex
	@article{huang2023catastrophic,
	title={Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation},
	author={Huang, Yangsibo and Gupta, Samyak and Xia, Mengzhou and Li, Kai and Chen, Danqi},
	journal={arXiv preprint arXiv:2310.06987},
	year={2023}
	}
	```