Instructions to use Salesforce/LLaMA-3-8B-SFR-RM-R with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Salesforce/LLaMA-3-8B-SFR-RM-R with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-classification", model="Salesforce/LLaMA-3-8B-SFR-RM-R")# Load model directly from transformers import AutoTokenizer, AutoModelForSequenceClassification tokenizer = AutoTokenizer.from_pretrained("Salesforce/LLaMA-3-8B-SFR-RM-R") model = AutoModelForSequenceClassification.from_pretrained("Salesforce/LLaMA-3-8B-SFR-RM-R") - Notebooks
- Google Colab
- Kaggle
| license: llama3 | |
| # LLaMA-3-8B-SFR-RM-R | |
| This is the RM model for Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R. It is a Vanilla BT based Reward model. | |
| ## Model Releases | |
| - [SFT model](https://huggingface.co/Salesforce/SFR-SFT-LLaMA-3-8B-R) | |
| - [Reward model](https://huggingface.co/Salesforce/SFR-RM-LLaMA-3-8B-R) | |
| - [RLHF model](https://huggingface.co/Salesforce/SFR-Iterative-DPO-LLaMA-3-8B-R) | |
| ## Citation | |
| Please cite our techical report if you find our model is useful for your research or product. | |
| ```bibtex | |
| @misc{dong2024rlhf, | |
| title={RLHF Workflow: From Reward Modeling to Online RLHF}, | |
| author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang}, | |
| year={2024}, | |
| eprint={2405.07863}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.LG} | |
| } | |
| ``` | |
| ## Ethics disclaimer for Salesforce AI models, data, code | |
| This release is for research purposes only in support of an academic | |
| paper. Our models, datasets, and code are not specifically designed or | |
| evaluated for all downstream purposes. We strongly recommend users | |
| evaluate and address potential concerns related to accuracy, safety, and | |
| fairness before deploying this model. We encourage users to consider the | |
| common limitations of AI, comply with applicable laws, and leverage best | |
| practices when selecting use cases, particularly for high-risk scenarios | |
| where errors or misuse could significantly impact people’s lives, rights, | |
| or safety. For further guidance on use cases, refer to our standard | |
| [AUP](https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Agreements/policies/ExternalFacing_Services_Policy.pdf) | |
| and [AI AUP](https://www.salesforce.com/content/dam/web/en_us/www/documents/legal/Agreements/policies/ai-acceptable-use-policy.pdf). |