Could you please test how consistently `RLHFlow/pair-preference-model-LLaMA3-8B` agrees with GPT-4's preferences on the AlpacaEval dataset?
Evaluating a newly trained model on AlpacaEval 2 can cost too much for small teams. I believe this model, with its strong pairwise preference ability, could be a good judge of responses from different models and might be able to take the place of GPT-4. It would be very interesting to compute a model's win rate against GPT-4 on AlpacaEval with RLHFlow/pair-preference-model-LLaMA3-8B as the judge, and to compare the result with the official win rate shown on the AlpacaEval Leaderboard.
Hi, thanks for your interest in our models.
AlpacaEval does not have a dataset. I do have some results for MT-Bench and LMSYS, though.
Preference model
- lmsys/chatbot_arena_conversations 15k 0.822
- Arena-Hard 0.791
- lmsys/mt_bench_human_judgments/human 0.805
- lmsys/mt_bench_human_judgments/gpt4 0.938
We removed the tied pairs from the test sets.
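Assuming the scores above are agreement rates between the preference model and the reference labels, the tie handling can be sketched as follows. This is a minimal illustration, not the authors' evaluation script: the function name `agreement_rate` and the `"A"` / `"B"` / `"tie"` label encoding are hypothetical.

```python
from typing import Sequence


def agreement_rate(judge: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of non-tie pairs where the judge picks the reference winner.

    Labels are assumed to be "A", "B", or "tie" (hypothetical encoding).
    Pairs whose reference label is a tie are dropped before scoring.
    """
    if len(judge) != len(reference):
        raise ValueError("judge and reference must have the same length")

    # Keep only pairs with a decisive reference label.
    kept = [(j, r) for j, r in zip(judge, reference) if r in ("A", "B")]
    if not kept:
        raise ValueError("no non-tie pairs to evaluate")

    matches = sum(1 for j, r in kept if j == r)
    return matches / len(kept)


# Toy data: five pairs, one of which is a tie in the reference labels.
judge_labels = ["A", "B", "A", "A", "B"]
reference_labels = ["A", "B", "tie", "B", "B"]

rate = agreement_rate(judge_labels, reference_labels)
print(f"{rate:.3f}")  # 0.750 (3 matches out of 4 non-tie pairs)
```

Dropping ties changes the denominator, so rates computed this way are not directly comparable with numbers that count a tie as a miss or as half a match.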
It turns out that the BT model, the preference model, and ArmoRM have all been used for online iterative DPO, and each leads to models with an AlpacaEval win rate possibly above 50%. So the model can be used to overfit AlpacaEval lol.