Could you please test how consistent the preferences of `RLHFlow/pair-preference-model-LLaMA3-8B` are with GPT-4 on the AlpacaEval dataset?

by rungao2001 - opened Jun 20, 2024

Discussion

rungao2001

Jun 20, 2024

Testing a newly trained model on AlpacaEval 2 can cost too much for small teams. I believe this model, with its strong pairwise preference ability, could be a good judge of responses from different models and might be able to take the place of GPT-4. It would be very interesting to compute a model's win rate against GPT-4 on AlpacaEval with RLHFlow/pair-preference-model-LLaMA3-8B as the judge, and to compare the result with the official win rate shown on the AlpacaEval Leaderboard.
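For reference, the win-rate comparison proposed here could be sketched roughly as follows. This is only an illustration: `judge` is a hypothetical callable standing in for a pairwise preference model and is assumed to return the probability that the first response is preferred, `toy_judge` is a placeholder that simply prefers longer responses, and averaging over both orderings is one common way to reduce position bias, not something the model card prescribes.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (prompt, first, second) -> P(first is preferred).
Judge = Callable[[str, str, str], float]


def win_rate(
    prompts: Sequence[str],
    candidate: Sequence[str],
    baseline: Sequence[str],
    judge: Judge,
) -> float:
    """Fraction of prompts on which the judge prefers the candidate response."""
    if not (len(prompts) == len(candidate) == len(baseline)):
        raise ValueError("prompts, candidate and baseline must have equal length")
    if not prompts:
        raise ValueError("need at least one prompt")

    wins = 0.0
    for prompt, cand, base in zip(prompts, candidate, baseline):
        # Query both orderings and average, to reduce position bias.
        p_forward = judge(prompt, cand, base)
        p_reverse = 1.0 - judge(prompt, base, cand)
        p_candidate = 0.5 * (p_forward + p_reverse)
        if p_candidate > 0.5:
            wins += 1.0
        elif p_candidate == 0.5:
            wins += 0.5  # count an exact tie as half a win
    return wins / len(prompts)


def toy_judge(prompt: str, first: str, second: str) -> float:
    """Placeholder judge (assumption): prefers the longer response."""
    if len(first) == len(second):
        return 0.5
    return 1.0 if len(first) > len(second) else 0.0


rate = win_rate(
    ["q1", "q2"],
    ["long answer", "a"],   # candidate responses
    ["b", "longer"],        # baseline responses
    toy_judge,
)
```

In practice `toy_judge` would be replaced by a function that formats the prompt and the two responses for the preference model and reads off its preference probability.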

weqweasdas

RLHFlow org Jun 22, 2024

Hi, thanks for your interest in our models.

AlpacaEval does not have a dataset for this. I do have some results for MT-Bench and LMSYS, though.

| Dataset | Preference model |
| --- | --- |
| lmsys/chatbot_arena_conversations 15k | 0.822 |
| Arena-Hard | 0.791 |
| lmsys/mt_bench_human_judgments/human | 0.805 |
| lmsys/mt_bench_human_judgments/gpt4 | 0.938 |

We removed the tied pairs from the test sets.
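As a rough sketch of how an agreement number of this kind can be computed with ties removed (the label values `"A"`, `"B"` and `"tie"` and the function name are assumptions for illustration, not the actual evaluation script):

```python
from typing import Sequence


def agreement_rate(
    model_prefs: Sequence[str],
    reference_labels: Sequence[str],
    tie_label: str = "tie",
) -> float:
    """Share of non-tied pairs where the model picks the same side as the reference."""
    if len(model_prefs) != len(reference_labels):
        raise ValueError("inputs must have equal length")

    # Drop every pair that the reference annotator (human or GPT-4) marked as a tie.
    kept = [
        (model, ref)
        for model, ref in zip(model_prefs, reference_labels)
        if ref != tie_label
    ]
    if not kept:
        raise ValueError("no non-tied pairs left to evaluate")

    matches = sum(1 for model, ref in kept if model == ref)
    return matches / len(kept)


# Four pairs, one of which is tied in the reference labels and is skipped.
score = agreement_rate(["A", "B", "A", "B"], ["A", "tie", "B", "B"])
```

Here three pairs remain after filtering and the model agrees with the reference on two of them.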

It turns out that the BT model, the preference model, and ArmoRM have all been used for online iterative DPO, and they lead to models with an AlpacaEval win rate possibly > 50%. So the model can be used to overfit AlpacaEval lol.
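For readers unfamiliar with the distinction: a BT (Bradley-Terry) reward model scores each response separately and derives a preference from the score difference, while a pairwise preference model looks at both responses together. A minimal sketch of the BT side, with a hypothetical function name and made-up reward values:

```python
import math


def bt_preference_prob(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred: sigmoid(r_a - r_b)."""
    diff = reward_a - reward_b
    # Numerically stable sigmoid for both signs of diff.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Made-up scalar rewards for two responses to the same prompt.
p_equal = bt_preference_prob(1.0, 1.0)
p_a_better = bt_preference_prob(2.0, 0.0)
```

Equal rewards give a probability of 0.5, and the probability approaches 1 as the gap in favor of A grows.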
