Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards
This model evaluates a solution step by step and outputs a reward for each reasoning step.
Babelscape/Qwen2.5-Math-7B-PRM800k-r is a Process Reward Model (PRM) based on Qwen2.5-Math-7B-Instruct.
It is trained with process-supervision data from PRM800K.
Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards
Raffaele Pisano and Roberto Navigli, ACL 2026
Project page & paper: https://babelscape.github.io/prm-meets-planning/
arXiv: https://arxiv.org/abs/2604.17957
import torch
from transformers import AutoTokenizer, AutoModel

repo_id = "Babelscape/Qwen2.5-Math-7B-PRM800k-r"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()

def build_prompt(problem, steps):
    # Each step is followed by the marker "ки" on its own line;
    # the score for a step is read at the position of its marker.
    steps_text = "\n".join([f"Step {i+1}: {step}\nки" for i, step in enumerate(steps)])
    return f"Problem: {problem}\nSteps:\n{steps_text}"

problem = "If x + 3 = 10, find x."
steps = [
    "Subtract 3 from both sides: x = 10 - 3.",
    "So x = 7."
]

prompt = build_prompt(problem, steps)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One scalar per input token.
pred_scalar = outputs["pred_scalar"]

# Find the marker positions in the input and keep only the scalars at those positions.
marker_id = tokenizer.encode("ки", add_special_tokens=False)[0]
marker_positions = (inputs["input_ids"][0] == marker_id).nonzero(as_tuple=True)[0]

# Map the raw scalars to (0, 1) with a sigmoid: one score per step.
step_scores = torch.sigmoid(pred_scalar[0, marker_positions]).cpu().tolist()
print("Step scores:", step_scores)

# Zero-based index of the first step scoring below 0.5, or -1 if no step does.
first_bad = next((i for i, score in enumerate(step_scores) if score < 0.5), -1)
print("First failing step index:", first_bad)
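The post-processing at the end of the snippet above (sigmoid over the values read at the marker positions, then a 0.5 cutoff) can be tried without loading the model. The following is a minimal sketch in plain Python; the helper name `score_steps`, the `threshold` parameter, and the example values are assumptions made for illustration and are not part of the model's API.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw scalar to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def score_steps(raw_values, threshold=0.5):
    """Hypothetical helper mirroring the snippet's post-processing.

    raw_values: one raw scalar per step (the values read at the marker positions).
    Returns (step_scores, first_bad), where first_bad is the zero-based index of
    the first step scoring below the threshold, or -1 if no step does.
    """
    step_scores = [sigmoid(v) for v in raw_values]
    first_bad = next((i for i, s in enumerate(step_scores) if s < threshold), -1)
    return step_scores, first_bad

# Made-up raw values for a four-step solution (not real model outputs).
example_raw = [2.0, 0.5, -1.0, 3.0]
scores, first_bad = score_steps(example_raw)
print([round(s, 3) for s in scores])  # [0.881, 0.622, 0.269, 0.953]
print(first_bad)                      # 2
```

A sigmoid value below 0.5 corresponds to a negative raw value, so the default cutoff flags the first step whose raw output is negative. A stricter or looser cutoff can be tried by passing a different `threshold`.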
If you use this model or the PDDL2PRM dataset in your work, please cite:
@inproceedings{pisano2026prmplanning,
  title={Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards},
  author={Pisano, Raffaele and Navigli, Roberto},
  booktitle={Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)},
  year={2026},
  note={Accepted}
}