Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL
Abstract
Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and clarification on unanswerable queries while preserving strong performance on answerable ones. Experiments on Abstain-Test, Abstain-QA, and SelfAware show that Abstain-R1 substantially improves over its base model and achieves unanswerable-query behavior competitive with larger systems including DeepSeek-R1, suggesting that calibrated abstention and clarification can be learned through verifiable rewards rather than emerging from scale alone.
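The reward structure described in the abstract can be sketched as a simple scoring function: verifiable correctness is rewarded on answerable queries, while unanswerable queries reward explicit abstention plus a bonus only when the post-refusal clarification is semantically aligned with the known missing information. This is a minimal illustrative sketch, not the paper's implementation; all names, weights, and the alignment threshold are assumptions.

```python
def clarification_aware_reward(
    answerable: bool,
    answer_correct: bool,
    abstained: bool,
    clarification_score: float,
    align_threshold: float = 0.5,
) -> float:
    """Illustrative clarification-aware RLVR reward (hypothetical weights).

    - Answerable query: reward only a verifiably correct, non-abstaining answer.
    - Unanswerable query: penalize guessing, give partial credit for the
      explicit refusal, and a bonus when the clarification identifies the
      key missing information (clarification_score in [0, 1], e.g. from a
      semantic-alignment judge).
    """
    if answerable:
        # Standard verifiable-reward case: abstaining here earns nothing.
        return 1.0 if (not abstained and answer_correct) else 0.0
    if not abstained:
        # Guessing or hallucinating on an unanswerable query is penalized.
        return -1.0
    base = 0.5  # credit for the explicit abstention itself
    bonus = 0.5 if clarification_score >= align_threshold else 0.0
    return base + bonus
```

Under this shape of reward, the policy cannot collect full credit on unanswerable queries by refusing generically: the clarification must actually name the missing piece to earn the bonus.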
Community
RL fine-tuning makes LLMs better reasoners — but also bolder hallucinators on questions they can't actually answer. In this work, we argue a reliable model should abstain and pinpoint the missing information. We propose a clarification-aware RLVR reward that verifies whether post-refusal clarifications actually identify the key missing piece, and use it to train Abstain-R1 (3B). The model improves abstention + clarification quality while preserving performance on answerable queries. Model and benchmark released 🤗