Papers
arxiv:2604.17073

Abstain-R1: Calibrated Abstention and Post-Refusal Clarification via Verifiable RL

Published on Apr 18 · Submitted by zhaihaotian on Apr 23
Abstract

Reinforcement fine-tuning enhances language model reasoning while enabling calibrated abstention and clarification for unanswerable queries through a novel reward mechanism.

AI-generated summary

Reinforcement fine-tuning improves the reasoning ability of large language models, but it can also encourage them to answer unanswerable queries by guessing or hallucinating missing information. Existing abstention methods either train models to produce generic refusals or encourage follow-up clarifications without verifying whether those clarifications identify the key missing information. We study queries that are clear in meaning but cannot be reliably resolved from the given information, and argue that a reliable model should not only abstain, but also explain what is missing. We propose a clarification-aware RLVR reward that, while rewarding correct answers on answerable queries, jointly optimizes explicit abstention and semantically aligned post-refusal clarification on unanswerable queries. Using this reward, we train Abstain-R1, a 3B model that improves abstention and clarification on unanswerable queries while preserving strong performance on answerable ones. Experiments on Abstain-Test, Abstain-QA, and SelfAware show that Abstain-R1 substantially improves over its base model and achieves unanswerable-query behavior competitive with larger systems including DeepSeek-R1, suggesting that calibrated abstention and clarification can be learned through verifiable rewards rather than emerging from scale alone.
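The reward described above can be sketched in code. This is a hypothetical illustration based only on the abstract: the function names, the abstention check, the keyword-overlap verifier, and the 0.5 weighting are all assumptions, not the paper's actual implementation.

```python
from typing import Optional

def abstain_r1_reward(query_is_answerable: bool,
                      response: str,
                      gold_answer: Optional[str],
                      gold_missing_info: Optional[str]) -> float:
    """Toy clarification-aware verifiable reward (illustrative, not the paper's)."""
    # Toy abstention detector; the paper's explicit-abstention check is unspecified.
    refused = response.strip().lower().startswith("i cannot answer")

    if query_is_answerable:
        # Answerable queries: reward only a correct, non-refusing answer.
        return 1.0 if (not refused and gold_answer in response) else 0.0

    # Unanswerable queries: require explicit abstention first...
    if not refused:
        return 0.0
    # ...then additionally reward a post-refusal clarification that is
    # semantically aligned with the key missing information.
    clarification_ok = clarification_matches(response, gold_missing_info)
    return 0.5 + (0.5 if clarification_ok else 0.0)

def clarification_matches(response: str, gold_missing_info: str) -> bool:
    # Stand-in for the paper's semantic-alignment verification; a crude
    # keyword-overlap check so the sketch stays runnable.
    keywords = set(gold_missing_info.lower().split())
    overlap = keywords & set(response.lower().split())
    return len(overlap) >= max(1, len(keywords) // 2)
```

The key property is that a bare refusal on an unanswerable query earns only partial credit; full reward requires naming the missing information, which is what distinguishes this from generic-refusal training.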

Community

RL fine-tuning makes LLMs better reasoners — but also bolder hallucinators on questions they can't actually answer. In this work, we argue a reliable model should abstain and pinpoint the missing information. We propose a clarification-aware RLVR reward that verifies whether post-refusal clarifications actually identify the key missing piece, and use it to train Abstain-R1 (3B). The model improves abstention + clarification quality while preserving performance on answerable queries. Model and benchmark released 🤗


Get this paper in your agent:

hf papers read 2604.17073
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
