Papers
arxiv:2510.07761

Test-Time Reasoners Are Strategic Multiple-Choice Test-Takers

Published on Apr 20
Authors:
,
,

Abstract

Reasoning traces in large language models reveal that partial-input success in multiple-choice question answering can indicate valid inference rather than shallow shortcuts, suggesting a potential method to distinguish problematic from less problematic reasoning patterns.

AI-generated summary

Large language models (LLMs) now give reasoning before answering, excelling in tasks like multiple-choice question answering (MCQA). Yet, a concern is that LLMs do not solve MCQs as intended, as work finds LLMs sans reasoning succeed in MCQA without using the question, i.e., choices-only. Such partial-input success is often linked to trivial shortcuts, but reasoning traces could reveal if choices-only strategies are truly shallow. To examine these strategies, we have reasoning LLMs solve MCQs in full and choices-only inputs; test-time reasoning often boosts accuracy in full and in choices-only, half the time. While possibly due to shallow shortcuts, choices-only success is barely affected by the length of reasoning traces, and after finding traces pass faithfulness tests, we show they use less problematic strategies like inferring missing questions. In all, we challenge claims that partial-input success is always a flaw, so we propose how reasoning traces could separate problematic data from less problematic reasoning.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.07761 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2510.07761 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.07761 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.