arxiv:2604.16060

Chain-of-Thought Degrades Visual Spatial Reasoning Capabilities of Multimodal LLMs

Published on Apr 17 · Submitted by Aditya Kanade on Apr 22
Abstract

Chain-of-Thought prompting in multimodal reasoning models degrades performance in visual spatial reasoning due to shortcut learning and hallucination of visual details from text alone.

AI-generated summary

Multimodal Reasoning Models (MRMs) leveraging Chain-of-Thought (CoT) based thinking have revolutionized mathematical and logical problem-solving. However, we show that this paradigm struggles with generalized spatial intelligence. We perform a comprehensive evaluation of seventeen models across thirteen spatial benchmarks and identify a critical gap: CoT prompting consistently degrades performance in visual spatial reasoning. Furthermore, through a novel No-Image++ ablation, we demonstrate that MRMs and CoT-prompted MLLMs suffer from severe shortcut learning and hallucinate visual details from textual priors even when the image is absent. These findings challenge the efficacy of text-only CoT for spatial tasks and underscore the need for vision-centric reasoning paradigms.
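The No-Image++ idea described above can be illustrated with a minimal evaluation harness: score the same spatial-QA items once with the image and once with the image withheld, then compare accuracies. This is a hedged sketch, not the paper's actual protocol; `query_model`, `ablation_accuracy`, and the toy dataset are all hypothetical stand-ins for a real MLLM API and benchmark.

```python
def query_model(question: str, image=None) -> str:
    # Stub for a real multimodal model call. This stub deliberately
    # mimics shortcut behaviour: it answers from textual priors alone,
    # ignoring whether an image was supplied.
    return "left" if "left" in question else "right"

def ablation_accuracy(dataset, with_image: bool) -> float:
    """Score a spatial-QA dataset with or without the image input."""
    correct = 0
    for item in dataset:
        image = item["image"] if with_image else None
        if query_model(item["question"], image) == item["answer"]:
            correct += 1
    return correct / len(dataset)

# Toy examples only; a real run would use the paper's benchmarks.
dataset = [
    {"question": "Is the cup left of the plate?", "image": "img0", "answer": "left"},
    {"question": "Is the dog right of the car?", "image": "img1", "answer": "right"},
]

acc_full = ablation_accuracy(dataset, with_image=True)
acc_blind = ablation_accuracy(dataset, with_image=False)
# If blind accuracy stays close to full accuracy, the answers likely
# come from textual priors rather than the image (shortcut learning).
print(acc_full, acc_blind)
```

With this shortcut-prone stub, accuracy is identical with and without the image, which is exactly the signature the ablation is designed to expose.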

Community

Paper submitter

This paper reveals a surprising finding: Chain-of-Thought reasoning actually hurts performance on visual spatial tasks, with a comprehensive evaluation of seventeen models across thirteen spatial benchmarks showing consistent degradation under CoT prompting. The "No-Image++" ablation exposes a deeper problem: models hallucinate visual details from textual priors rather than truly reasoning over images. This makes a strong case that text-only CoT is insufficient and that vision-centric reasoning paradigms are needed.



