arxiv:2510.11020

GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation

Published on Oct 13, 2025

Abstract

AI-generated summary: GeoVLMath improves complex solid-geometry reasoning in large vision-language models by expressing auxiliary-line constructions as textual descriptions that are reinforced to align with the spatial structure of the diagram.

Auxiliary lines are essential for solving complex geometric problems but remain challenging for large vision-language models (LVLMs). Recent attempts construct auxiliary lines via code-driven rendering, a strategy that relies on accurate and executable code generation to produce visual renderings of the auxiliary lines for subsequent reasoning. In complex solid geometry settings, however, such strong dependence on precise specifications substantially restricts the robustness of this strategy. We instead turn to a simpler and more stable alternative: representing auxiliary-line constructions as structured textual descriptions. To bridge the gap between textual descriptions and spatial structure, we propose a reinforcement learning framework that enhances diagram-text alignment. Its core is a cross-modal reward model that evaluates how well a generated auxiliary-line description matches the ground-truth auxiliary-line diagram. This reward signal drives a GRPO-based RL stage that yields informative auxiliary-line descriptions for subsequent reasoning. To support training and evaluation, we develop a scalable data pipeline and construct AuxSolidMath, a dataset of 3,018 real-exam geometry problems with paired diagrams and aligned textual fields. Building on this framework, we derive GeoVLMath, an LVLM for solving complex solid geometry problems.
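
For intuition, the following minimal sketch shows one way the abstract's two key pieces could fit together: a cross-modal reward that scores a generated auxiliary-line description against the ground-truth auxiliary-line diagram, and the group-relative advantages used in a GRPO-style update. This is not the authors' implementation; a CLIP-style dual encoder (openai/clip-vit-base-patch32) stands in for the paper's reward model, and the class and function names (CrossModalReward, grpo_advantages) are hypothetical.

import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

class CrossModalReward:
    # Scores how well each generated auxiliary-line description matches the
    # ground-truth auxiliary-line diagram, using cosine similarity between
    # CLIP text and image embeddings as a stand-in for the paper's reward model.
    def __init__(self, model_name="openai/clip-vit-base-patch32"):
        self.model = CLIPModel.from_pretrained(model_name).eval()
        self.processor = CLIPProcessor.from_pretrained(model_name)

    @torch.no_grad()
    def score(self, descriptions, diagram):
        # descriptions: list of candidate auxiliary-line descriptions (strings)
        # diagram: PIL image of the ground-truth auxiliary-line diagram
        inputs = self.processor(text=descriptions, images=[diagram] * len(descriptions),
                                return_tensors="pt", padding=True, truncation=True)
        out = self.model(**inputs)
        text_emb = F.normalize(out.text_embeds, dim=-1)
        img_emb = F.normalize(out.image_embeds, dim=-1)
        return (text_emb * img_emb).sum(dim=-1)  # one reward per candidate

def grpo_advantages(rewards, eps=1e-6):
    # Group-relative advantages as in GRPO: normalize rewards within the group
    # of candidates sampled for the same problem; these advantages would then
    # weight the policy-gradient update of the LVLM that writes the descriptions.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical usage: score a group of sampled descriptions against one diagram.
# reward_model = CrossModalReward()
# rewards = reward_model.score(sampled_descriptions, ground_truth_diagram)
# advantages = grpo_advantages(rewards)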

