arXiv:2603.05888

PixARMesh: Autoregressive Mesh-Native Single-View Scene Reconstruction

Published on Mar 6 · Submitted by Xiang Zhang on Mar 9

Abstract

AI-generated summary: PixARMesh enables end-to-end 3D indoor scene mesh reconstruction from single RGB images using a unified model with cross-attention and autoregressive token generation.

We introduce PixARMesh, a method to autoregressively reconstruct complete 3D indoor scene meshes directly from a single RGB image. Unlike prior methods that rely on implicit signed distance fields and post-hoc layout optimization, PixARMesh jointly predicts object layout and geometry within a unified model, producing coherent and artist-ready meshes in a single forward pass. Building on recent advances in mesh generative models, we augment a point-cloud encoder with pixel-aligned image features and global scene context via cross-attention, enabling accurate spatial reasoning from a single image. Scenes are generated autoregressively from a unified token stream containing context, pose, and mesh tokens, yielding compact meshes with high-fidelity geometry. Experiments on synthetic and real-world datasets show that PixARMesh achieves state-of-the-art reconstruction quality while producing lightweight, high-quality meshes ready for downstream applications.
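The conditioning described in the abstract (point-cloud tokens cross-attending to pixel-aligned image features and a global scene embedding) could be sketched roughly as below. This is a minimal PyTorch sketch under stated assumptions: the module name, dimensions, and block structure are illustrative, not the paper's actual implementation.

```python
# Hypothetical sketch of the conditioning described in the abstract:
# point tokens cross-attend to pixel-aligned image features plus a
# global scene context vector. Names and shapes are illustrative only.
import torch
import torch.nn as nn

class ImageConditionedPointEncoder(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.point_proj = nn.Linear(3, d_model)  # xyz -> point token
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, points, pixel_feats, scene_ctx):
        # points:      (B, N, 3)  sampled object point cloud
        # pixel_feats: (B, HW, D) pixel-aligned features from a 2D backbone
        # scene_ctx:   (B, 1, D)  global scene embedding (e.g. pooled feature)
        x = self.point_proj(points)
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        # Cross-attend point tokens to image features and scene context jointly.
        kv = torch.cat([pixel_feats, scene_ctx], dim=1)
        x = x + self.cross_attn(self.norm2(x), kv, kv)[0]
        x = x + self.ffn(self.norm3(x))
        return x  # conditioning tokens for the autoregressive mesh decoder
```

The key design point the abstract implies is that image evidence enters through attention rather than through an intermediate volumetric or implicit field, so the point tokens can directly query the pixels that project onto each object.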

Community

Paper author · Paper submitter

PixARMesh is a mesh-native autoregressive framework for single-view 3D scene reconstruction.
Instead of reconstructing via intermediate volumetric or implicit representations, PixARMesh directly models each instance with a native mesh representation. Object poses and meshes are predicted in a single unified autoregressive sequence, as sketched below.


Models citing this paper 2

Datasets citing this paper 2

Spaces citing this paper 0


Collections including this paper 0
