Papers
arxiv:2604.19747

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Published on Apr 21 · Submitted by Zhixuan Liang on Apr 22
Abstract

AnyRecon enables scalable 3D reconstruction from arbitrary sparse inputs using diffusion models with persistent scene memory and geometry-aware conditioning for improved geometric consistency.

AI-generated summary

Sparse-view 3D reconstruction is essential for modeling scenes from casual captures, but it remains challenging for non-generative reconstruction. Existing diffusion-based approaches mitigate this issue by synthesizing novel views, but they often condition on only one or two capture frames, which restricts geometric consistency and limits scalability to large or diverse scenes. We propose AnyRecon, a scalable framework for reconstruction from arbitrary and unordered sparse inputs that preserves explicit geometric control while supporting flexible conditioning cardinality. To support long-range conditioning, our method constructs a persistent global scene memory via a prepended capture-view cache and removes temporal compression to maintain frame-level correspondence under large viewpoint changes. Beyond a better generative model, we also find that the interplay between generation and reconstruction is crucial for large-scale 3D scenes. We therefore introduce a geometry-aware conditioning strategy that couples generation and reconstruction through an explicit 3D geometric memory and geometry-driven capture-view retrieval. To ensure efficiency, we combine 4-step diffusion distillation with context-window sparse attention to reduce quadratic attention complexity. Extensive experiments demonstrate robust and scalable reconstruction across irregular inputs, large viewpoint gaps, and long trajectories.
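To make the geometry-driven capture-view retrieval described in the abstract concrete, here is a minimal sketch, assuming a fused point-cloud memory and pinhole cameras. All names (SceneMemory, visible_mask, retrieve_views) and the overlap score are hypothetical illustrations, not the authors' implementation: cached capture views are ranked by how much of the accumulated 3D scene memory they share with the target viewpoint, and the top-k views would be prepended as conditioning frames.

import numpy as np

class SceneMemory:
    """Persistent store of capture views and their fused 3D points (hypothetical)."""
    def __init__(self):
        self.views = []                    # dicts: {"image": ..., "pose": 4x4 cam-to-world}
        self.points = np.empty((0, 3))     # fused global point cloud, world coords

    def add_view(self, image, pose, points_world):
        self.views.append({"image": image, "pose": pose})
        self.points = np.vstack([self.points, points_world])

def visible_mask(points, pose, K, hw):
    """Boolean mask of world points that project inside a view's image."""
    h, w = hw
    w2c = np.linalg.inv(pose)                          # world -> camera
    cam = (w2c[:3, :3] @ points.T + w2c[:3, 3:4]).T
    in_front = cam[:, 2] > 1e-3
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)   # perspective divide
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return in_front & in_img

def retrieve_views(memory, target_pose, K, hw, top_k=4):
    """Rank cached capture views by geometric overlap with the target viewpoint."""
    target_vis = visible_mask(memory.points, target_pose, K, hw)
    scores = []
    for v in memory.views:
        vis = visible_mask(memory.points, v["pose"], K, hw)
        scores.append((vis & target_vis).sum() / max(target_vis.sum(), 1))
    order = np.argsort(scores)[::-1][:top_k]
    return [memory.views[i] for i in order]   # prepend these as conditioning frames

Scoring by geometric overlap rather than image similarity is the point of the paper's geometry-driven retrieval: it keeps the conditioning informative even when the target viewpoint looks nothing like any single capture frame.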

Community

Paper submitter

Arbitrary views in, consistent 3D out — AnyRecon finally makes sparse reconstruction scale.

Excited to share our latest work, AnyRecon! 🚀

Project Page: https://yutian10.github.io/AnyRecon/
Code: https://github.com/OpenImagingLab/AnyRecon

Sparse-view 3D reconstruction is essential for bringing casual captures to life, yet scaling it to complex, large-scale scenes remains a significant challenge. Many current diffusion-based methods are limited by fixed input cardinality and a lack of explicit geometric grounding.

With AnyRecon, we introduce a scalable framework designed to handle arbitrary, unordered sparse inputs. Key highlights include:

✅ Flexible Conditioning: Unlike traditional models, our framework adapts to an arbitrary number of reference views.
✅ Dual-Memory Design: We combine explicit 3D geometry memory with implicit scene memory, coupled with geometry-aware view retrieval to select the most informative segments for generation.
✅ Efficient Architecture: By leveraging 4-step diffusion distillation and block-sparse attention, we maintain high fidelity without heavy computational overhead (see the attention-mask sketch below).
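As a rough illustration of the context-window sparse attention mentioned above, the sketch below builds a block-sparse attention mask in which every generated frame attends to the prepended capture-view cache plus a local window of neighboring frames. The token layout, names, and window size are assumptions for illustration, not the released implementation.

import torch

def block_sparse_mask(n_cache, n_frames, tokens_per_frame, window=1):
    """Boolean attention mask (True = attend). Layout: cache frames, then generated frames."""
    n = (n_cache + n_frames) * tokens_per_frame
    mask = torch.zeros(n, n, dtype=torch.bool)

    def span(f):                                # token range of frame index f
        return slice(f * tokens_per_frame, (f + 1) * tokens_per_frame)

    cache_end = n_cache * tokens_per_frame
    mask[:, :cache_end] = True                  # every token attends to the capture-view cache
    for i in range(n_frames):                   # generated frames: local window only
        for j in range(max(0, i - window), min(n_frames, i + window + 1)):
            mask[span(n_cache + i), span(n_cache + j)] = True
    return mask

# Usage: pass as attn_mask to torch.nn.functional.scaled_dot_product_attention.
m = block_sparse_mask(n_cache=2, n_frames=8, tokens_per_frame=4, window=1)
print(m.shape, m.float().mean().item())        # density well below 1.0 (dense attention)

Because each generated frame only attends to a fixed-size window plus the cache, the fraction of active attention entries shrinks as trajectories grow, which is where the savings over quadratic all-pairs attention come from.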

The core idea, an explicit 3D memory that grows with new views and a geometry-driven retrieval loop, feels like the missing link between diffusion synthesis and stable reconstruction. I also appreciate the 4-step diffusion distillation plus context-window sparse attention, which makes long trajectories tractable without drowning in quadratic cost. I would love to see an ablation that removes the memory or swaps in purely image-based retrieval, to quantify how much the memory contributes when viewpoint gaps are large. By the way, the arxivlens breakdown helped me parse the method details and track where the geometry memory actually sits in the pipeline.



Get this paper in your agent:

hf papers read 2604.19747
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 0