arxiv:2605.25449

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Published on May 25
· Submitted by
Ting-Hsuan Chen
on May 26

Abstract

Pantheon360 enables high-fidelity 360° video generation for digital twins by combining 3D-aware diffusion with explicit geometric caching to ensure spatial-temporal consistency.

AI-generated summary

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency. These constraints remain challenging for perspective video generators because of their limited field of view (FoV): a narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides a strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
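To make the role of the 3D Cache more concrete, the following is a minimal sketch of how a cached point set could be rendered into equirectangular frames along a user-defined camera path. It is not the authors' implementation. Everything here is an assumption made for illustration: the cache is modelled as a plain list of coloured points, the camera is translation-only, the axis convention is +x right, +y up, +z forward, and the names `project_equirect`, `render_cache`, and `render_trajectory` are hypothetical.

```python
import math


def project_equirect(point, cam_pos, width, height):
    """Map a world-space point to (col, row, depth) in an equirectangular frame.

    Assumed convention (not from the paper): +x right, +y up, +z forward,
    axis-aligned camera, longitude 0 at the horizontal image centre.
    """
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    depth = math.sqrt(dx * dx + dy * dy + dz * dz)
    if depth == 0.0:
        return None  # a point at the camera centre has no direction
    lon = math.atan2(dx, dz)                           # in [-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, dy / depth)))   # in [-pi/2, pi/2]
    # Clamp so that lon == pi or lat == -pi/2 stays inside the image.
    col = min(int((lon / (2.0 * math.pi) + 0.5) * width), width - 1)
    row = min(int((0.5 - lat / math.pi) * height), height - 1)
    return col, row, depth


def render_cache(points, cam_pos, width, height):
    """Z-buffered splat of (xyz, colour) pairs into a sparse frame.

    Returns {(row, col): (depth, colour)}. Pixels that no point reaches are
    simply absent from the dict.
    """
    frame = {}
    for xyz, colour in points:
        hit = project_equirect(xyz, cam_pos, width, height)
        if hit is None:
            continue
        col, row, depth = hit
        # Keep only the nearest point per pixel.
        if (row, col) not in frame or depth < frame[(row, col)][0]:
            frame[(row, col)] = (depth, colour)
    return frame


def render_trajectory(points, camera_path, width, height):
    """Render one geometry-only frame per camera position on the path."""
    return [render_cache(points, cam, width, height) for cam in camera_path]


if __name__ == "__main__":
    cache = [((0.0, 0.0, 2.0), "wall"), ((1.0, 0.0, 0.0), "lamp")]
    path = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    for i, frame in enumerate(render_trajectory(cache, path, 8, 4)):
        print(i, sorted(frame.items()))
```

In a pipeline of the kind the abstract describes, frames like these would act only as the geometric conditioning signal, and the pixels left empty by the splat are where a generative model would have to supply content. How Pantheon360 actually represents, renders, and injects its cache is not specified on this page.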

Community

Paper submitter

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency. These constraints remain challenging for perspective video generators because of their limited field of view (FoV): a narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Beyond single-image generation, ours is the first video diffusion model to support 360° interpolation, enabling seamless chaining of video segments to produce extended, coherent long-form videos. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.

the explicit 3d cache that grounds the diffusion in a learned geometric scaffold is the standout move, it lets the model refine texture while the cache keeps global structure intact. i like the geometry-only rendering along a user trajectory followed by clip-based semantic fusion from eight 45° crops, it feels like a clean separation that helps cross-view coherence. my only worry is how the 3d cache copes with dynamic elements or drift when sparse inputs miss fast-moving objects. the arxivlens breakdown helped me parse the method details and its summary is a neat primer on where those conditioning signals land in the diffusion loop: https://arxivlens.com/PaperView/Details/pantheon360-taming-digital-twin-generation-via-3d-aware-360deg-video-diffusion-626-643f086b


