arxiv:2605.25449

Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion

Published on May 25

Submitted by Ting-Hsuan Chen on May 26

Bosch US


Authors:

Jie-Ying Lee, …

Abstract

AI-generated summary

Pantheon360 enables high-fidelity 360° video generation for digital twins by combining 3D-aware diffusion with explicit geometric caching to ensure spatial-temporal consistency.

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency: constraints that remain challenging for perspective video generators because of their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
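To make the 3D Cache idea concrete, here is a minimal NumPy sketch. It assumes (an assumption, not a detail given in the abstract) that the cache is a colored point cloud lifted from an equirectangular frame with per-pixel depth, and that it is rendered into a new 360° view with a nearest-point z-buffer. The function names and the longitude/latitude convention are hypothetical. The returned `covered` mask marks pixels the cached geometry pins down; the remaining pixels are what a diffusion model would have to synthesize.

```python
import numpy as np

def equirect_directions(h, w):
    """Unit ray directions for every pixel of an h x w equirectangular image.
    Assumed convention: longitude spans [-pi, pi) left to right, latitude
    spans [pi/2, -pi/2] top to bottom; y is up and z is forward."""
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both shaped (h, w)
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

def build_cache(rgb, depth, cam_pos):
    """Lift a panorama with per-pixel depth into a colored point cloud (hypothetical cache)."""
    h, w, _ = rgb.shape
    points = cam_pos + equirect_directions(h, w) * depth[..., None]
    return points.reshape(-1, 3), rgb.reshape(-1, 3)

def render_cache(points, colors, cam_pos, h, w):
    """Project cached points into an equirectangular view centred at cam_pos,
    keeping only the nearest point per pixel (a z-buffer)."""
    rel = points - cam_pos
    dist = np.linalg.norm(rel, axis=1)
    valid = dist > 1e-6  # drop points sitting exactly at the camera
    rel, dist, colors = rel[valid], dist[valid], colors[valid]
    d = rel / dist[:, None]
    lon = np.arctan2(d[:, 0], d[:, 2])
    lat = np.arcsin(np.clip(d[:, 1], -1.0, 1.0))
    u = ((lon + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = np.clip(((np.pi / 2 - lat) / np.pi * h).astype(int), 0, h - 1)
    pix = v * w + u
    # Sort by pixel first, then by distance, so the first entry per pixel is the nearest.
    order = np.lexsort((dist, pix))
    _, first = np.unique(pix[order], return_index=True)
    keep = order[first]
    image = np.zeros((h * w, 3), dtype=colors.dtype)
    covered = np.zeros(h * w, dtype=bool)
    image[pix[keep]] = colors[keep]
    covered[pix[keep]] = True
    return image.reshape(h, w, 3), covered.reshape(h, w)
```

Rendering from the capture position returns the input frame unchanged; as the camera moves along a path, disoccluded regions appear as uncovered pixels. That split (geometry fixed by the cache, missing appearance left to generation) mirrors the division of labor the abstract describes, though the paper's actual cache representation may differ.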


Community

Koi953215

Paper submitter about 18 hours ago

Generating complete digital twins from videos requires precise camera control, global scene coverage, and strict spatial-temporal consistency: constraints that remain challenging for perspective video generators because of their limited field of view (FoV). Their narrow FoV forces long or multi-view trajectories, amplifying cross-view inconsistency and temporal drift. We argue that 360° video generation offers a natural solution: panoramic coverage simplifies trajectory design and provides strong global context for maintaining coherence. We introduce Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion, a controllable 360° video generation framework that synthesizes high-fidelity videos from sparse 360° inputs. The key idea is an explicit 3D Cache, reconstructed from the input, which serves as a geometric scaffold for any user-defined camera path. This allows the diffusion model to focus on photorealistic texture refinement while the 3D Cache enforces global geometric consistency. Beyond single-image generation, ours is the first video diffusion model to support 360° interpolation, enabling seamless chaining of video segments into extended, coherent long-form videos. Experiments show that Pantheon360 achieves superior visual quality and unmatched geometric coherence, enabling reliable and flexible 360° scene generation for downstream simulation and digital-twin applications.
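The interpolation claim can be illustrated with a small sketch of segment chaining. The post does not say how segments are scheduled, so everything here is an assumption: each generated segment is conditioned on a start and an end keyframe, consecutive segments share their boundary keyframe, and the duplicated boundary frame is dropped when concatenating. `fake_interpolator` is a hypothetical placeholder (a plain linear blend) standing in for the actual 360° video model.

```python
import numpy as np

def fake_interpolator(first, last, n):
    """HYPOTHETICAL stand-in for an interpolation-capable video model:
    returns n frames whose first and last frames equal the given keyframes.
    A linear blend is used only so the sketch runs end to end."""
    t = np.linspace(0.0, 1.0, n).reshape(n, *([1] * first.ndim))
    return (1.0 - t) * first + t * last

def chain_segments(keyframes, frames_per_segment, interpolate=fake_interpolator):
    """Generate one segment per consecutive keyframe pair and concatenate them.
    After the first segment, each segment's first frame is dropped because it
    duplicates the previous segment's last frame (the shared keyframe)."""
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    if frames_per_segment < 2:
        raise ValueError("each segment needs at least its two endpoint frames")
    pieces = []
    for k, (a, b) in enumerate(zip(keyframes[:-1], keyframes[1:])):
        seg = interpolate(a, b, frames_per_segment)
        pieces.append(seg if k == 0 else seg[1:])
    return np.concatenate(pieces, axis=0)
```

With m keyframes and n frames per segment, the chained clip has 1 + (m - 1)(n - 1) frames. Because every segment starts exactly on the frame where the previous one ended, the seams are continuous at the frame level; whether Pantheon360 chains segments this way is not stated in the post.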

avahal

8 minutes ago

The explicit 3D cache that grounds the diffusion in a learned geometric scaffold is the standout move: it lets the model refine texture while the cache keeps global structure intact. I like the geometry-only rendering along a user trajectory followed by CLIP-based semantic fusion from eight 45° crops; it feels like a clean separation that helps cross-view coherence. My only worry is how the 3D cache copes with dynamic elements, or with drift when sparse inputs miss fast-moving objects. The arxivlens breakdown helped me parse the method details, and its summary is a neat primer on where those conditioning signals land in the diffusion loop: https://arxivlens.com/PaperView/Details/pantheon360-taming-digital-twin-generation-via-3d-aware-360deg-video-diffusion-626-643f086b
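The eight-crop step mentioned in this comment can be sketched as follows. It reads "eight 45° crops" as eight perspective views whose yaw is spaced 45° apart around the horizon, which together cover the full 360°. The crop FoV (90° by default here, so neighbours overlap), the nearest-neighbour sampling, and the mean-of-normalized-embeddings fusion are all assumptions, and `embed` is a hypothetical stand-in for an image encoder such as CLIP.

```python
import numpy as np

def perspective_crop(pano, yaw_deg, fov_deg=90.0, size=64):
    """Nearest-neighbour perspective view of an equirectangular pano, looking at
    the horizon with the given yaw. Assumed convention: longitude increases to
    the right across the pano, latitude decreases downward, y up, z forward."""
    h, w = pano.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    xs = (np.arange(size) + 0.5) - size / 2
    px, py = np.meshgrid(xs, xs)  # px grows rightward, py grows downward
    dirs = np.stack([px, -py, np.full_like(px, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # Rotate camera rays about the vertical axis by the yaw angle.
    c, s = np.cos(np.radians(yaw_deg)), np.sin(np.radians(yaw_deg))
    x = c * dirs[..., 0] + s * dirs[..., 2]
    z = -s * dirs[..., 0] + c * dirs[..., 2]
    y = dirs[..., 1]
    lon = np.arctan2(x, z)
    lat = np.arcsin(np.clip(y, -1.0, 1.0))
    u = ((lon + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = np.clip(((np.pi / 2 - lat) / np.pi * h).astype(int), 0, h - 1)
    return pano[v, u]

def eight_view_crops(pano, fov_deg=90.0, size=64):
    """Eight crops with yaw 0, 45, ..., 315 degrees."""
    return [perspective_crop(pano, 45.0 * k, fov_deg, size) for k in range(8)]

def fuse_embeddings(crops, embed):
    """Average L2-normalized per-crop embeddings and renormalize.
    `embed` is a HYPOTHETICAL callable mapping an image to a 1-D vector."""
    e = np.stack([np.asarray(embed(c), dtype=float) for c in crops])
    e /= np.linalg.norm(e, axis=1, keepdims=True)
    fused = e.mean(axis=0)
    return fused / np.linalg.norm(fused)
```

Splitting the panorama into perspective crops lets an encoder trained on ordinary photos see undistorted content, at the cost of handling the overlaps between neighbouring views; how the paper actually places and fuses its crops may differ from this sketch.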