---
license: other
license_name: nvidia-oneway-noncommercial
license_link: https://huggingface.co/nvidia/dvlt/blob/main/LICENSE.txt
library_name: dvlt
pipeline_tag: image-to-3d
tags:
- 3d-reconstruction
- depth-estimation
- camera-pose
- pointmap
- looped-transformer
base_model: facebook/dinov2-base
---
# Déjà View: Looping Transformers for Multi-View 3D Reconstruction

**[NVIDIA](https://www.nvidia.com/)** **[University of Modena and Reggio Emilia](https://www.unimore.it/it)** **[University of Toronto](https://www.utoronto.ca/)** **[ETH Zurich](https://ethz.ch/)**

[Alessandro Burzio*](https://research.nvidia.com/labs/dvl/author/alessandro-burzio/), [Tobias Fischer*](https://tobiasfshr.github.io/), [Sven Elflein](https://selflein.github.io/), [Qunjie Zhou](https://research.nvidia.com/labs/dvl/author/qunjie-zhou/), [Riccardo de Lutio](https://riccardodelutio.github.io/), [Jiawei Ren](https://jiawei-ren.github.io/), [Jiahui Huang](https://huangjh-pub.github.io/), [Shengyu Huang](https://shengyuh.github.io/), [Marc Pollefeys](https://people.inf.ethz.ch/marc.pollefeys/), [Laura Leal-Taixé](https://research.nvidia.com/labs/dvl/author/laura-leal-taixe/), [Zan Gojcic+](https://zgojcic.github.io/), [Haithem Turki+](https://haithemturki.com/)
# Model Overview
### Description
Déjà View Looping Transformer (DVLT) is a feed-forward three-dimensional (3D) reconstruction model that takes unposed Red, Green, Blue (RGB) images or video and predicts per-pixel depth, ray maps (and thus 3D points), and per-view camera intrinsics/extrinsics in a single pass.
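
The phrase "ray maps (and thus 3D points)" rests on a simple identity: a pixel's 3D point is its ray origin plus depth times its ray direction. The sketch below illustrates this for a single pixel of a pinhole camera. It assumes no lens distortion, a camera-to-world pose `(R_c2w, t_c2w)`, and depth measured along the camera z-axis (so the camera-frame direction is scaled to z = 1). These conventions and all function names are illustrative assumptions, not DVLT's documented output format; check the repository for the actual ray-map and depth conventions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray_camera(u: float, v: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Camera-frame ray direction through pixel (u, v), scaled so that z = 1 (pinhole model)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def rotate(R: Sequence[Sequence[float]], x: Vec3) -> Vec3:
    """Multiply a 3x3 rotation matrix (row-major nested lists) by a 3-vector."""
    return tuple(sum(R[i][j] * x[j] for j in range(3)) for i in range(3))


def point_from_ray(origin: Vec3, direction: Vec3, depth: float) -> Vec3:
    """3D point = ray origin + depth * ray direction."""
    return tuple(o + depth * d for o, d in zip(origin, direction))


def unproject(u, v, depth, fx, fy, cx, cy, R_c2w, t_c2w) -> Vec3:
    """World-space point for one pixel (hypothetical helper).

    The ray origin is the camera centre t_c2w; the direction is the
    camera-frame ray rotated into the world frame by R_c2w.
    """
    direction_world = rotate(R_c2w, pixel_ray_camera(u, v, fx, fy, cx, cy))
    return point_from_ray(tuple(t_c2w), direction_world, depth)
```

For example, with an identity rotation, camera centre `(1, 2, 3)`, `fx = fy = 100`, `cx = cy = 50`, pixel `(150, 50)` and depth `2.0` yield the world point `(3.0, 2.0, 5.0)`. Applying the same identity to every pixel of a predicted depth map and ray map produces the dense per-view point map.
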

**Novelty:** A weight-tied looped transformer. Instead of stacking many distinct layers, DVLT applies a single shared block for K refinement steps over a DINOv2-initialized per-view state, conditioning each step on a continuous time interval `(t_k, t_{k+1}) ⊂ [0, 1]`. Because the block is shared, a single checkpoint exposes the iteration count K as an inference-time compute/quality knob, with no need to retrain separate models (the released checkpoint is valid for K ∈ [8, 16]).

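
A minimal sketch of the control flow described above. It assumes the K intervals partition [0, 1] uniformly (the card does not state the schedule) and models the shared block as an arbitrary Python callable. `looped_refine`, `time_intervals`, and `check_num_steps` are hypothetical names, not the DVLT API. What it illustrates is that the same block object is reused at every step, so changing K changes the amount of compute but not the set of weights.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # per-view state: a tensor in practice, any value in this sketch

K_MIN, K_MAX = 8, 16  # iteration range stated for the released checkpoint


def check_num_steps(k: int) -> int:
    """Reject iteration counts outside the range the checkpoint is documented for."""
    if not K_MIN <= k <= K_MAX:
        raise ValueError(f"K={k} is outside the supported range [{K_MIN}, {K_MAX}]")
    return k


def time_intervals(k: int) -> List[Tuple[float, float]]:
    """Split [0, 1] into k contiguous intervals (t_k, t_{k+1}); uniform spacing is an assumption."""
    return [(i / k, (i + 1) / k) for i in range(k)]


def looped_refine(state: S, block: Callable[[S, float, float], S], k: int) -> S:
    """Apply one weight-tied block k times, conditioning each step on its time interval."""
    for t_start, t_end in time_intervals(check_num_steps(k)):
        state = block(state, t_start, t_end)  # same block (same weights) every iteration
    return state
```

Calling `looped_refine(state, block, 8)` and `looped_refine(state, block, 16)` runs the same block 8 or 16 times over finer or coarser intervals; a value such as `k=4` raises `ValueError` because it lies outside the documented range.
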
This model is for research and development only.
### License/Terms of Use
**Model.** The model (checkpoints, learned weights, and configuration files) is released under the **NVIDIA License**: https://huggingface.co/nvidia/dvlt/blob/main/LICENSE.txt.

**Source code.** The accompanying source code is licensed separately. The repository is primarily licensed under the **Apache License, Version 2.0** — see the `LICENSE` file at the repository root. Portions of the codebase derived from VGGT (Meta) are distributed under the VGGT License v1; the full text is provided in `LICENSES/VGGT-LICENSE.txt`. Full third-party attribution, per-file notices, and upstream license texts are collected in `THIRD_PARTY_LICENSES.md`.
### Deployment Geography
Global
### Use Case
Primary Users:
- Computer Vision Researchers: For benchmarking multi-view 3D reconstruction, studying weight-tied / recurrent transformer architectures, and developing neural rendering pipelines.
- Augmented Reality/Virtual Reality & Robotics Engineers: For real-time simultaneous localization and mapping (SLAM), scene understanding, and navigation research prototypes.
- 3D Content Creators: For rapid conversion of unposed video/image collections into 3D assets.

Primary Use Cases:
- 3D Reconstruction: Fast, feed-forward estimation of dense per-pixel depth, ray maps, and per-view camera poses from unposed images or video, without per-scene iterative optimization.
- Structure-from-Motion (SfM) Replacement: Accelerating initialization for 3D Gaussian Splatting and Neural Radiance Field (NeRF) training by replacing slow SfM pipelines (e.g., COLMAP).
- Compute-Adaptive Inference: A single checkpoint supports a range of recurrence step counts (8–16) at inference, letting downstream applications trade reconstruction quality for latency without retraining.
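
For the SfM-replacement use case, predicted poses usually need to be written in a format that 3D Gaussian Splatting or NeRF training code already reads, such as COLMAP's text model. In COLMAP's `images.txt`, each image gets a line `IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME` (the quaternion and translation describe the world-to-camera transform), followed by a line of 2D keypoints that may be left empty when only poses are known. The sketch below assumes DVLT's extrinsics are camera-to-world matrices in an OpenCV-style axis convention (x right, y down, z forward, as COLMAP uses); that convention and the helper names are assumptions, so verify them against the repository before use.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]
Vec3 = Sequence[float]


def invert_pose(R_c2w: Mat3, t_c2w: Vec3) -> Tuple[List[List[float]], List[float]]:
    """Camera-to-world [R | t] -> world-to-camera [R^T | -R^T t]."""
    R_w2c = [[R_c2w[j][i] for j in range(3)] for i in range(3)]
    t_w2c = [-sum(R_w2c[i][j] * t_c2w[j] for j in range(3)) for i in range(3)]
    return R_w2c, t_w2c


def rotmat_to_quat(R: Mat3) -> Tuple[float, float, float, float]:
    """Rotation matrix -> unit quaternion (qw, qx, qy, qz).

    Branches on the trace / largest diagonal entry so the divisor s never gets small.
    """
    tr = R[0][0] + R[1][1] + R[2][2]
    if tr > 0:
        s = math.sqrt(tr + 1.0) * 2  # s = 4 * qw
        qw, qx = 0.25 * s, (R[2][1] - R[1][2]) / s
        qy, qz = (R[0][2] - R[2][0]) / s, (R[1][0] - R[0][1]) / s
    elif R[0][0] > R[1][1] and R[0][0] > R[2][2]:
        s = math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2]) * 2  # s = 4 * qx
        qw, qx = (R[2][1] - R[1][2]) / s, 0.25 * s
        qy, qz = (R[0][1] + R[1][0]) / s, (R[0][2] + R[2][0]) / s
    elif R[1][1] > R[2][2]:
        s = math.sqrt(1.0 + R[1][1] - R[0][0] - R[2][2]) * 2  # s = 4 * qy
        qw, qx = (R[0][2] - R[2][0]) / s, (R[0][1] + R[1][0]) / s
        qy, qz = 0.25 * s, (R[1][2] + R[2][1]) / s
    else:
        s = math.sqrt(1.0 + R[2][2] - R[0][0] - R[1][1]) * 2  # s = 4 * qz
        qw, qx = (R[1][0] - R[0][1]) / s, (R[0][2] + R[2][0]) / s
        qy, qz = (R[1][2] + R[2][1]) / s, 0.25 * s
    return qw, qx, qy, qz


def colmap_image_line(image_id: int, R_c2w: Mat3, t_c2w: Vec3, camera_id: int, name: str) -> str:
    """First line of a COLMAP images.txt entry for one predicted camera (hypothetical helper)."""
    R_w2c, t_w2c = invert_pose(R_c2w, t_c2w)
    qw, qx, qy, qz = rotmat_to_quat(R_w2c)
    values = " ".join(f"{x:.9g}" for x in (qw, qx, qy, qz, *t_w2c))
    return f"{image_id} {values} {camera_id} {name}"
```

An identity camera-to-world pose centred at `(1, 2, 3)` becomes `1 1 0 0 0 -1 -2 -3 1 frame_000.png`: the rotation is unchanged, while the translation flips sign because COLMAP stores `-R^T t` rather than the camera centre. Intrinsics referenced by `CAMERA_ID` go in a separate `cameras.txt`.
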
### Release Date
GitHub 06/02/2026 via [URL TBD]