---
license: cc-by-nc-4.0
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: Please be sure to provide your full legal name, date of birth,
  and full organization name with all corporate identifiers. Avoid the use of acronyms
  and special characters. Failure to follow these instructions may prevent you from
  accessing this model and others on Hugging Face. You will not have the ability to
  edit this form after submission, so please ensure all information is accurate.
language:
- en
tags:
- 3D-Reconstruction
- Inverse-Rendering
- Image-To-3D
pipeline_tag: image-to-3d
---
# LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
![Teaser](teaser.png)
Official model release for the paper
**LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows**. LSRM is a feed-forward, object-centric 3D reconstruction and inverse rendering model that reconstructs high-fidelity 3D assets from posed sparse multi-view images. The model is trained on synthetic data and evaluated extensively on real data.
* Project page: <https://lzqsd.github.io/LSRM.github.io/>
* arXiv: <https://arxiv.org/abs/2604.05182>
* GitHub: <https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git>
## Quickstart
This Hugging Face repository hosts only the **pre-trained weights and test
examples**. To run inference, clone the GitHub repo and follow its README:
```sh
git clone https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
cd Large-Sparse-Reconstruction-Model
# then follow README.md for setup and test commands
```
## Checkpoints and testing data
This repo hosts pre-trained weights and testing examples.
### Checkpoints
Our `checkpoints/` folder contains model weights for our 3D reconstruction
and inverse rendering models, with the following layout (this is what the
test scripts expect):
```
checkpoints/
├── rgb/                          # 3D reconstruction (GSO)
│   ├── dense/
│   │   ├── args.txt
│   │   └── checkpoints/last.pth
│   └── sparse.pth
└── brdf/                         # inverse rendering (ORB / DTC)
    ├── dense/
    │   ├── args.txt
    │   └── checkpoints/last.pth
    └── sparse.pth
```
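Before launching the test scripts, it can help to confirm that the download produced the layout above. The following is a minimal sketch, not part of the official code: `missing_checkpoint_files` is a hypothetical helper, and it assumes the `checkpoints/` folder sits in the current working directory.

```python
from pathlib import Path

# Relative paths copied from the checkpoint layout above.
EXPECTED_CHECKPOINT_FILES = [
    "rgb/dense/args.txt",
    "rgb/dense/checkpoints/last.pth",
    "rgb/sparse.pth",
    "brdf/dense/args.txt",
    "brdf/dense/checkpoints/last.pth",
    "brdf/sparse.pth",
]


def missing_checkpoint_files(root):
    """Return the expected files that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINT_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains checkpoints/.
    missing = missing_checkpoint_files("checkpoints")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print(f"  checkpoints/{rel}")
    else:
        print("All expected checkpoint files are present.")
```

The check only looks at file presence; it does not verify that the weights are complete or loadable.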
### Testing data
This repo also provides one testing example from each of the GSO, DTC, and
ORB datasets, plus the default HDR environment map `env.exr`.
```
datasets/
├── rgb/
│   └── gso_example/              # GSO example (3D reconstruction)
└── brdf/
    ├── orb_example/              # ORB example (inverse rendering)
    └── dtc_example/              # DTC example (inverse rendering)
env.exr                           # default HDR environment map (repo root)
```
`env.exr` is the default HDR environment map used by Blender for re-rendering
predicted BRDFs / meshes. The test scripts pick it up automatically; you only
need to replace it if you want to re-light results under a different
illumination.
### Dataset format
To run on your own captures, mirror one of the example layouts. The two
formats are:
**RGB (GSO-style, for `test_rgb.sh`)**
Loaded by `data_loader/dtc_dataset.py`. Assumes the object is inside a
bounding sphere of radius **0.25**, which is scaled to **0.5** during data
loading (matching the `--bbox_radius 0.5` used in the test scripts).
```
datasets/rgb/<your_split>/
├── test.txt                                          # one scene name per line
└── <scene_name>/
    ├── images/
    │   ├── CameraRig.json
    │   ├── image_process_info.json
    │   └── rgb/
    │       ├── rgb0000000.png … rgb000NNNN.png       # input views
    │       └── mask0000000.png … mask000NNNN.png     # foreground masks
    └── scene/
        ├── aria_trajectory.csv
        └── scene_info.json
```
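The radius convention for the RGB format amounts to a uniform scale by 0.5 / 0.25 = 2. The sketch below illustrates that arithmetic only. It assumes the scaling is applied about the origin and that camera centers live in the same world frame, so they are scaled by the same factor. `rescale_to_target` is a hypothetical helper, and the actual logic in `data_loader/dtc_dataset.py` may differ.

```python
SOURCE_RADIUS = 0.25  # bounding-sphere radius of the captured object (from the text)
TARGET_RADIUS = 0.5   # radius after loading, matching --bbox_radius 0.5


def rescale_to_target(points, source_radius=SOURCE_RADIUS, target_radius=TARGET_RADIUS):
    """Uniformly scale 3D points about the origin (illustrative, not the repo's API)."""
    scale = target_radius / source_radius
    return [tuple(scale * c for c in p) for p in points]


if __name__ == "__main__":
    surface_point = (0.0, 0.25, 0.0)  # lies on the radius-0.25 sphere
    camera_center = (0.0, 0.0, 1.0)   # scaled with the scene to keep projections unchanged
    print(rescale_to_target([surface_point, camera_center]))
```

With the default radii, a point on the radius-0.25 sphere lands on the radius-0.5 sphere, and a camera one unit from the origin ends up two units away.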
**BRDF (ORB / DTC-style, for `test_brdf.sh`)**
Loaded by `data_loader/real_dataset.py`. Assumes the object is inside a
bounding sphere of radius **0.5** (matching the `--bbox_radius 0.5` used in
the test scripts).
```
datasets/brdf/<your_split>/
├── test.txt                      # one scene name per line
└── <scene_name>/
    ├── transforms_input.json     # camera poses for the input views
    ├── transforms_output.json    # camera poses for the eval views
    ├── scale_center.txt          # per-scene normalization
    ├── input/                    # input RGB
    ├── mask/                     # input foreground masks
    ├── output/                   # eval-view RGB (ground truth)
    ├── mask_output/              # eval-view foreground masks
    └── env/                      # environment maps (per scene)
```
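When preparing your own split in either format, a quick structural check can catch missing files before a test run. The sketch below reads `test.txt` and reports, per scene, which entries from the layouts above are absent. `check_split` and `REQUIRED_ENTRIES` are hypothetical names, and whether the loaders strictly require every listed entry is an assumption.

```python
from pathlib import Path

# Per-scene entries copied from the two layouts above (files and folders).
REQUIRED_ENTRIES = {
    "rgb": [
        "images/CameraRig.json",
        "images/image_process_info.json",
        "images/rgb",
        "scene/aria_trajectory.csv",
        "scene/scene_info.json",
    ],
    "brdf": [
        "transforms_input.json",
        "transforms_output.json",
        "scale_center.txt",
        "input",
        "mask",
        "output",
        "mask_output",
        "env",
    ],
}


def check_split(split_dir, layout):
    """Map each scene listed in test.txt to its missing entries (hypothetical helper)."""
    split_dir = Path(split_dir)
    lines = (split_dir / "test.txt").read_text().splitlines()
    scenes = [line.strip() for line in lines if line.strip()]  # skip blank lines
    problems = {}
    for scene in scenes:
        missing = [
            rel for rel in REQUIRED_ENTRIES[layout]
            if not (split_dir / scene / rel).exists()
        ]
        if missing:
            problems[scene] = missing
    return problems
```

For example, `check_split("datasets/brdf/<your_split>", "brdf")` returns an empty dictionary when every scene listed in `test.txt` has all of the entries shown above. The check covers structure only; it does not inspect image contents or camera files.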
Once your captures follow one of these layouts, point the corresponding test
script at your split by editing `data_path` inside the `.sh` file (or by
overriding it on the command line).
## License
This project is licensed under the Creative Commons Attribution-NonCommercial
4.0 International (CC BY-NC 4.0) license. See [LICENSE.md](LICENSE.md) for
details.
## Acknowledgements
Our Triton implementation of native sparse attention builds on
[lucidrains/native-sparse-attention-pytorch](https://github.com/lucidrains/native-sparse-attention-pytorch)
and draws inspiration from
[DreamTechAI/Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2). We thank
the authors of both projects for releasing their work.
## Citation
If you find this work useful, please cite:
```bibtex
@article{lsrm,
title = {LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows},
author = {Zhengqin Li and Cheng Zhang and Jakob Engel and Zhao Dong},
journal = {arXiv preprint arXiv:2604.05182},
year = {2026}
}
```