license: cc-by-nc-4.0
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: >-
  Please be sure to provide your full legal name, date of birth, and full
  organization name with all corporate identifiers. Avoid the use of acronyms
  and special characters. Failure to follow these instructions may prevent you
  from accessing this model and others on Hugging Face. You will not have the
  ability to edit this form after submission, so please ensure all information
  is accurate.
language:
  - en
tags:
  - 3D-Reconstruction
  - Inverse-Rendering
  - Image-To-3D
pipeline_tag: image-to-3d
LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows
Official model release for the paper LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows. LSRM is a feed-forward, object-centric 3D reconstruction and inverse rendering model that reconstructs high-fidelity 3D assets from sparse, posed multi-view images. The model is trained on synthetic data and evaluated extensively on real data.
- Project page: https://lzqsd.github.io/LSRM.github.io/
- arXiv: https://arxiv.org/abs/2604.05182
- GitHub: https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
Quickstart
This Hugging Face repository hosts only the pre-trained weights and test examples. To run inference, clone the GitHub repo and follow its README:
git clone https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
cd Large-Sparse-Reconstruction-Model
# then follow README.md for setup and test commands
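If you prefer to fetch the weights and examples from Python rather than through the web interface, the sketch below shows one possible way using `snapshot_download` from the `huggingface_hub` package. It is an illustration, not part of the official instructions: the function name `download_release` and the pattern list are assumptions based on the folder names described later in this card, the repository ID must be supplied by you, and because the repository is gated you need to have accepted the access form and be authenticated with Hugging Face first. Where the GitHub code expects the files to live is defined by its README, not by this sketch.

```python
# Folder and file names mirror the layout described in this model card.
ALLOW_PATTERNS = ["checkpoints/*", "datasets/*", "env.exr"]


def download_release(repo_id, local_dir="."):
    """Download checkpoints, test examples and env.exr into local_dir.

    repo_id is the ID of this Hugging Face repository (not hard-coded here).
    Returns the local path reported by huggingface_hub.
    """
    # Imported lazily so the module can be loaded without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
    )
```

A call would look like `download_release("<this-repo-id>", local_dir="Large-Sparse-Reconstruction-Model")`, with the placeholder replaced by the actual repository ID.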
Checkpoints and testing data
The pre-trained weights and testing examples in this repo are organized as follows.
Checkpoints
The checkpoints/ folder contains the weights for the 3D reconstruction
and inverse rendering models, in the layout the test scripts expect:
checkpoints/
├── rgb/                          # 3D reconstruction (GSO)
│   ├── dense/
│   │   ├── args.txt
│   │   └── checkpoints/last.pth
│   └── sparse.pth
└── brdf/                         # inverse rendering (ORB / DTC)
    ├── dense/
    │   ├── args.txt
    │   └── checkpoints/last.pth
    └── sparse.pth
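Before running the test scripts it can help to confirm that the download is complete. The following is a minimal sketch, assuming only the file names listed in the layout above; the helper `missing_checkpoint_files` is hypothetical and not part of the LSRM code base.

```python
from pathlib import Path

# Relative paths copied from the checkpoints/ layout in this card.
EXPECTED_CHECKPOINT_FILES = [
    "rgb/dense/args.txt",
    "rgb/dense/checkpoints/last.pth",
    "rgb/sparse.pth",
    "brdf/dense/args.txt",
    "brdf/dense/checkpoints/last.pth",
    "brdf/sparse.pth",
]


def missing_checkpoint_files(checkpoints_dir):
    """Return the expected files that are absent under checkpoints_dir."""
    root = Path(checkpoints_dir)
    return [rel for rel in EXPECTED_CHECKPOINT_FILES if not (root / rel).is_file()]
```

Calling `missing_checkpoint_files("checkpoints")` returns an empty list when every file from the layout is present, and otherwise the relative paths that still need to be downloaded.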
Testing data
This repo also provides one testing example from each of the GSO, DTC, and
ORB datasets, plus the default HDR environment map env.exr.
datasets/
├── rgb/
│   └── gso_example/              # GSO example (3D reconstruction)
└── brdf/
    ├── orb_example/              # ORB example (inverse rendering)
    └── dtc_example/              # DTC example (inverse rendering)
env.exr                           # default HDR environment map (repo root)
env.exr is the default HDR environment map used by Blender for re-rendering
predicted BRDFs / meshes. The test scripts pick it up automatically; you only
need to replace it if you want to re-light results under a different
illumination.
Dataset format
To run on your own captures, mirror one of the example layouts. The two formats are:
RGB (GSO-style, for test_rgb.sh)
Loaded by data_loader/dtc_dataset.py. Assumes the object is inside a
bounding sphere of radius 0.25, which is scaled to 0.5 during data
loading (matching the --bbox_radius 0.5 used in the test scripts).
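The rescaling described above amounts to multiplying coordinates by 0.5 / 0.25 = 2. The sketch below only illustrates that arithmetic; it assumes the bounding sphere is centered at the origin, and the function name `rescale_point` is hypothetical. The actual loader in data_loader/dtc_dataset.py may apply the scaling differently (for example to camera poses), so treat this as a sanity check for your own data rather than a description of the implementation.

```python
def rescale_point(point, source_radius=0.25, target_radius=0.5):
    """Scale a 3D point so a sphere of source_radius maps to target_radius.

    Assumes the bounding sphere is centered at the origin.
    """
    scale = target_radius / source_radius  # 2.0 for the default radii
    return tuple(scale * coordinate for coordinate in point)


# A point on the surface of the radius-0.25 sphere lands on the radius-0.5 sphere.
surface_point = rescale_point((0.25, 0.0, 0.0))
```

If your capture is normalized to a different radius, the same ratio tells you how far it is from the scale the test scripts expect.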
datasets/rgb/<your_split>/
├── test.txt                      # one scene name per line
└── <scene_name>/
    ├── images/
    │   ├── CameraRig.json
    │   ├── image_process_info.json
    │   └── rgb/
    │       ├── rgb0000000.png … rgb000NNNN.png    # input views
    │       └── mask0000000.png … mask000NNNN.png  # foreground masks
    └── scene/
        ├── aria_trajectory.csv
        └── scene_info.json
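Since test.txt lists one scene name per line, it can be generated from the scene folders in a split. The sketch below is one way to do that, assuming every subdirectory of the split is a scene you want to test; the helper `write_scene_list` is hypothetical, and the alphabetical ordering is a choice made here, not a requirement stated by the test scripts.

```python
from pathlib import Path


def write_scene_list(split_dir):
    """Write <split_dir>/test.txt with one scene directory name per line."""
    split = Path(split_dir)
    # Every subdirectory is treated as a scene (an assumption of this sketch).
    scenes = sorted(entry.name for entry in split.iterdir() if entry.is_dir())
    (split / "test.txt").write_text("".join(f"{name}\n" for name in scenes))
    return scenes
```

For example, `write_scene_list("datasets/rgb/my_split")` would create test.txt next to the scene folders, where `my_split` stands for your own split name.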
BRDF (ORB / DTC-style, for test_brdf.sh)
Loaded by data_loader/real_dataset.py. Assumes the object is inside a
bounding sphere of radius 0.5 (matching the --bbox_radius 0.5 used in
the test scripts).
datasets/brdf/<your_split>/
├── test.txt                      # one scene name per line
└── <scene_name>/
    ├── transforms_input.json     # camera poses for the input views
    ├── transforms_output.json    # camera poses for the eval views
    ├── scale_center.txt          # per-scene scene normalization
    ├── input/                    # input RGB
    ├── mask/                     # input foreground masks
    ├── output/                   # eval-view RGB (ground truth)
    ├── mask_output/              # eval-view foreground masks
    └── env/                      # environment maps (per scene)
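When preparing your own BRDF-style split, a quick structural check can catch missing files before the test script does. The following sketch reads test.txt and reports, per scene, which entries from the layout above are absent. The helper `check_brdf_split` is hypothetical, and treating every listed entry as required is an assumption: some of them (such as the eval-view folders) may only matter for evaluation.

```python
from pathlib import Path

# Entry names copied from the BRDF layout in this card.
REQUIRED_FILES = ["transforms_input.json", "transforms_output.json", "scale_center.txt"]
REQUIRED_DIRS = ["input", "mask", "output", "mask_output", "env"]


def check_brdf_split(split_dir):
    """Map each scene named in test.txt to the layout entries it is missing."""
    split = Path(split_dir)
    lines = (split / "test.txt").read_text().splitlines()
    scene_names = [line.strip() for line in lines if line.strip()]
    problems = {}
    for name in scene_names:
        scene = split / name
        missing = [f for f in REQUIRED_FILES if not (scene / f).is_file()]
        missing += [d + "/" for d in REQUIRED_DIRS if not (scene / d).is_dir()]
        if missing:
            problems[name] = missing
    return problems
```

An empty dictionary means every scene listed in test.txt contains all entries from the layout; scenes with gaps appear as keys with the missing names as values.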
Once your data matches one of these layouts, point the corresponding
test script at your split by editing data_path inside the .sh file
(or by overriding it on the command line).
License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See LICENSE.md for details.
Acknowledgements
Our Triton implementation of native sparse attention builds on lucidrains/native-sparse-attention-pytorch and draws inspiration from DreamTechAI/Direct3D-S2. We thank the authors of both projects for releasing their work.
Citation
If you find this work useful, please cite:
@article{lsrm,
  title = {LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows},
  author = {Zhengqin Li and Cheng Zhang and Jakob Engel and Zhao Dong},
  journal = {arXiv preprint arXiv:2604.05182},
  year = {2026}
}
