LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

Official model release for the paper LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows. LSRM is a feed-forward, object-centric 3D reconstruction and inverse rendering model that reconstructs high-fidelity 3D assets from posed sparse multi-view images. The model is trained on synthetic data but comprehensively tested on real data.

Quickstart

This Hugging Face repository hosts only the pre-trained weights and test examples. To run inference, clone the GitHub repo and follow its README:

git clone https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
cd Large-Sparse-Reconstruction-Model
# then follow README.md for setup and test commands

Checkpoints and testing data

This repo hosts pre-trained weights and testing examples.

Checkpoints

The checkpoints/ folder contains the weights for our 3D reconstruction and inverse rendering models. The test scripts expect the following layout:

checkpoints/
├── rgb/                       # 3D reconstruction (GSO)
│   ├── dense/
│   │   ├── args.txt
│   │   └── checkpoints/last.pth
│   └── sparse.pth
└── brdf/                      # inverse rendering (ORB / DTC)
    ├── dense/
    │   ├── args.txt
    │   └── checkpoints/last.pth
    └── sparse.pth
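
Before launching the test scripts, it can help to confirm that the downloaded weights match this layout. The following is a minimal sketch, not part of the official code: the helper name missing_checkpoint_files is hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the checkpoint layout listed in this README.
EXPECTED_FILES = [
    "rgb/dense/args.txt",
    "rgb/dense/checkpoints/last.pth",
    "rgb/sparse.pth",
    "brdf/dense/args.txt",
    "brdf/dense/checkpoints/last.pth",
    "brdf/sparse.pth",
]


def missing_checkpoint_files(root):
    """Return the expected relative paths that are not files under `root`.

    Hypothetical helper for illustration; it is not provided by the repo.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Checkpoint layout looks complete.")
```

Run it from the directory that contains checkpoints/. An empty result only means the files exist; it does not validate their contents.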

Testing data

This repo also provides one testing example from each of the GSO, DTC, and ORB datasets, plus the default HDR environment map env.exr.

datasets/
├── rgb/
│   └── gso_example/           # GSO example (3D reconstruction)
└── brdf/
    ├── orb_example/           # ORB example (inverse rendering)
    └── dtc_example/           # DTC example (inverse rendering)

env.exr                       # default HDR environment map (repo root)

env.exr is the default HDR environment map used by Blender for re-rendering predicted BRDFs / meshes. The test scripts pick it up automatically; you only need to replace it if you want to re-light results under a different illumination.

Dataset format

To run on your own captures, mirror one of the example layouts. The two formats are:

RGB (GSO-style, for test_rgb.sh)

Loaded by data_loader/dtc_dataset.py. Assumes the object is inside a bounding sphere of radius 0.25, which is scaled to 0.5 during data loading (matching the --bbox_radius 0.5 used in the test scripts).

datasets/rgb/<your_split>/
├── test.txt                     # one scene name per line
└── <scene_name>/
    ├── images/
    │   ├── CameraRig.json
    │   ├── image_process_info.json
    │   └── rgb/
    │       ├── rgb0000000.png … rgb000NNNN.png      # input views
    │       └── mask0000000.png … mask000NNNN.png    # foreground masks
    └── scene/
        ├── aria_trajectory.csv
        └── scene_info.json
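
When preparing your own RGB split, a quick structural check can catch missing files before the data loader does. The sketch below is illustrative only: check_rgb_split is a hypothetical helper, and it assumes that every rgb<index>.png has a mask<index>.png with the same index, as the listing above suggests. It does not inspect the JSON or CSV contents.

```python
from pathlib import Path

# Entries each scene folder should contain, relative to the scene directory
# (copied from the RGB layout listed in this README).
REQUIRED_ENTRIES = [
    "images/CameraRig.json",
    "images/image_process_info.json",
    "images/rgb",
    "scene/aria_trajectory.csv",
    "scene/scene_info.json",
]


def check_rgb_split(split_dir):
    """Map each scene named in test.txt to a list of layout problems.

    Hypothetical helper for illustration; it is not provided by the repo.
    """
    split_dir = Path(split_dir)
    lines = (split_dir / "test.txt").read_text().splitlines()
    scenes = [line.strip() for line in lines if line.strip()]
    report = {}
    for scene in scenes:
        scene_dir = split_dir / scene
        problems = [
            f"missing {rel}"
            for rel in REQUIRED_ENTRIES
            if not (scene_dir / rel).exists()
        ]
        rgb_dir = scene_dir / "images" / "rgb"
        if rgb_dir.is_dir():
            # Assumption: view i is rgb<index>.png paired with mask<index>.png.
            rgb_ids = {p.stem[len("rgb"):] for p in rgb_dir.glob("rgb*.png")}
            mask_ids = {p.stem[len("mask"):] for p in rgb_dir.glob("mask*.png")}
            if not rgb_ids:
                problems.append("no rgb*.png input views")
            for idx in sorted(rgb_ids - mask_ids):
                problems.append(f"rgb{idx}.png has no matching mask")
        report[scene] = problems
    return report
```

For example, check_rgb_split("datasets/rgb/gso_example") returns a dictionary in which an empty list means no structural problem was found for that scene.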

BRDF (ORB / DTC-style, for test_brdf.sh)

Loaded by data_loader/real_dataset.py. Assumes the object is inside a bounding sphere of radius 0.5 (matching the --bbox_radius 0.5 used in the test scripts).

datasets/brdf/<your_split>/
├── test.txt                     # one scene name per line
└── <scene_name>/
    ├── transforms_input.json    # camera poses for the input views
    ├── transforms_output.json   # camera poses for the eval views
    ├── scale_center.txt         # per-scene scene normalization
    ├── input/                   # input RGB
    ├── mask/                    # input foreground masks
    ├── output/                  # eval-view RGB (ground truth)
    ├── mask_output/             # eval-view foreground masks
    └── env/                     # environment maps (per scene)
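
Since test.txt lists one scene name per line, it can be generated from the folders on disk. The sketch below is an assumption-laden illustration rather than official tooling: write_test_list and is_complete_scene are hypothetical names, and the sketch treats every entry in the listing above as required, which may be stricter than what the loader actually needs.

```python
from pathlib import Path

# Per-scene entries copied from the BRDF layout listed in this README.
REQUIRED_FILES = ["transforms_input.json", "transforms_output.json", "scale_center.txt"]
REQUIRED_DIRS = ["input", "mask", "output", "mask_output", "env"]


def is_complete_scene(scene_dir):
    """True if the folder contains every file and sub-folder from the layout."""
    return all((scene_dir / f).is_file() for f in REQUIRED_FILES) and all(
        (scene_dir / d).is_dir() for d in REQUIRED_DIRS
    )


def write_test_list(split_dir):
    """Write test.txt with one complete scene name per line; return the names.

    Hypothetical helper for illustration; it is not provided by the repo.
    """
    split_dir = Path(split_dir)
    scenes = sorted(
        p.name for p in split_dir.iterdir() if p.is_dir() and is_complete_scene(p)
    )
    (split_dir / "test.txt").write_text("".join(name + "\n" for name in scenes))
    return scenes
```

Calling write_test_list on your split directory overwrites any existing test.txt there, so keep a copy if you maintain the list by hand.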

To run the GitHub code on your own captures, point the corresponding test script at your split by editing data_path inside the .sh file (or by overriding it on the command line).

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See LICENSE.md for details.

Acknowledgements

Our Triton implementation of native sparse attention builds on lucidrains/native-sparse-attention-pytorch and draws inspiration from DreamTechAI/Direct3D-S2. We thank the authors of both projects for releasing their work.

Citation

If you find this work useful, please cite:

@article{lsrm,
    title   = {LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows},
    author  = {Zhengqin Li and Cheng Zhang and Jakob Engel and Zhao Dong},
    journal = {arXiv preprint arXiv:2604.05182},
    year    = {2026}
}