---
license: cc-by-nc-4.0
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: Please be sure to provide your full legal name, date of birth,
  and full organization name with all corporate identifiers. Avoid the use of acronyms
  and special characters. Failure to follow these instructions may prevent you from
  accessing this model and others on Hugging Face. You will not have the ability to
  edit this form after submission, so please ensure all information is accurate.
language:
- en
tags:
- 3D-Reconstruction
- Inverse-Rendering
- Image-To-3D
pipeline_tag: image-to-3d
---

# LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

Official model release for the paper
**LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows**. LSRM is a feed-forward, object-centric 3D reconstruction and inverse rendering model that reconstructs high-fidelity 3D assets from posed sparse multi-view images. The model is trained on synthetic data and evaluated on real data.

* Project page: <https://lzqsd.github.io/LSRM.github.io/>
* arXiv: <https://arxiv.org/abs/2604.05182>
* GitHub: <https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git>

## Quickstart

This Hugging Face repository hosts only the **pre-trained weights and test
examples**. To run inference, clone the GitHub repo and follow its README:

```sh
git clone https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
cd Large-Sparse-Reconstruction-Model
# then follow README.md for setup and test commands
```

## Checkpoints and testing data

The pre-trained weights and test examples are organized as follows.

### Checkpoints

The `checkpoints/` folder contains the weights for the 3D reconstruction
and inverse rendering models, in the layout that the test scripts expect:

```
checkpoints/
├── rgb/                          # 3D reconstruction (GSO)
│   ├── dense/
│   │   ├── args.txt
│   │   └── checkpoints/last.pth
│   └── sparse.pth
└── brdf/                         # inverse rendering (ORB / DTC)
    ├── dense/
    │   ├── args.txt
    │   └── checkpoints/last.pth
    └── sparse.pth
```

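Before running the test scripts, it can help to confirm that a download reproduces this layout. The following is a minimal sketch, not part of the official code: the helper name `missing_checkpoint_files` is hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the checkpoint layout listed above.
EXPECTED_FILES = [
    "rgb/dense/args.txt",
    "rgb/dense/checkpoints/last.pth",
    "rgb/sparse.pth",
    "brdf/dense/args.txt",
    "brdf/dense/checkpoints/last.pth",
    "brdf/sparse.pth",
]


def missing_checkpoint_files(root):
    """Return the expected files that are absent under `root`.

    Hypothetical helper for illustration; it only checks that the files
    exist and does not try to load or validate the weights.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint files are present.")
```

Run it from the directory that contains `checkpoints/`; an empty result means every file in the tree was found.
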
### Testing data

This repo also provides one testing example from each of the GSO, DTC, and
ORB datasets, plus the default HDR environment map `env.exr`.

```
datasets/
├── rgb/
│   └── gso_example/              # GSO example (3D reconstruction)
└── brdf/
    ├── orb_example/              # ORB example (inverse rendering)
    └── dtc_example/              # DTC example (inverse rendering)

env.exr                           # default HDR environment map (repo root)
```

`env.exr` is the default HDR environment map used by Blender for re-rendering
predicted BRDFs / meshes. The test scripts pick it up automatically; you only
need to replace it if you want to relight results under different
illumination.

### Dataset format

To run on your own captures, mirror one of the example layouts. The two
formats are:

**RGB (GSO-style, for `test_rgb.sh`)**

Loaded by `data_loader/dtc_dataset.py`. Assumes the object is inside a
bounding sphere of radius **0.25**, which is scaled to **0.5** during data
loading (matching the `--bbox_radius 0.5` used in the test scripts).

```
datasets/rgb/<your_split>/
├── test.txt                      # one scene name per line
└── <scene_name>/
    ├── images/
    │   ├── CameraRig.json
    │   ├── image_process_info.json
    │   └── rgb/
    │       ├── rgb0000000.png … rgb000NNNN.png     # input views
    │       └── mask0000000.png … mask000NNNN.png   # foreground masks
    └── scene/
        ├── aria_trajectory.csv
        └── scene_info.json
```

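When preparing your own RGB split, a quick consistency check is to read `test.txt` and confirm that every input view has a matching mask. The sketch below is illustrative only and is not the repository's loader: the function names are hypothetical, and it assumes, as the tree above suggests, that masks sit next to the RGB images in `images/rgb/` and that the numeric suffix identifies the view.

```python
import re
from pathlib import Path

# File-name patterns taken from the layout above (rgb0000000.png, mask0000000.png, ...).
_RGB_RE = re.compile(r"^rgb(\d+)\.png$")
_MASK_RE = re.compile(r"^mask(\d+)\.png$")


def read_scene_names(split_dir):
    """Read test.txt (one scene name per line), skipping blank lines."""
    text = (Path(split_dir) / "test.txt").read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def pair_views(scene_dir):
    """Return (rgb_path, mask_path) pairs sorted by view index.

    Raises ValueError if any view has an RGB image without a mask or
    a mask without an RGB image.
    """
    rgb_dir = Path(scene_dir) / "images" / "rgb"
    rgbs, masks = {}, {}
    for path in rgb_dir.iterdir():
        m = _RGB_RE.match(path.name)
        if m:
            rgbs[int(m.group(1))] = path
            continue
        m = _MASK_RE.match(path.name)
        if m:
            masks[int(m.group(1))] = path
    unpaired = sorted(set(rgbs) ^ set(masks))
    if unpaired:
        raise ValueError(f"views without a matching rgb/mask partner: {unpaired}")
    return [(rgbs[i], masks[i]) for i in sorted(rgbs)]
```

A typical use would loop over `read_scene_names(split_dir)` and call `pair_views(Path(split_dir) / name)` for each scene, so that a missing mask is reported before the test script runs.
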
**BRDF (ORB / DTC-style, for `test_brdf.sh`)**

Loaded by `data_loader/real_dataset.py`. Assumes the object is inside a
bounding sphere of radius **0.5** (matching the `--bbox_radius 0.5` used in
the test scripts).

```
datasets/brdf/<your_split>/
├── test.txt                      # one scene name per line
└── <scene_name>/
    ├── transforms_input.json     # camera poses for the input views
    ├── transforms_output.json    # camera poses for the eval views
    ├── scale_center.txt          # per-scene normalization
    ├── input/                    # input RGB
    ├── mask/                     # input foreground masks
    ├── output/                   # eval-view RGB (ground truth)
    ├── mask_output/              # eval-view foreground masks
    └── env/                      # environment maps (per scene)
```

Once your data follows one of these layouts, point the corresponding test
script at your split by editing `data_path` inside the `.sh` file (or by
overriding it on the command line).

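Both formats assume the object fits inside a bounding sphere of a fixed radius. As a rough illustration of what such a normalization involves, the sketch below centers a point set and rescales it so that the farthest point lies on a sphere of the target radius. This is an assumption-laden example, not the repository's normalization code: the function name is hypothetical, the bounding-box midpoint is only one possible choice of center, and the contents of `scale_center.txt` are not specified here.

```python
import math


def fit_to_bounding_sphere(points, target_radius=0.5):
    """Center 3D points and scale them to fit a sphere of `target_radius`.

    The center is the midpoint of the axis-aligned bounding box (an
    assumption made for this sketch). Returns (normalized, scale, center)
    with normalized = (p - center) * scale for every input point p.
    """
    if not points:
        raise ValueError("points must be non-empty")
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    # Shift so the bounding-box midpoint sits at the origin.
    shifted = [tuple(c - o for c, o in zip(p, center)) for p in points]
    # Distance of the farthest point from the origin after shifting.
    radius = max(math.sqrt(sum(c * c for c in p)) for p in shifted)
    if radius == 0.0:
        raise ValueError("all points coincide; scale is undefined")
    scale = target_radius / radius
    normalized = [tuple(c * scale for c in p) for p in shifted]
    return normalized, scale, center
```

In this formulation, an object already bounded by a sphere of radius 0.25 gets a scale factor of 0.5 / 0.25 = 2, which corresponds to the 0.25 to 0.5 rescaling described for the RGB format.
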
## License

This project is licensed under the Creative Commons Attribution-NonCommercial
4.0 International (CC BY-NC 4.0) license. See [LICENSE.md](LICENSE.md) for
details.

## Acknowledgements

Our Triton implementation of native sparse attention builds on
[lucidrains/native-sparse-attention-pytorch](https://github.com/lucidrains/native-sparse-attention-pytorch)
and draws inspiration from
[DreamTechAI/Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2). We thank
the authors of both projects for releasing their work.

## Citation

If you find this work useful, please cite:

```bibtex
@article{lsrm,
  title   = {LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows},
  author  = {Zhengqin Li and Cheng Zhang and Jakob Engel and Zhao Dong},
  journal = {arXiv preprint arXiv:2604.05182},
  year    = {2026}
}
```