metadata
license: cc-by-nc-4.0
extra_gated_fields:
  First Name: text
  Last Name: text
  Date of birth: date_picker
  Country: country
  Affiliation: text
  Job title:
    type: select
    options:
      - Student
      - Research Graduate
      - AI researcher
      - AI developer/engineer
      - Reporter
      - Other
  geo: ip_location
  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
extra_gated_description: >-
  The information you provide will be collected, stored, processed and shared in
  accordance with the [Meta Privacy
  Policy](https://www.facebook.com/privacy/policy/).
extra_gated_button_content: Submit
extra_gated_heading: >-
  Please be sure to provide your full legal name, date of birth, and full
  organization name with all corporate identifiers. Avoid the use of acronyms
  and special characters. Failure to follow these instructions may prevent you
  from accessing this model and others on Hugging Face. You will not have the
  ability to edit this form after submission, so please ensure all information
  is accurate.
language:
  - en
tags:
  - 3D-Reconstruction
  - Inverse-Rendering
  - Image-To-3D
pipeline_tag: image-to-3d

LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

Official model release for the paper LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows. LSRM is a feed-forward, object-centric 3D reconstruction and inverse rendering model that reconstructs high-fidelity 3D assets from sparse, posed multi-view images. The model is trained on synthetic data and comprehensively tested on real data.

Quickstart

This Hugging Face repository hosts only the pre-trained weights and test examples. To run inference, clone the GitHub repo and follow its README:

git clone https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
cd Large-Sparse-Reconstruction-Model
# then follow README.md for setup and test commands

Checkpoints and testing data

This repo hosts pre-trained weights and testing examples.

Checkpoints

Our checkpoints/ folder contains model weights for our 3D reconstruction and inverse rendering models, with the following layout (this is what the test scripts expect):

checkpoints/
├── rgb/                       # 3D reconstruction (GSO)
│   ├── dense/
│   │   ├── args.txt
│   │   └── checkpoints/last.pth
│   └── sparse.pth
└── brdf/                      # inverse rendering (ORB / DTC)
    ├── dense/
    │   ├── args.txt
    │   └── checkpoints/last.pth
    └── sparse.pth
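
Before launching the test scripts, it can help to confirm that the downloaded weights match the layout above. The following is a minimal sketch, not part of the official code: the helper name `missing_checkpoint_files` and the `EXPECTED_FILES` list are illustrative, and the paths are copied directly from the tree shown above.

```python
from pathlib import Path

# Relative paths copied from the checkpoints/ layout listed above.
EXPECTED_FILES = [
    "rgb/dense/args.txt",
    "rgb/dense/checkpoints/last.pth",
    "rgb/sparse.pth",
    "brdf/dense/args.txt",
    "brdf/dense/checkpoints/last.pth",
    "brdf/sparse.pth",
]


def missing_checkpoint_files(root):
    """Return the expected relative paths that are not files under `root`.

    Hypothetical helper for illustration; the test scripts do not call it.
    """
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files("checkpoints")
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint files are present.")
```

Run it from the directory that contains checkpoints/; an empty result means every file in the listed layout was found.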

Testing data

This repo also provides one testing example from each of the GSO, DTC, and ORB datasets, plus the default HDR environment map env.exr.

datasets/
├── rgb/
│   └── gso_example/           # GSO example (3D reconstruction)
└── brdf/
    ├── orb_example/           # ORB example (inverse rendering)
    └── dtc_example/           # DTC example (inverse rendering)

env.exr                       # default HDR environment map (repo root)

env.exr is the default HDR environment map used by Blender for re-rendering predicted BRDFs / meshes. The test scripts pick it up automatically; you only need to replace it if you want to re-light results under a different illumination.

Dataset format

To run on your own captures, mirror one of the example layouts. The two formats are:

RGB (GSO-style, for test_rgb.sh)

Loaded by data_loader/dtc_dataset.py. Assumes the object is inside a bounding sphere of radius 0.25, which is scaled to 0.5 during data loading (matching the --bbox_radius 0.5 used in the test scripts).

datasets/rgb/<your_split>/
├── test.txt                     # one scene name per line
└── <scene_name>/
    ├── images/
    │   ├── CameraRig.json
    │   ├── image_process_info.json
    │   └── rgb/
    │       ├── rgb0000000.png … rgb000NNNN.png      # input views
    │       └── mask0000000.png … mask000NNNN.png    # foreground masks
    └── scene/
        ├── aria_trajectory.csv
        └── scene_info.json
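
When preparing a custom RGB split, a quick structural check can catch missing files before the loader does. The sketch below is illustrative only and is not taken from the GitHub code: `check_rgb_split` is a hypothetical helper, and it assumes that every rgbNNNNNNN.png input view has a mask with the same numeric suffix, which the layout above suggests but does not state outright.

```python
from pathlib import Path

# Per-scene files named in the RGB layout above.
REQUIRED_SCENE_FILES = (
    "images/CameraRig.json",
    "images/image_process_info.json",
    "scene/aria_trajectory.csv",
    "scene/scene_info.json",
)


def check_rgb_split(split_dir):
    """Return a list of human-readable problems found in an RGB-style split."""
    split_dir = Path(split_dir)
    list_file = split_dir / "test.txt"
    if not list_file.is_file():
        return [f"missing {list_file}"]

    problems = []
    # test.txt holds one scene name per line; blank lines are skipped.
    scenes = [ln.strip() for ln in list_file.read_text().splitlines() if ln.strip()]
    for scene in scenes:
        scene_dir = split_dir / scene
        for rel in REQUIRED_SCENE_FILES:
            if not (scene_dir / rel).is_file():
                problems.append(f"{scene}: missing {rel}")

        rgb_dir = scene_dir / "images" / "rgb"
        if not rgb_dir.is_dir():
            problems.append(f"{scene}: missing images/rgb directory")
            continue

        # Assumption: views and masks share the same numeric suffix.
        views = {p.stem[len("rgb"):] for p in rgb_dir.glob("rgb*.png")}
        masks = {p.stem[len("mask"):] for p in rgb_dir.glob("mask*.png")}
        if not views:
            problems.append(f"{scene}: no rgb*.png files in images/rgb")
        for idx in sorted(views - masks):
            problems.append(f"{scene}: rgb{idx}.png has no matching mask")
    return problems
```

Calling `check_rgb_split("datasets/rgb/<your_split>")` with your own split name returns an empty list when the structure matches; it does not validate the contents of the JSON or CSV files.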

BRDF (ORB / DTC-style, for test_brdf.sh)

Loaded by data_loader/real_dataset.py. Assumes the object is inside a bounding sphere of radius 0.5 (matching the --bbox_radius 0.5 used in the test scripts).

datasets/brdf/<your_split>/
├── test.txt                     # one scene name per line
└── <scene_name>/
    ├── transforms_input.json    # camera poses for the input views
    ├── transforms_output.json   # camera poses for the eval views
    ├── scale_center.txt         # per-scene normalization
    ├── input/                   # input RGB
    ├── mask/                    # input foreground masks
    ├── output/                  # eval-view RGB (ground truth)
    ├── mask_output/             # eval-view foreground masks
    └── env/                     # environment maps (per scene)
Once your data matches one of these layouts, point the corresponding test script at your split by editing data_path inside the .sh file (or by overriding it on the command line).

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. See LICENSE.md for details.

Acknowledgements

Our Triton implementation of native sparse attention builds on lucidrains/native-sparse-attention-pytorch and draws inspiration from DreamTechAI/Direct3D-S2. We thank the authors of both projects for releasing their work.

Citation

If you find this work useful, please cite:

@article{lsrm,
    title   = {LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows},
    author  = {Zhengqin Li and Cheng Zhang and Jakob Engel and Zhao Dong},
    journal = {arXiv preprint arXiv:2604.05182},
    year    = {2026}
}