	---
	license: cc-by-nc-4.0
	extra_gated_fields:
	  First Name: text
	  Last Name: text
	  Date of birth: date_picker
	  Country: country
	  Affiliation: text
	  Job title:
	    type: select
	    options:
	      - Student
	      - Research Graduate
	      - AI researcher
	      - AI developer/engineer
	      - Reporter
	      - Other
	  geo: ip_location
	  By clicking Submit below I accept the terms of the license and acknowledge that the information I provide will be collected stored processed and shared in accordance with the Meta Privacy Policy: checkbox
	extra_gated_description: >-
	  The information you provide will be collected, stored, processed and shared in
	  accordance with the [Meta Privacy
	  Policy](https://www.facebook.com/privacy/policy/).
	extra_gated_button_content: Submit
	extra_gated_heading: Please be sure to provide your full legal name, date of birth,
	  and full organization name with all corporate identifiers. Avoid the use of acronyms
	  and special characters. Failure to follow these instructions may prevent you from
	  accessing this model and others on Hugging Face. You will not have the ability to
	  edit this form after submission, so please ensure all information is accurate.
	language:
	- en
	tags:
	- 3D-Reconstruction
	- Inverse-Rendering
	- Image-To-3D
	pipeline_tag: image-to-3d
	---

	# LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows

	![Teaser](teaser.png)

	Official model release for the paper
	*LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows*. LSRM is a feed-forward, object-centric 3D reconstruction and inverse rendering model that reconstructs high-fidelity 3D assets from sparse, posed multi-view images. The model is trained on synthetic data and evaluated extensively on real data.

	* Project page: <https://lzqsd.github.io/LSRM.github.io/>
	* arXiv: <https://arxiv.org/abs/2604.05182>
	* GitHub: <https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git>

	## Quickstart

	This Hugging Face repository hosts only the **pre-trained weights and test
	examples**. To run inference, clone the GitHub repo and follow its README:

	```sh
	git clone https://github.com/facebookresearch/Large-Sparse-Reconstruction-Model.git
	cd Large-Sparse-Reconstruction-Model
	# then follow README.md for setup and test commands
	```
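
The weights and test examples hosted here can also be fetched programmatically. The following is a minimal sketch using `snapshot_download` from `huggingface_hub`; `REPO_ID` is a hypothetical placeholder (this card does not state the repository id), and the target directory is an assumption, so check the GitHub README for where the test scripts expect the files. Because the repository is gated, you need to accept the license terms and be logged in with a Hugging Face access token first.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the id of this Hugging Face repository.
REPO_ID = "<org>/<repo>"

# Top-level entries described in this model card.
ALLOW_PATTERNS = ["checkpoints/*", "datasets/*", "env.exr"]


def download_assets(local_dir, repo_id=REPO_ID):
    """Download weights, test examples, and env.exr into local_dir."""
    # Imported lazily so this module can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        allow_patterns=ALLOW_PATTERNS,
    )
    return Path(path)
```

Calling `download_assets("Large-Sparse-Reconstruction-Model")` would place the files inside the cloned code directory, assuming that is where the scripts look for them.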

	## Checkpoints and testing data

	This repo hosts pre-trained weights and testing examples.

	### Checkpoints

	Our `checkpoints/` folder contains model weights for our 3D reconstruction
	and inverse rendering models, with the following layout (this is what the
	test scripts expect):

	```
	checkpoints/
	├── rgb/                          # 3D reconstruction (GSO)
	│   ├── dense/
	│   │   ├── args.txt
	│   │   └── checkpoints/last.pth
	│   └── sparse.pth
	└── brdf/                         # inverse rendering (ORB / DTC)
	    ├── dense/
	    │   ├── args.txt
	    │   └── checkpoints/last.pth
	    └── sparse.pth
	```
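
Before launching the test scripts, it can help to confirm that the download produced this layout. The snippet below is a small sketch that only checks for the files listed above; the function name and the idea of a pre-flight check are illustrative and not part of the released code.

```python
from pathlib import Path

# Relative paths taken from the checkpoint layout above.
EXPECTED_FILES = [
    "rgb/dense/args.txt",
    "rgb/dense/checkpoints/last.pth",
    "rgb/sparse.pth",
    "brdf/dense/args.txt",
    "brdf/dense/checkpoints/last.pth",
    "brdf/sparse.pth",
]


def missing_checkpoint_files(root):
    """Return the expected files that are absent under root, in listed order."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list from `missing_checkpoint_files("checkpoints")` means every file in the layout is present.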

	### Testing data

	This repo also provides one testing example from each of the GSO, DTC, and
	ORB datasets, plus the default HDR environment map `env.exr`.

	```
	datasets/
	├── rgb/
	│   └── gso_example/      # GSO example (3D reconstruction)
	└── brdf/
	    ├── orb_example/      # ORB example (inverse rendering)
	    └── dtc_example/      # DTC example (inverse rendering)

	env.exr                   # default HDR environment map (repo root)
	```

	`env.exr` is the default HDR environment map used by Blender for re-rendering
	predicted BRDFs / meshes. The test scripts pick it up automatically; you only
	need to replace it if you want to relight results under different
	illumination.

	### Dataset format

	To run on your own captures, mirror one of the example layouts. The two
	formats are:

	**RGB (GSO-style, for `test_rgb.sh`)**

	Loaded by `data_loader/dtc_dataset.py`. Assumes the object is inside a
	bounding sphere of radius 0.25, which is scaled to 0.5 during data
	loading (matching the `--bbox_radius 0.5` used in the test scripts).
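
The rescaling amounts to a uniform factor of 0.5 / 0.25 = 2. The sketch below illustrates that arithmetic on a list of 3D points. It assumes the bounding sphere is centered at the origin and that camera positions are scaled by the same factor to stay consistent with the object; the actual loader may implement this differently, and `rescale_to_bbox` is a hypothetical helper.

```python
def rescale_to_bbox(points, source_radius=0.25, target_radius=0.5):
    """Uniformly scale points about the origin from source_radius to target_radius."""
    scale = target_radius / source_radius  # 0.5 / 0.25 = 2.0 with the defaults
    return [tuple(scale * c for c in p) for p in points]
```

With the default radii, a point on the source sphere such as `(0.25, 0.0, 0.0)` maps to `(0.5, 0.0, 0.0)` on the target sphere.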

	```
	datasets/rgb/<your_split>/
	├── test.txt                                        # one scene name per line
	└── <scene_name>/
	    ├── images/
	    │   ├── CameraRig.json
	    │   ├── image_process_info.json
	    │   └── rgb/
	    │       ├── rgb0000000.png … rgb000NNNN.png     # input views
	    │       └── mask0000000.png … mask000NNNN.png   # foreground masks
	    └── scene/
	        ├── aria_trajectory.csv
	        └── scene_info.json
	```
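
When preparing your own captures in this format, each input view needs a matching foreground mask. The sketch below pairs them by the numeric suffix in the file name, assuming (as the layout above suggests, but the card does not state explicitly) that `rgbXXXXXXX.png` and `maskXXXXXXX.png` share the same index. The helper name is illustrative.

```python
import re
from pathlib import Path

# Matches names such as rgb0000000.png and captures the numeric index.
RGB_NAME = re.compile(r"rgb(\d+)\.png$")


def pair_views_with_masks(rgb_dir):
    """Return sorted (rgb_path, mask_path) pairs; raise if a mask is missing."""
    rgb_dir = Path(rgb_dir)
    pairs = []
    for rgb_path in sorted(rgb_dir.glob("rgb*.png")):
        match = RGB_NAME.match(rgb_path.name)
        if match is None:
            continue  # skip files that do not follow the rgb<index>.png pattern
        mask_path = rgb_dir / f"mask{match.group(1)}.png"
        if not mask_path.is_file():
            raise FileNotFoundError(f"no mask for {rgb_path.name}")
        pairs.append((rgb_path, mask_path))
    return pairs
```

Running it on `<scene_name>/images/rgb/` gives a quick check that no view is missing its mask before the loader is invoked.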

	**BRDF (ORB / DTC-style, for `test_brdf.sh`)**

	Loaded by `data_loader/real_dataset.py`. Assumes the object is inside a
	bounding sphere of radius 0.5 (matching the `--bbox_radius 0.5` used in
	the test scripts).

	```
	datasets/brdf/<your_split>/
	├── test.txt                      # one scene name per line
	└── <scene_name>/
	    ├── transforms_input.json     # camera poses for the input views
	    ├── transforms_output.json    # camera poses for the eval views
	    ├── scale_center.txt          # per-scene normalization (scale and center)
	    ├── input/                    # input RGB
	    ├── mask/                     # input foreground masks
	    ├── output/                   # eval-view RGB (ground truth)
	    ├── mask_output/              # eval-view foreground masks
	    └── env/                      # environment maps (per scene)
	```
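
A quick structural check can catch layout mistakes before running a test script. The sketch below reads `test.txt` and reports, for each listed scene, which entries from the two layouts in this section are missing. Treating every listed entry as required is an assumption (for example, the eval-view folders may not be needed for inference only), and `check_split` is a hypothetical helper, not part of the released code.

```python
from pathlib import Path

# Per-scene entries copied from the two layouts in this section.
REQUIRED = {
    "rgb": [
        "images/CameraRig.json",
        "images/image_process_info.json",
        "images/rgb",
        "scene/aria_trajectory.csv",
        "scene/scene_info.json",
    ],
    "brdf": [
        "transforms_input.json",
        "transforms_output.json",
        "scale_center.txt",
        "input",
        "mask",
        "output",
        "mask_output",
        "env",
    ],
}


def check_split(split_dir, kind):
    """Map each scene listed in test.txt to the entries it is missing."""
    split_dir = Path(split_dir)
    lines = (split_dir / "test.txt").read_text().splitlines()
    scenes = [line.strip() for line in lines if line.strip()]
    problems = {}
    for scene in scenes:
        scene_dir = split_dir / scene
        missing = [rel for rel in REQUIRED[kind] if not (scene_dir / rel).exists()]
        if missing:
            problems[scene] = missing
    return problems
```

For example, `check_split("datasets/brdf/<your_split>", "brdf")` returns an empty dictionary when every scene in `test.txt` has all listed entries.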

	Once your data follows one of these layouts, point the corresponding test
	script from the GitHub code at your split by editing `data_path` inside the
	`.sh` file (or by overriding it on the command line).

	## License

	This project is licensed under the Creative Commons Attribution-NonCommercial
	4.0 International (CC BY-NC 4.0) license. See [LICENSE.md](LICENSE.md) for
	details.

	## Acknowledgements

	Our Triton implementation of native sparse attention builds on
	[lucidrains/native-sparse-attention-pytorch](https://github.com/lucidrains/native-sparse-attention-pytorch)
	and draws inspiration from
	[DreamTechAI/Direct3D-S2](https://github.com/DreamTechAI/Direct3D-S2). We thank
	the authors of both projects for releasing their work.

	## Citation

	If you find this work useful, please cite:

	```bibtex
	@article{lsrm,
	  title = {LSRM: High-Fidelity Object-Centric Reconstruction via Scaled Context Windows},
	  author = {Zhengqin Li and Cheng Zhang and Jakob Engel and Zhao Dong},
	  journal = {arXiv preprint arXiv:2604.05182},
	  year = {2026}
	}
	```