| | --- |
| | base_model: |
| | - tencent/DepthCrafter |
| | - stabilityai/stable-video-diffusion-img2vid-xt |
| | language: |
| | - en |
| | library_name: geometry-crafter |
| | license: other |
| | tags: |
| | - video-to-3d |
| | - point-cloud |
| | --- |
| | |
| | ## ___***GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors***___ |
| | <div align="center"> |
| |
|
| | _**[Tian-Xing Xu<sup>1</sup>](https://scholar.google.com/citations?user=zHp0rMIAAAAJ&hl=zh-CN), |
| | [Xiangjun Gao<sup>3</sup>](https://scholar.google.com/citations?user=qgdesEcAAAAJ&hl=en), |
| | [Wenbo Hu<sup>2 †</sup>](https://wbhu.github.io), |
| | [Xiaoyu Li<sup>2</sup>](https://xiaoyu258.github.io), |
| | [Song-Hai Zhang<sup>1 †</sup>](https://scholar.google.com/citations?user=AWtV-EQAAAAJ&hl=en), |
| | [Ying Shan<sup>2</sup>](https://scholar.google.com/citations?user=4oXBp9UAAAAJ&hl=en)**_ |
| | <br> |
| | <sup>1</sup>Tsinghua University |
| | <sup>2</sup>ARC Lab, Tencent PCG |
| | <sup>3</sup>HKUST |
| |
|
| |  |
| | <a href='https://arxiv.org/abs/2504.01016'><img src='https://img.shields.io/badge/arXiv-2504.01016-b31b1b.svg'></a> |
| | <a href='https://geometrycrafter.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> |
| | <a href='https://huggingface.co/spaces/TencentARC/GeometryCrafter'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue'></a> |
| |
|
| | </div> |
| |
|
| | ## Notice |
| |
|
| | **GeometryCrafter is still under active development!** |
| |
|
| | We recommend that everyone use English to communicate on issues, as this helps developers from around the world discuss, share experiences, and answer questions together. For further implementation details, please contact `xutx21@mails.tsinghua.edu.cn`. For business licensing and other related inquiries, don't hesitate to contact `wbhu@tencent.com`. |
| |
|
| | If you find GeometryCrafter useful, **please help ⭐ this repo**, which is important to open-source projects. Thanks! |
| |
|
| | ## Introduction |
| |
|
| | We present GeometryCrafter, a novel approach that estimates temporally consistent, high-quality point maps from open-world videos, facilitating downstream applications such as 3D/4D reconstruction and depth-based video editing or generation. This model is described in detail in the paper [GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors](https://arxiv.org/abs/2504.01016). |
| |
|
| | Release Notes: |
| | - `[01/04/2025]` 🔥🔥🔥 **GeometryCrafter** is released now. Have fun! |
| |
|
| | ## Quick Start |
| |
|
| | ### Installation |
| | 1. Clone this repo: |
| | ```bash |
| | git clone --recursive https://github.com/TencentARC/GeometryCrafter |
| | ``` |
| | 2. Install dependencies (please refer to [requirements.txt](requirements.txt)): |
| | ```bash |
| | pip install -r requirements.txt |
| | ``` |
| |
|
| | ### Inference |
| |
|
| | Run inference on the provided demo videos at 1.27 FPS. This requires a GPU with ~40 GB of memory for 110 frames at 1024x576 resolution: |
| |
|
| | ```bash |
| | python run.py \ |
| | --video_path examples/video1.mp4 \ |
| | --save_folder workspace/examples_output \ |
| | --height 576 --width 1024 |
| | # The input video is resized to the target resolution for processing; height and width must be divisible by 64. |
| | # The output point maps are restored to the original resolution before saving. |
| | # To reduce memory usage, downsample the input video with --downsample_ratio or lower --decode_chunk_size. |
| | ``` |
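Since `--height` and `--width` must be divisible by 64, it can help to snap an arbitrary resolution to the nearest valid one before calling `run.py`. The helper below is a minimal sketch of that arithmetic; the function names are hypothetical and are not part of the GeometryCrafter code base.

```python
from typing import Tuple


def snap_to_multiple(value: int, base: int = 64) -> int:
    """Round value to the nearest multiple of base (ties round up), never below base."""
    snapped = ((value + base // 2) // base) * base
    return max(base, snapped)


def target_resolution(height: int, width: int, base: int = 64) -> Tuple[int, int]:
    """Return a (height, width) pair in which both values are multiples of base."""
    return snap_to_multiple(height, base), snap_to_multiple(width, base)


if __name__ == "__main__":
    # A 720p clip would be processed at the closest resolution divisible by 64.
    h, w = target_resolution(720, 1280)
    print(f"--height {h} --width {w}")
```

Snapping each side independently can change the aspect ratio slightly. Because the output point maps are restored to the original resolution, this only affects the resolution used during processing.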
| |
|
| | Run inference with our deterministic variant at 1.50 FPS: |
| |
|
| | ```bash |
| | python run.py \ |
| | --video_path examples/video1.mp4 \ |
| | --save_folder workspace/examples_output \ |
| | --height 576 --width 1024 \ |
| | --model_type determ |
| | ``` |
| |
|
| | Run low-resolution processing at 2.49 FPS, which requires a GPU with ~22 GB of memory: |
| |
|
| | ```bash |
| | python run.py \ |
| | --video_path examples/video1.mp4 \ |
| | --save_folder workspace/examples_output \ |
| | --height 384 --width 640 |
| | ``` |
| |
|
| | ### Visualization |
| |
|
| | Visualize the predicted point maps with `Viser`: |
| |
|
| | ```bash |
| | python visualize/vis_point_maps.py \ |
| | --video_path examples/video1.mp4 \ |
| | --data_path workspace/examples_output/video1.npz |
| | ``` |
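Before visualizing, it can be useful to check what the saved `.npz` file actually contains. This README does not document the array names stored in the file, so the sketch below makes no assumption about them and simply lists every array with its shape and dtype. The helper name `summarize_npz` is hypothetical.

```python
import os

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz file."""
    with np.load(path) as data:
        return {
            name: (data[name].shape, str(data[name].dtype))
            for name in data.files
        }


if __name__ == "__main__":
    # Path taken from the visualization command above; skipped if the file is missing.
    npz_path = "workspace/examples_output/video1.npz"
    if os.path.exists(npz_path):
        for name, (shape, dtype) in summarize_npz(npz_path).items():
            print(f"{name}: shape={shape}, dtype={dtype}")
```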
| |
|
| | ## Gradio Demo |
| |
|
| | - Online demo: [**GeometryCrafter**](https://huggingface.co/spaces/TencentARC/GeometryCrafter) |
| | - Local demo: |
| | ```bash |
| | gradio app.py |
| | ``` |
| |
|
| | ## Dataset Evaluation |
| |
|
| | Please check the `evaluation` folder. |
| | - To create the datasets used in the paper, run `evaluation/preprocess/gen_{dataset_name}.py`. |
| | - First change `DATA_DIR` and `OUTPUT_DIR` according to your working environment. |
| | - You will then get the preprocessed datasets, containing the extracted RGB videos and point map npz files. We also provide the catalog of these files. |
| | - Run inference on all datasets: |
| | ```bash |
| | bash evaluation/run_batch.sh |
| | ``` |
| | (Remember to replace `data_root_dir` and `save_root_dir` with your own paths.) |
| | - Run evaluation on all datasets (scale-invariant point map estimation): |
| | ```bash |
| | bash evaluation/eval.sh |
| | ``` |
| | (Remember to replace `pred_data_root_dir` and `gt_data_root_dir` with your own paths.) |
| | - Run evaluation on all datasets (affine-invariant depth estimation): |
| | ```bash |
| | bash evaluation/eval_depth.sh |
| | ``` |
| | (Remember to replace `pred_data_root_dir` and `gt_data_root_dir` with your own paths.) |
| | - We also provide the comparison results of MoGe and the deterministic variant of our method. You can evaluate these methods under the same protocol by uncommenting the corresponding lines in `evaluation/run.sh`, `evaluation/eval.sh`, `evaluation/run_batch.sh`, and `evaluation/eval_depth.sh`. |
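To illustrate what the two protocols above usually mean, here is a minimal sketch of least-squares alignment between a prediction and the ground truth. Scale-invariant evaluation fits a single scale factor, and affine-invariant evaluation fits a scale and a shift. This is an assumption about the general technique, not the code in `evaluation/eval.sh` or `evaluation/eval_depth.sh`, which may align differently (for example with a robust loss or per-sequence statistics). The function names, array layouts, and mask convention are hypothetical.

```python
import numpy as np


def align_scale(pred, gt, mask):
    """Fit one scale s minimizing ||s * pred - gt||^2 over valid pixels.

    pred, gt: point maps of shape (T, H, W, 3); mask: boolean array of shape (T, H, W).
    Returns the rescaled prediction and the fitted scale.
    """
    p = pred[mask].reshape(-1)
    g = gt[mask].reshape(-1)
    s = float(np.dot(p, g) / np.dot(p, p))
    return s * pred, s


def align_scale_shift(pred, gt, mask):
    """Fit scale s and shift t minimizing ||s * pred + t - gt||^2 over valid pixels.

    pred, gt: depth maps of shape (T, H, W); mask: boolean array of shape (T, H, W).
    Returns the aligned prediction, the scale, and the shift.
    """
    p = pred[mask].reshape(-1)
    g = gt[mask].reshape(-1)
    design = np.stack([p, np.ones_like(p)], axis=1)  # columns: [pred, 1]
    (s, t), *_ = np.linalg.lstsq(design, g, rcond=None)
    return s * pred + t, float(s), float(t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt_depth = rng.uniform(1.0, 5.0, size=(2, 4, 4))
    valid = np.ones_like(gt_depth, dtype=bool)
    # A prediction that differs from the ground truth by an unknown scale and shift.
    pred_depth = (gt_depth - 0.3) / 2.0
    aligned, s, t = align_scale_shift(pred_depth, gt_depth, valid)
    print("fitted scale and shift:", s, t)
    print("max abs error after alignment:", float(np.abs(aligned - gt_depth).max()))
```

Error metrics are then computed on the aligned prediction, so a method is not penalized for an unknown global scale (or scale and shift).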
| |
|
| | ## Contributing |
| |
|
| | - Issues and pull requests are welcome. |
| | - Contributions that improve inference speed and memory usage are also welcome, e.g., through model quantization, distillation, or other acceleration techniques. |
| |
|
| | ## Citation |
| |
|
| | If you find this work helpful, please consider citing: |
| |
|
| | ```bibtex |
| | @misc{xu2025geometrycrafterconsistentgeometryestimation, |
| | title={GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors}, |
| | author={Tian-Xing Xu and Xiangjun Gao and Wenbo Hu and Xiaoyu Li and Song-Hai Zhang and Ying Shan}, |
| | year={2025}, |
| | eprint={2504.01016}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.GR}, |
| | url={https://arxiv.org/abs/2504.01016}, |
| | } |
| | ``` |