---
license: apache-2.0
library_name: pytorch
pipeline_tag: mask-generation
tags:
- 3d
- mesh
- 3d-part-segmentation
- sam2
- segmentation
- point-cloud
- geosam2
base_model: facebook/sam2.1-hiera-base-plus
language:
- en
---

# GeoSAM2

> Unleashing the Power of SAM2 for 3D Part Segmentation · CVPR 2026

<div align="center">

[Project Page](https://detailgen3d.github.io/GeoSAM2/) ·
[Paper (arXiv:2508.14036)](https://arxiv.org/abs/2508.14036) ·
[Code](https://github.com/VAST-AI-Research/GeoSAM2) ·
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

</div>

GeoSAM2 lifts [SAM2](https://github.com/facebookresearch/sam2) from images to
3D meshes. Given a multi-view rendering of a mesh and an interactive prompt
(a single 2D click or a 2D mask) on one view, it propagates a consistent
segmentation across all views and back-projects the result to per-face 3D
part labels.

This repository hosts the **pretrained inference checkpoint** (`geosam2.pt`).
Code, configs, and a small multi-view demo dataset live in the companion
GitHub repo: <https://github.com/VAST-AI-Research/GeoSAM2>.

## Model summary

| | |
|---|---|
| Task | Interactive 3D part segmentation on meshes via multi-view 2D propagation |
| Base model | [`facebook/sam2.1-hiera-base-plus`](https://huggingface.co/facebook/sam2.1-hiera-base-plus) |
| Architecture | SAM2 (Hiera-B+ image encoder + memory attention + mask decoder), plus a dedicated **position-map encoder** for 3D geometry, **feature fusion**, and **LoRA adapters** on the image and position-map encoders |
| Parameters | ~154 M (fp32: ~588 MB · bf16: ~294 MB; see the quick size check below) |
| Input modalities | 12 rendered views per mesh: color (`.webp`), depth (`.exr`), normal (`.webp`), camera metadata (`meta.json`) |
| Prompts | 2D point clicks or a 2D mask on any view |
| Output | Per-view 2D label maps and per-face 3D labels for the input mesh |
| Render config | 12 azimuthally spaced views at 1024×1024 from a fixed elevation |

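The file sizes quoted above follow directly from the parameter count and the per-weight byte width. A rough back-of-the-envelope check, assuming all ~154 M weights are stored at full width:

```python
# Rough size check for the "Parameters" row: ~154 M weights stored at
# 4 bytes each (fp32) or 2 bytes each (bf16).
n_params = 154e6
print(f"fp32: ~{n_params * 4 / 2**20:.0f} MiB")  # ~587 MiB, matching the ~588 MB fp32 file
print(f"bf16: ~{n_params * 2 / 2**20:.0f} MiB")  # ~294 MiB
```
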
## Quickstart

```bash
# 1. Clone the code
git clone https://github.com/VAST-AI-Research/GeoSAM2.git
cd GeoSAM2
pip install -r requirements.txt
pip install -e .  # builds the optional CUDA op; set GEOSAM2_BUILD_CUDA=0 to skip

# 2. Download the checkpoint into ./ckpt
hf download VAST-AI/GeoSAM2 geosam2.pt --local-dir ckpt

# 3. Run the bundled demo (single-view point prompt -> 3D segmentation)
bash scripts/run_example.sh
```

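If you prefer to stay in Python, the checkpoint can also be fetched with `huggingface_hub` instead of the `hf` CLI (equivalent to step 2 above):

```python
from huggingface_hub import hf_hub_download

# Download geosam2.pt into ./ckpt, mirroring `hf download ... --local-dir ckpt`.
ckpt_path = hf_hub_download(
    repo_id="VAST-AI/GeoSAM2",
    filename="geosam2.pt",
    local_dir="ckpt",
)
print(ckpt_path)  # local path to the downloaded file, e.g. ckpt/geosam2.pt
```
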
Direct inference from a 2D mask:

```bash
python inference.py \
    --data-root example/sample_00 \
    --mask-path outputs/sample_00/2d_seg/mask_view0000.npy \
    --mask-view 0 \
    --postprocess-pa 0.02 \
    --output-dir outputs/sample_00/3d_seg
```

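The `--mask-path` file above is normally written by the 2D segmentation stage, but you can also supply your own mask. The snippet below is only a sketch: the boolean `H×W` layout at the 1024×1024 render resolution is an assumption, so check the masks produced by the repo's own pipeline for the exact convention.

```python
import os
import numpy as np

# Sketch: write a custom 2D mask for view 0 to pass via --mask-path.
# Assumed layout: boolean H x W array at the 1024 x 1024 render resolution.
mask = np.zeros((1024, 1024), dtype=bool)
mask[300:700, 400:900] = True  # region to segment on the prompted view
os.makedirs("outputs/sample_00/2d_seg", exist_ok=True)
np.save("outputs/sample_00/2d_seg/mask_view0000.npy", mask)
```
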
See the [README](https://github.com/VAST-AI-Research/GeoSAM2#readme) for the
full usage guide, including rendering your own meshes with Blender.

## Files

| File | Size | Description |
|---|---|---|
| `geosam2.pt` | ~588 MB | Pretrained weights in `float32` (`{"model": state_dict}`). Default choice. |
| `geosam2-bf16.pt` | ~294 MB | Same weights cast to `bfloat16` for faster downloads and lower memory. Loaded by the standard code path: `load_state_dict` upcasts to the model dtype, so no extra steps are required. Expect a small reconstruction error of ≤ 0.015 per weight versus the fp32 file. |

Both checkpoints are loaded by
`sam2.build_sam.build_sam2_video_predictor_geosam2` with the Hydra config
`sam2/configs/geosam2.yaml`. Pass the chosen file via `--sam2-checkpoint`
(or use the default `ckpt/geosam2.pt` path expected by the scripts).

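A quick way to confirm which file you have and how it is stored (this only inspects the raw checkpoint; the model itself is built by `build_sam2_video_predictor_geosam2` as described above):

```python
import torch

# Both files store {"model": state_dict}; the bf16 file simply holds
# bfloat16 tensors that load_state_dict will upcast on load.
ckpt = torch.load("ckpt/geosam2.pt", map_location="cpu")
state = ckpt["model"]
n_params = sum(t.numel() for t in state.values())
dtypes = {str(t.dtype) for t in state.values()}
print(f"{n_params / 1e6:.1f} M parameters, tensor dtypes: {dtypes}")
```
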
## Intended use

- **Intended**: interactive 3D part segmentation of single-object meshes for
  research and content-creation tooling.
- **Out of scope**: scene-level segmentation, dynamic scenes, semantic
  category prediction (the model produces instance-level part masks, not
  semantic class labels), and safety-critical applications.

## Limitations

- Expects the 12-view rendering convention produced by `geosam2_render.py`;
  arbitrary view counts or camera trajectories may degrade quality.
- The mesh must fit within the normalised cube used at render time
  (`geosam2_render.py` handles this for the bundled samples).
- Performance on thin/wire-like geometry and on highly transparent surfaces
  is still an open problem.
- The post-processing `--postprocess-pa` value sometimes needs hand-tuning
  per mesh; `0.01`, `0.02`, and `0.035` are useful starting points, and a
  small sweep like the sketch below makes the comparison quick.

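A minimal sweep over those starting points, writing each result to its own folder so the segmentations can be compared side by side (the `inference.py` flags are the ones shown in the Quickstart; the per-value output folders are just a suggestion):

```python
import subprocess

# Run inference.py once per candidate --postprocess-pa value and keep the
# results in separate folders for comparison.
for pa in (0.01, 0.02, 0.035):
    subprocess.run(
        [
            "python", "inference.py",
            "--data-root", "example/sample_00",
            "--mask-path", "outputs/sample_00/2d_seg/mask_view0000.npy",
            "--mask-view", "0",
            "--postprocess-pa", str(pa),
            "--output-dir", f"outputs/sample_00/3d_seg_pa{pa}",
        ],
        check=True,
    )
```
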
## License

Released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
The checkpoint is a derivative of Meta's
[SAM2](https://github.com/facebookresearch/sam2) (Apache 2.0); see the
[`NOTICE`](https://github.com/VAST-AI-Research/GeoSAM2/blob/main/NOTICE)
file in the code repo for attribution.

## Citation

```bibtex
@article{deng2025geosam2,
  title   = {GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation},
  author  = {Deng, Ken and Yang, Yunhan and Sun, Jingxiang and Liu, Xihui and
             Liu, Yebin and Liang, Ding and Cao, Yan-Pei},
  journal = {arXiv preprint arXiv:2508.14036},
  year    = {2025}
}
```