pipeline_tag: image-feature-extraction

UniScene3D: Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding

UniScene3D is a transformer-based encoder that learns unified scene representations from multi-view colored pointmaps, jointly modeling image appearance and geometry. It extends pretrained CLIP models so that the learned representations combine complementary information from images and pointmaps and generalize across diverse 3D scene understanding tasks.
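To make the input format more concrete, here is a minimal sketch of how a colored pointmap for a single view might be assembled from an RGB image, a depth map, and pinhole intrinsics. The function name `colored_pointmap`, the 6-channel XYZ+RGB layout, and the use of camera coordinates are illustrative assumptions and may differ from the preprocessing in the UniScene3D repository.

```python
import numpy as np

def colored_pointmap(rgb, depth, K):
    """Back-project a depth map to per-pixel 3D points and append color.

    Hypothetical helper for illustration only.
    rgb:   (H, W, 3) float array in [0, 1]
    depth: (H, W) float array, depth along the camera z-axis
    K:     (3, 3) pinhole intrinsics matrix
    Returns an (H, W, 6) array: XYZ in camera coordinates, then RGB.
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Pixel grid: u indexes columns, v indexes rows; both have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pinhole back-projection of each pixel to a 3D point.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    xyz = np.stack([x, y, depth], axis=-1)
    # Assumed layout: geometry channels first, appearance channels last.
    return np.concatenate([xyz, rgb], axis=-1)

# Toy inputs: a 3x4 view at constant depth with uniform gray color.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
rgb = np.full((3, 4, 3), 0.5)
depth = np.full((3, 4), 2.0)
pm = colored_pointmap(rgb, depth, K)
```

Stacking geometry and color per pixel keeps the input image-shaped, which is what would let a ViT-style encoder patchify it like an ordinary image. A multi-view input would be a stack of such maps, typically expressed in a shared world frame.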

Project Page: https://yebulabula.github.io/UniScene3D/
GitHub Repository: https://github.com/Yebulabula/UniScene3D
Paper: https://huggingface.co/papers/2604.02546

Key Features

Unified Representation: Jointly encodes geometry and appearance from multi-view colored pointmaps within a single ViT encoder.
Novel Training Objectives: Introduces cross-view geometric alignment and grounded view alignment to enforce geometric and semantic consistency.
Versatile Performance: Reports state-of-the-art results in zero-shot, few-shot, and task-specific fine-tuning settings on tasks such as viewpoint grounding, scene retrieval, and 3D VQA.
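This card does not spell out the formulas for the training objectives listed above. As background for the "contrastive" part of the title, the sketch below shows the generic CLIP-style symmetric InfoNCE loss between a batch of scene embeddings and their paired text embeddings. It is a standard formulation, not the paper's cross-view geometric alignment or grounded view alignment objectives. The function name `symmetric_info_nce` and the temperature of 0.07 are assumptions chosen for illustration.

```python
import numpy as np

def symmetric_info_nce(scene_emb, text_emb, temperature=0.07):
    """Generic CLIP-style contrastive loss (illustrative, not the paper's).

    scene_emb, text_emb: (N, D) arrays where row i of each forms a pair.
    """
    # L2-normalize so the dot product is cosine similarity.
    s = scene_emb / np.linalg.norm(scene_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # (N, N) similarity matrix

    def diag_cross_entropy(l):
        # Row-wise log-softmax with max subtraction for numerical stability.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        # The matching pair for row i sits on the diagonal.
        return -np.mean(np.diag(log_prob))

    # Average the scene-to-text and text-to-scene directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))

rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
# Scene embeddings close to their paired text embeddings: low loss.
aligned = symmetric_info_nce(text + 0.01 * rng.normal(size=(4, 8)), text)
# Pairs deliberately mismatched by reversing the batch: high loss.
shuffled = symmetric_info_nce(text[::-1], text)
```

The loss is low when each scene embedding is most similar to its own caption and high when the pairing is scrambled, which is the behavior a contrastive pretraining objective relies on.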

Citation

If you find this work useful, please cite:

@inproceedings{mao2026uniscene3d,
  title     = {Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding},
  author    = {Mao, Ye and Luo, Weixun and Huang, Ranran and Jing, Junpeng and Mikolajczyk, Krystian},
  booktitle = {arxiv},
  year      = {2026}
}