MatchLab/UniScene3D


Add model card (#1)

by nielsr (HF Staff), opened Apr 6

base: refs/heads/main ← from: refs/pr/1


Files changed (1): README.md (added, +27 -0)

	@@ -0,0 +1,27 @@

+---
+pipeline_tag: image-feature-extraction
+---
+# UniScene3D: Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding
+UniScene3D is a transformer-based encoder that learns unified scene representations from multi-view colored pointmaps, jointly modeling image appearance and 3D geometry. It extends pretrained CLIP models so that a single encoder combines the complementary information carried by images and pointmaps, and the resulting representations generalize across diverse 3D scene understanding tasks.
+- **Project Page:** [https://yebulabula.github.io/UniScene3D/](https://yebulabula.github.io/UniScene3D/)
+- **GitHub Repository:** [https://github.com/Yebulabula/UniScene3D](https://github.com/Yebulabula/UniScene3D)
+- **Paper:** [https://huggingface.co/papers/2604.02546](https://huggingface.co/papers/2604.02546)
+## Key Features
+- **Unified Representation:** Jointly encodes geometry and appearance from multi-view colored pointmaps within a single ViT encoder.
+- **Novel Training Objectives:** Introduces cross-view geometric alignment and grounded view alignment to enforce geometric and semantic consistency.
+- **Versatile Performance:** Achieves state-of-the-art results in zero-shot, few-shot, and task-specific fine-tuning settings on tasks such as viewpoint grounding, scene retrieval, and 3D visual question answering (3D VQA).
+## Citation
+If you find this work useful, please cite:
+```bibtex
+@article{mao2026uniscene3d,
+  title     = {Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding},
+  author    = {Mao, Ye and Luo, Weixun and Huang, Ranran and Jing, Junpeng and Mikolajczyk, Krystian},
+  journal   = {arXiv preprint arXiv:2604.02546},
+  year      = {2026}
+}
+```
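The card describes UniScene3D's input only as multi-view colored pointmaps and does not spell out how they are built. As a rough illustration of the format, the sketch below unprojects a depth map into a per-pixel grid of (X, Y, Z, R, G, B) values. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and a known camera-to-world pose, so that pointmaps from different views share one world frame. The function `colored_pointmap` and its parameters are hypothetical and not part of the UniScene3D repository; the actual pipeline's coordinate normalization, resolution, and channel layout may differ.

```python
from typing import List, Sequence, Tuple

# Identity rotation: with zero translation, camera frame == world frame.
IDENTITY: Sequence[Sequence[float]] = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def colored_pointmap(
    depth: Sequence[Sequence[float]],
    rgb: Sequence[Sequence[Sequence[float]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]] = IDENTITY,
    translation: Sequence[float] = (0.0, 0.0, 0.0),
) -> List[List[Tuple[float, ...]]]:
    """Hypothetical helper: turn an H x W depth map and RGB image into an
    H x W grid of (X, Y, Z, R, G, B) tuples in world coordinates.

    Assumes a pinhole camera, depth measured along the camera z-axis, and a
    camera-to-world pose given as a 3x3 rotation plus a translation.
    """
    grid: List[List[Tuple[float, ...]]] = []
    for v, depth_row in enumerate(depth):
        row: List[Tuple[float, ...]] = []
        for u, z in enumerate(depth_row):
            # Back-project pixel (u, v) into camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Map to world coordinates: p_world = R @ p_cam + t.
            world = tuple(
                sum(rotation[i][k] * cam[k] for k in range(3)) + translation[i]
                for i in range(3)
            )
            # Append the pixel's color so geometry and appearance share one grid.
            row.append(world + tuple(rgb[v][u]))
        grid.append(row)
    return grid
```

For a 2 × 2 depth map of constant depth 2 with `fx = fy = 1` and the principal point at the image center (`cx = cy = 0.5`), the top-left pixel maps to (-1, -1, 2) in camera coordinates. Keeping geometry and color on the same pixel grid means a pointmap can be split into patches exactly like an image, which is one plausible reason a single ViT encoder can consume both.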
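The title frames UniScene3D as contrastive language-colored pointmap pretraining built on pretrained CLIP models. For readers unfamiliar with that starting point, the sketch below shows the standard CLIP objective. It is a symmetric InfoNCE loss over a batch of matched (scene, text) pairs, written in dependency-free Python for clarity. It assumes scene embeddings come from the pointmap encoder and text embeddings from a text encoder of the same dimension. `clip_contrastive_loss` is a hypothetical name. The sketch deliberately omits the cross-view geometric alignment and grounded view alignment objectives, because the card does not define them. The default temperature of 0.07 is CLIP's initial value; CLIP itself learns the temperature during training.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize so dot products become cosine similarities (v must be non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def clip_contrastive_loss(
    scene_emb: Sequence[Sequence[float]],
    text_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Generic CLIP-style symmetric InfoNCE loss (not UniScene3D's exact objective).

    Row i of scene_emb and row i of text_emb form a matching pair;
    every other row in the batch acts as a negative.
    """
    scenes = [_normalize(v) for v in scene_emb]
    texts = [_normalize(v) for v in text_emb]
    n = len(scenes)
    # logits[i][j] = cos(scene_i, text_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(s, t)) / temperature for t in texts] for s in scenes
    ]
    # Scene -> text: cross-entropy over each row, target on the diagonal.
    scene_to_text = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> scene: cross-entropy over each column, target on the diagonal.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_scene = -sum(_log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (scene_to_text + text_to_scene)
```

When both inputs are the matched unit embeddings `[[1, 0], [0, 1]]`, the loss is close to zero. Swapping the text rows makes every diagonal pair a mismatch, and the loss rises to about `1 / temperature` (roughly 14.3). A real training loop would compute the same quantity with batched tensor operations.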