Add model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +27 -0
README.md ADDED
@@ -0,0 +1,27 @@
+ ---
+ pipeline_tag: image-feature-extraction
+ ---
+
+ # UniScene3D: Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding
+
+ UniScene3D is a transformer-based encoder that learns unified scene representations from multi-view colored pointmaps, jointly modeling image appearance and geometry. It extends pretrained CLIP models to learn representations that effectively combine complementary information from images and pointmaps, generalizing across diverse 3D scene understanding tasks.
+
+ - **Project Page:** [https://yebulabula.github.io/UniScene3D/](https://yebulabula.github.io/UniScene3D/)
+ - **GitHub Repository:** [https://github.com/Yebulabula/UniScene3D](https://github.com/Yebulabula/UniScene3D)
+ - **Paper:** [https://huggingface.co/papers/2604.02546](https://huggingface.co/papers/2604.02546)
+
+ ## Key Features
+ - **Unified Representation:** Jointly encodes geometry and appearance from multi-view colored pointmaps within a single ViT encoder.
+ - **Novel Training Objectives:** Introduces cross-view geometric alignment and grounded view alignment to enforce geometric and semantic consistency.
+ - **Versatile Performance:** Demonstrates state-of-the-art performance in zero-shot, few-shot, and task-specific fine-tuning settings for tasks like viewpoint grounding, scene retrieval, and 3D VQA.
+
+ ## Citation
+ If you find this work useful, please cite:
+ ```bibtex
+ @inproceedings{mao2026uniscene3d,
+ title = {Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding},
+ author = {Mao, Ye and Luo, Weixun and Huang, Ranran and Jing, Junpeng and Mikolajczyk, Krystian},
+ booktitle = {arXiv},
+ year = {2026}
+ }
+ ```
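
The card describes contrastive pretraining on top of CLIP and zero-shot scene retrieval, but does not spell out the objective or an inference API. As an illustration only, here is a minimal sketch that assumes a standard CLIP-style symmetric contrastive loss over paired scene and text embeddings, plus cosine-similarity ranking for retrieval. The function names, the plain-list embeddings, and the temperature of 0.07 are all hypothetical; the cross-view geometric alignment and grounded view alignment objectives listed under Key Features are not modeled here, and none of this is taken from the UniScene3D code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine_logits(scene_embs, text_embs, temperature=0.07):
    """Pairwise cosine similarities divided by a temperature (0.07 is an assumed value)."""
    scenes = [l2_normalize(v) for v in scene_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(s, t)) / temperature for t in texts]
            for s in scenes]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(scene_embs, text_embs, temperature=0.07):
    """Average of the scene-to-text and text-to-scene losses (assumed CLIP-style form)."""
    logits = cosine_logits(scene_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def rank_scenes(text_emb, scene_embs):
    """Return scene indices sorted from most to least similar to the text query."""
    query = l2_normalize(text_emb)
    scores = [sum(a * b for a, b in zip(query, l2_normalize(s))) for s in scene_embs]
    return sorted(range(len(scene_embs)), key=lambda i: scores[i], reverse=True)
```

In this sketch, a batch whose i-th scene embedding matches its i-th text embedding yields a much lower loss than a batch with the pairs shuffled, and `rank_scenes` returns the best-matching scene index first. Real embeddings would come from the encoder itself; see the GitHub repository for the actual loading and inference code.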