# SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding (CVPR 2026 Highlight)
This repository contains the official PyTorch implementation of SpatialScore: https://arxiv.org/abs/2505.17012/.
The updated version of our paper has been accepted to CVPR 2026, and we have released the latest code and data!
Feel free to reach out for discussions!
<div align="center">
<img src="./assets/dataset.png">
</div>
Current Leaderboard (You are welcome to test your models on SpatialScore!):
<div align="center">
<img src="./assets/SpatialScore.png">
</div>
## Some Information
[Project Page](https://haoningwu3639.github.io/SpatialScore/) · [Paper](https://arxiv.org/abs/2505.17012/) · [SpatialScore_Benchmark](https://huggingface.co/datasets/haoningwu/SpatialScore) · [SpatialCorpus](https://huggingface.co/datasets/haoningwu/SpatialCorpus) · [Model](https://huggingface.co/haoningwu/SpatialScore)
## News
- [2026.5] We have released the latest version of our code and data!
- [2026.4] Glad to share that **SpatialScore** has been accepted to **CVPR 2026** and selected as a **Highlight**.
- [2025.5] ~~We have released version_0 of our evaluation code, supporting most mainstream models.~~
- [2025.5] ~~We have released version_0 of SpatialScore, which is available on [Huggingface](https://huggingface.co/datasets/haoningwu/SpatialScore).~~
- [2025.5] Our preprint has been released on arXiv.
## Requirements
- Python >= 3.10 (we recommend using [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- [PyTorch >= 2.8.0](https://pytorch.org/)
- accelerate == 1.13.0
- xformers == 0.0.32.post1
- flash-attn == 2.8.2
- vllm == 0.11.0
- triton == 3.4.0
- triton_kernels (please refer to [gpt_oss](https://wheels.vllm.ai/gpt-oss/triton-kernels/) for the version that supports gpt_oss)
- transformers == 4.57.3
The dependencies above are required to run evaluations on SpatialScore.
If you intend to use SpatialAgent, which invokes various spatial perception tools, you may also need to install each tool's dependencies and download its pre-trained checkpoints by following the corresponding repositories: [Rex-Omni](https://github.com/IDEA-Research/Rex-Omni), [Map-Anything](https://github.com/facebookresearch/map-anything), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), and [DetAny3D](https://github.com/OpenDriveLab/DetAny3D).
A suitable [conda](https://conda.io/) environment named `SpatialScore` can be created and activated with:
```
conda env create -f environment.yaml
conda activate SpatialScore
```
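Because most of the dependencies above are pinned to exact versions, it can help to confirm what is actually installed before launching an evaluation. The following is a minimal sketch and not part of this repository: the helper names (`release_tuple`, `satisfies`, `report`) are hypothetical, and the distribution names in `REQUIRED` are assumed to match the names the packages are installed under (for example `torch` for PyTorch). `triton_kernels` is left out because no version is pinned for it.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list; ">=" marks a minimum version.
# Distribution names are an assumption and may differ in your environment.
REQUIRED = {
    "torch": (">=", "2.8.0"),
    "accelerate": ("==", "1.13.0"),
    "xformers": ("==", "0.0.32.post1"),
    "flash-attn": ("==", "2.8.2"),
    "vllm": ("==", "0.11.0"),
    "triton": ("==", "3.4.0"),
    "transformers": ("==", "4.57.3"),
}


def release_tuple(v: str) -> tuple:
    """Leading numeric components of a version, e.g. '2.8.0+cu128' -> (2, 8, 0)."""
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def satisfies(installed: str, op: str, wanted: str) -> bool:
    """Check one pin; a local build suffix such as '+cu128' is ignored."""
    public = installed.split("+")[0]
    if op == "==":
        return public == wanted
    return release_tuple(public) >= release_tuple(wanted)


def report(required=REQUIRED, get_version=version) -> list:
    """Return one human-readable line per missing or mismatched package."""
    problems = []
    for name, (op, wanted) in required.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need {op} {wanted})")
            continue
        if not satisfies(installed, op, wanted):
            problems.append(f"{name}: found {installed}, need {op} {wanted}")
    return problems


if __name__ == "__main__":
    for line in report() or ["All pinned dependencies match."]:
        print(line)
```

Run it inside the activated `SpatialScore` environment; any line it prints names a package to reinstall at the listed version.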
## Citation
If you use this code, model, or data in your research or project, please cite:
@inproceedings{wu2026spatialscore,
author = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
title = {SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}
## TODO
- [x] Release Paper
- [x] Update the final version paper
- [x] Release version_0 SpatialScore Benchmark
- [x] Release version_0 Code of Evaluation
- [x] Release version_0 Base Code of SpatialAgent
- [x] Release our training resources SpatialCorpus and the SFT models
- [x] Update SpatialScore Benchmark
- [x] Update Code of Evaluation
- [x] Update Code of SpatialAgent
## Acknowledgements
Many thanks to the codebases of [transformers](https://github.com/huggingface/transformers), [Qwen3-VL](https://github.com/qwenlm/qwen3-vl), and [TACO](https://github.com/SalesforceAIResearch/TACO).
## Contact
If you have any questions, please feel free to contact haoningwu3639@gmail.com.