# SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding (CVPR 2026 Highlight)

This repository contains the official PyTorch implementation of SpatialScore: https://arxiv.org/abs/2505.17012/.

The updated version of our paper has been accepted to CVPR 2026, and we have released the latest code and data. Feel free to reach out for discussions!

Current Leaderboard (You are welcome to test your models on SpatialScore!):

## Some Information

[Project Page](https://haoningwu3639.github.io/SpatialScore/) · [Paper](https://arxiv.org/abs/2505.17012/) · [SpatialScore_Benchmark](https://huggingface.co/datasets/haoningwu/SpatialScore) · [SpatialCorpus](https://huggingface.co/datasets/haoningwu/SpatialCorpus) · [Model](https://huggingface.co/haoningwu/SpatialScore)

## News

- [2026.5] We have released the latest code and data!
- [2026.4] Glad to share that **SpatialScore** has been accepted to **CVPR 2026** and selected as a **Highlight**.
- [2025.5] ~~We have released version_0 of our evaluation code, supporting most mainstream models.~~
- [2025.5] ~~We have released version_0 of SpatialScore, which is available on [Huggingface](https://huggingface.co/datasets/haoningwu/SpatialScore).~~
- [2025.5] Our pre-print paper is released on arXiv.

## Requirements

- Python >= 3.10 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
- [PyTorch >= 2.8.0](https://pytorch.org/)
- accelerate == 1.13.0
- xformers == 0.0.32.post1
- flash-attn == 2.8.2
- vllm == 0.11.0
- triton == 3.4.0
- triton_kernels (please refer to [gpt_oss](https://wheels.vllm.ai/gpt-oss/triton-kernels/) for the version supporting gpt_oss)
- transformers == 4.57.3

The dependencies above are required to run evaluations on SpatialScore. If you intend to use SpatialAgent, which invokes various spatial perception tools, you may also need to consult the following repositories to install the corresponding tool dependencies and download their pre-trained checkpoints: [Rex-Omni](https://github.com/IDEA-Research/Rex-Omni), [Map-Anything](https://github.com/facebookresearch/map-anything), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), and [DetAny3D](https://github.com/OpenDriveLab/DetAny3D).

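
To confirm that an existing environment matches the versions listed in the requirements, a small check along the following lines may help. This is only a sketch and not part of the repository: the helper names `release_tuple` and `check_environment` are hypothetical, and the distribution names (for example `flash-attn` and `torch`) are assumed to be the names the packages are installed under.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the requirements list.
PINNED = {
    "accelerate": "1.13.0",
    "xformers": "0.0.32.post1",
    "flash-attn": "2.8.2",
    "vllm": "0.11.0",
    "triton": "3.4.0",
    "transformers": "4.57.3",
}
# Lower bound from the requirements list (PyTorch >= 2.8.0).
MINIMUM = {"torch": (2, 8, 0)}


def release_tuple(ver: str) -> tuple:
    """Parse the leading numeric part, e.g. '2.8.0+cu128' -> (2, 8, 0)."""
    parts = []
    for piece in ver.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment(get_version=version) -> dict:
    """Return {package: message} for each missing or mismatched package."""
    problems = {}
    for name, wanted in PINNED.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (expected {wanted})"
            continue
        # Ignore local build tags such as '+cu128' when comparing.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    for name, minimum in MINIMUM.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if release_tuple(found) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems[name] = f"found {found}, expected >= {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for package, message in issues.items():
        print(f"{package}: {message}")
    if not issues:
        print("All pinned packages match.")
```

Running the script inside the activated environment prints one line per mismatch, or a single confirmation line if everything matches. `triton_kernels` is left out because the requirements list does not pin a version for it.
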
A suitable [conda](https://conda.io/) environment named `SpatialScore` can be created and activated with:

```
conda env create -f environment.yaml
conda activate SpatialScore
```

## Citation

If you use this code, model, or data for your research or project, please cite:

    @inproceedings{wu2026spatialscore,
        author    = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
        title     = {SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year      = {2026},
    }

## TODO

- [x] Release Paper
- [x] Update the final version paper
- [x] Release version_0 SpatialScore Benchmark
- [x] Release version_0 Code of Evaluation
- [x] Release version_0 Base Code of SpatialAgent
- [x] Release our training resources SpatialCorpus and the SFT models
- [x] Update SpatialScore Benchmark
- [x] Update Code of Evaluation
- [x] Update Code of SpatialAgent

## Acknowledgements

Many thanks to the code bases from [transformers](https://github.com/huggingface/transformers), [Qwen3-VL](https://github.com/qwenlm/qwen3-vl), and [TACO](https://github.com/SalesforceAIResearch/TACO).

## Contact

If you have any questions, please feel free to contact haoningwu3639@gmail.com.