	# SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding (CVPR 2026 Highlight)
	This repository contains the official PyTorch implementation of SpatialScore: https://arxiv.org/abs/2505.17012/.

	The new version of our paper has been accepted to CVPR 2026, and we have released the updated code and data!
	Feel free to reach out for discussions!

	<div align="center">
	<img src="./assets/dataset.png">
	</div>

	Current Leaderboard (You are welcome to test your models on SpatialScore!):

	<div align="center">
	<img src="./assets/SpatialScore.png">
	</div>

	## Some Information
	[Project Page](https://haoningwu3639.github.io/SpatialScore/) · [Paper](https://arxiv.org/abs/2505.17012/) · [SpatialScore_Benchmark](https://huggingface.co/datasets/haoningwu/SpatialScore) · [SpatialCorpus](https://huggingface.co/datasets/haoningwu/SpatialCorpus) · [Model](https://huggingface.co/haoningwu/SpatialScore)
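The Hugging Face resources linked above can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the repository IDs and types are read off the URLs above, while the `REPOS` mapping, the `download` helper, and the `./hf_downloads` directory are hypothetical names chosen for illustration and are not part of this repository.

```python
from pathlib import Path

# Repository IDs and types inferred from the links above (assumption:
# URLs under /datasets/ are dataset repos, the remaining one is a model repo).
REPOS = {
    "benchmark": ("haoningwu/SpatialScore", "dataset"),
    "corpus": ("haoningwu/SpatialCorpus", "dataset"),
    "model": ("haoningwu/SpatialScore", "model"),
}


def download(name: str, root: str = "./hf_downloads") -> str:
    """Download one of the resources in REPOS and return its local path."""
    # Look up the name first so an unknown key fails before any import or network call.
    repo_id, repo_type = REPOS[name]
    # Imported lazily so REPOS can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = Path(root) / name
    return snapshot_download(
        repo_id=repo_id,
        repo_type=repo_type,
        local_dir=str(target),
    )
```

Calling `download("benchmark")` would place the benchmark files under `./hf_downloads/benchmark`. The layout of the downloaded files is not described here, so check the dataset card before wiring it into an evaluation script.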

	## News
	- [2026.5] We have released the updated code and data!
	- [2026.4] Glad to share that SpatialScore has been accepted to CVPR 2026 and selected as a Highlight.
	- [2025.5] ~~We have released version_0 of our evaluation code, supporting most mainstream models.~~
	- [2025.5] ~~We have released version_0 of SpatialScore, which is available on [Huggingface](https://huggingface.co/datasets/haoningwu/SpatialScore).~~
	- [2025.5] Our pre-print paper is released on arXiv.

	## Requirements
	- Python >= 3.10 (we recommend using [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
	- [PyTorch >= 2.8.0](https://pytorch.org/)
	- accelerate == 1.13.0
	- xformers == 0.0.32.post1
	- flash-attn == 2.8.2
	- vllm == 0.11.0
	- triton == 3.4.0
	- triton_kernels (please refer to [gpt_oss](https://wheels.vllm.ai/gpt-oss/triton-kernels/) for version supporting gpt_oss)
	- transformers == 4.57.3

	The dependencies above are required to run evaluations on SpatialScore.
	If you intend to use SpatialAgent, note that it invokes various spatial perception tools. You may need to consult the following repositories to install the corresponding tool dependencies and download their pre-trained checkpoints: [Rex-Omni](https://github.com/IDEA-Research/Rex-Omni), [Map-Anything](https://github.com/facebookresearch/map-anything), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), and [DetAny3D](https://github.com/OpenDriveLab/DetAny3D).

	A suitable [conda](https://conda.io/) environment named `SpatialScore` can be created and activated with:

	```
	conda env create -f environment.yaml
	conda activate SpatialScore
	```
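Once the environment is active, it can be useful to confirm that the installed packages match the versions listed under Requirements. Below is a minimal sketch using only the standard library; the `PINNED` and `MINIMUM` tables are copied from that list, while `release_tuple` and `check_environment` are hypothetical helper names and not part of this repository. It assumes the distribution names used here (for example `flash-attn` and `torch`) are the ones your installer registered, and it does not check `triton_kernels`, for which no version is given.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins copied from the Requirements list.
PINNED = {
    "accelerate": "1.13.0",
    "xformers": "0.0.32.post1",
    "flash-attn": "2.8.2",
    "vllm": "0.11.0",
    "triton": "3.4.0",
    "transformers": "4.57.3",
}
# PyTorch is listed as a minimum version rather than an exact pin.
MINIMUM = {"torch": "2.8.0"}


def release_tuple(ver: str) -> tuple:
    """Turn '2.8.0+cu128' into (2, 8, 0); local tags and non-numeric parts are dropped."""
    parts = []
    for piece in ver.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_environment(installed_version=version) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for name, wanted in {**PINNED, **MINIMUM}.items():
        try:
            found = installed_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if name in MINIMUM:
            if release_tuple(found) < release_tuple(wanted):
                problems.append(f"{name}: found {found}, need >= {wanted}")
        elif found.split("+")[0] != wanted:
            # Compare exact pins after stripping any local build tag such as '+cu128'.
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["All listed dependencies match."]:
        print(line)
```

The lookup function is passed in as a parameter so the check can be exercised with a stand-in mapping, without needing the real packages installed.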

	## Citation
	If you use this code, model, or data in your research or project, please cite:

	@inproceedings{wu2026spatialscore,
	author = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
	title = {SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year = {2026},
	}

	## TODO
	- [x] Release Paper
	- [x] Update the final version of the paper
	- [x] Release version_0 SpatialScore Benchmark
	- [x] Release version_0 Code of Evaluation
	- [x] Release version_0 Base Code of SpatialAgent
	- [x] Release our training resources SpatialCorpus and the SFT models
	- [x] Update SpatialScore Benchmark
	- [x] Update Code of Evaluation
	- [x] Update Code of SpatialAgent

	## Acknowledgements
	Many thanks to the code bases from [transformers](https://github.com/huggingface/transformers), [Qwen3-VL](https://github.com/qwenlm/qwen3-vl), and [TACO](https://github.com/SalesforceAIResearch/TACO).


	## Contact
	If you have any questions, please feel free to contact haoningwu3639@gmail.com.