Update README.md
README.md
CHANGED
@@ -45,99 +45,8 @@ conda env create -f environment.yaml
 conda activate SpatialScore
 ```
 
-## Dataset
-Please check out [SpaitalScore](https://huggingface.co/datasets/haoningwu/SpatialScore) to download our proposed benchmark (`SpatialScore`).
-
-If you cannot access Huggingface, you can use [hf-mirror](https://hf-mirror.com/) to download models.
-
-```
-export HF_ENDPOINT=https://hf-mirror.com # Add this before huggingface-cli download
-```
-
-You can follow the commands below to prepare the data:
-
-```
-huggingface-cli download --resume-download --repo-type dataset haoningwu/SpatialScore --local-dir ./ --local-dir-use-symlinks False
-unzip SpatialScore_benchmark.zip
-```
-
-## Evaluation
-Considering the current mainstream model architectures, we have prioritized support for the Qwen2.5-VL and Qwen3-VL series models.
-You can evaluate them on SpatialScore using the following commands:
-
-```
-CUDA_VISIBLE_DEVICES=0,1 python test_qwen.py --model_name qwen3vl-4b --model_path ./huggingface/Qwen3-VL-4B-Instruct --dataset_json_path ./SpatialScore_benchmark/SpatialScore_benchmark.ndjson --output_dir ./eval_results
-```
-
-Now, the All-in-one script supporting all other models is also available.
-You can evaluate other models on SpatialScore using the following commands:
-
-```
-CUDA_VISIBLE_DEVICES=0,1 python test_all_in_one.py --model_name llava-ov-7b --model_path ../huggingface/LLaVA-OneVision-7B --dataset_json_path ./SpatialScore_benchmark/SpatialScore_benchmark.ndjson --output_dir ./eval_results
-```
-
-Our final evaluation encompassed rule-based evaluation and LLM-based answer extraction, which are combined to calculate the final accuracy.
-Therefore, you need to configure [GPT-OSS](https://github.com/openai/gpt-oss) and download the corresponding [GPT-OSS-20B](https://huggingface.co/openai/gpt-oss-20b) checkpoint before running the following script to compute the final score:
-
-```
-MKL_THREADING_LAYER=GNU CUDA_VISIBLE_DEVICES=0 python ./evaluate_results.py --input ./eval_results/qwen3vl-4b
-```
-
-## Inference with SpatialAgent
-Before using SpatialAgent, you need to install the additional dependencies required by the toolbox according to the Requirements section.
-
-In addition, you should download the checkpoints for the spatial perception tools being used and place them in the `./SpatialAgent/checkpoints/` directory, which should have a structure similar to the following:
-
-```
-./SpatialAgent/checkpoints
-├── dinov2-large
-├── Orient-Anything
-│   ├── base100p
-│   ├── base100p2
-│   ├── base25p
-│   ├── base50p
-│   ├── base75p
-│   ├── base75p2
-│   ├── celarge
-│   ├── cropbaseEx03
-│   ├── croplargeEX03
-│   ├── croplargeEX2
-│   ├── cropsmallEx03
-│   ├── mixreallarge
-│   └── ronormsigma1
-└── RAFT
-
-./SpatialAgent/DepthAnythingV2
-├── ckpt
-│   ├── hypersim.pth
-│   └── vkitti.pth
-
-./SpatialAgent/DetAny3D
-├── GroundingDINO
-│   └── weights
-│       └── groundingdino_swinb_cogcoor.pth
-├── checkpoints/detany3d
-│   ├── detany3d_ckpts
-│   ├── dino_ckpts
-│   ├── sam_ckpts
-│   └── unidepth_ckpts
-└── models--bert-base-uncased
-```
-
-Furthermore, for [DetAny3D](https://github.com/OpenDriveLab/DetAny3D) and [DepthAnythingV2](https://github.com/DepthAnything/Depth-Anything-V2), you will also need to refer to their respective repositories, download the required checkpoints, and place them in their corresponding directories.
-
-Our SpatialAgent supports two reasoning paradigms: Plan-Execute and ReAct. You can perform inference using the following script:
-
-```
-# Plan-Execute paradigm
-CUDA_VISIBLE_DEVICES=0 python inference_plan-execute.py --start 0 --end 1000 --prompt_format cota --model_path ../huggingface/Qwen3-VL-4B-Instruct --model_name qwen3vl-4b
-
-# ReAct paradigm
-CUDA_VISIBLE_DEVICES=0 python inference_ReAct.py --start 0 --end 1000 --execute --prompt_format cota --model_path ../huggingface/Qwen3-VL-4B-Instruct --model_name qwen3vl-4b
-```
-
 ## Citation
-If you use this code and data for your research or project, please cite:
+If you use this code, model, and data for your research or project, please cite:
 
 @inproceedings{wu2026spatialscore,
 author = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},