haoningwu committed on
Commit
385aa4d
·
verified ·
1 Parent(s): 7b9ef39

Update README.md

Files changed (1)
  1. README.md +1 -92
README.md CHANGED
@@ -45,99 +45,8 @@ conda env create -f environment.yaml
45
  conda activate SpatialScore
46
  ```
47
 
48
- ## Dataset
49
- Please check out [SpaitalScore](https://huggingface.co/datasets/haoningwu/SpatialScore) to download our proposed benchmark (`SpatialScore`).
50
-
51
- If you cannot access Huggingface, you can use [hf-mirror](https://hf-mirror.com/) to download models.
52
-
53
- ```
54
- export HF_ENDPOINT=https://hf-mirror.com # Add this before huggingface-cli download
55
- ```
56
-
57
- You can follow the commands below to prepare the data:
58
-
59
- ```
60
- huggingface-cli download --resume-download --repo-type dataset haoningwu/SpatialScore --local-dir ./ --local-dir-use-symlinks False
61
- unzip SpatialScore_benchmark.zip
62
- ```
63
-
64
- ## Evaluation
65
- Considering the current mainstream model architectures, we have prioritized support for the Qwen2.5-VL and Qwen3-VL series models.
66
- You can evaluate them on SpatialScore using the following commands:
67
-
68
- ```
69
- CUDA_VISIBLE_DEVICES=0,1 python test_qwen.py --model_name qwen3vl-4b --model_path ./huggingface/Qwen3-VL-4B-Instruct --dataset_json_path ./SpatialScore_benchmark/SpatialScore_benchmark.ndjson --output_dir ./eval_results
70
- ```
71
-
72
- Now, the All-in-one script supporting all other models is also available.
73
- You can evaluate other models on SpatialScore using the following commands:
74
-
75
- ```
76
- CUDA_VISIBLE_DEVICES=0,1 python test_all_in_one.py --model_name llava-ov-7b --model_path ../huggingface/LLaVA-OneVision-7B --dataset_json_path ./SpatialScore_benchmark/SpatialScore_benchmark.ndjson --output_dir ./eval_results
77
- ```
78
-
79
- Our final evaluation encompassed rule-based evaluation and LLM-based answer extraction, which are combined to calculate the final accuracy.
80
- Therefore, you need to configure [GPT-OSS](https://github.com/openai/gpt-oss) and download the corresponding [GPT-OSS-20B](https://huggingface.co/openai/gpt-oss-20b) checkpoint before running the following script to compute the final score:
81
-
82
- ```
83
- MKL_THREADING_LAYER=GNU CUDA_VISIBLE_DEVICES=0 python ./evaluate_results.py --input ./eval_results/qwen3vl-4b
84
- ```
85
-
86
- ## Inference with SpatialAgent
87
- Before using SpatialAgent, you need to install the additional dependencies required by the toolbox according to the Requirements section.
88
-
89
- In addition, you should download the checkpoints for the spatial perception tools being used and place them in the `./SpatialAgent/checkpoints/` directory, which should have a structure similar to the following:
90
-
91
- ```
92
- ./SpatialAgent/checkpoints
93
- ├── dinov2-large
94
- ├── Orient-Anything
95
- │ ├── base100p
96
- │ ├── base100p2
97
- │ ├── base25p
98
- │ ├── base50p
99
- │ ├── base75p
100
- │ ├── base75p2
101
- │ ├── celarge
102
- │ ├── cropbaseEx03
103
- │ ├── croplargeEX03
104
- │ ├── croplargeEX2
105
- │ ├── cropsmallEx03
106
- │ ├── mixreallarge
107
- │ └── ronormsigma1
108
- └── RAFT
109
-
110
- ./SpatialAgent/DepthAnythingV2
111
- └── ckpt
112
- │ ├── hypersim.pth
113
- │ └── vkitti.pth
114
-
115
- ./SpatialAgent/DetAny3D
116
- ├── GroundingDINO
117
- │ └── weights
118
- │ └── groundingdino_swinb_cogcoor.pth
119
- ├── checkpoints/detany3d
120
- │ ├── detany3d_ckpts
121
- │ ├── dino_ckpts
122
- │ ├── sam_ckpts
123
- │ └── unidepth_ckpts
124
- └── models--bert-base-uncased
125
- ```
126
-
127
- Furthermore, for [DetAny3D](https://github.com/OpenDriveLab/DetAny3D) and [DepthAnythingV2](https://github.com/DepthAnything/Depth-Anything-V2), you will also need to refer to their respective repositories, download the required checkpoints, and place them in their corresponding directories.
128
-
129
- Our SpatialAgent supports two reasoning paradigms: Plan-Execute and ReAct. You can perform inference using the following script:
130
-
131
- ```
132
- # Plan-Execute paradigm
133
- CUDA_VISIBLE_DEVICES=0 python inference_plan-execute.py --start 0 --end 1000 --prompt_format cota --model_path ../huggingface/Qwen3-VL-4B-Instruct --model_name qwen3vl-4b
134
-
135
- # ReAct paradigm
136
- CUDA_VISIBLE_DEVICES=0 python inference_ReAct.py --start 0 --end 1000 --execute --prompt_format cota --model_path ../huggingface/Qwen3-VL-4B-Instruct --model_name qwen3vl-4b
137
- ```
138
-
139
  ## Citation
140
- If you use this code and data for your research or project, please cite:
141
 
142
  @inproceedings{wu2026spatialscore,
143
  author = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
 
45
  conda activate SpatialScore
46
  ```
47
 
48
  ## Citation
49
+ If you use this code, model, and data for your research or project, please cite:
50
 
51
  @inproceedings{wu2026spatialscore,
52
  author = {Wu, Haoning and Huang, Xiao and Chen, Yaohui and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},