Upload ARFBench_leaderboard.csv
results/ARFBench_leaderboard.csv
ADDED
@@ -0,0 +1,14 @@
+Model,pass@1,pass@5,Presence,Identification,Start Time,End Time,Magnitude,Categorization,Correlation,Indicator
+Random Choice,23.5,-,50.0,12.5,18.2,16.7,12.5,16.7,20.0,20.0
+Frequent Choice,46.9,-,82.9,36.8,21.4,31.3,26.3,30.8,82.9,31.7
+Oracle GPT-4o,57.5,-,87.4,34.2,26.8,25.0,55.3,62.5,82.9,28.6
+GPT-4.1,57.6,62.2,82.9,39.5,39.2,37.5,60.5,54.8,72.9,34.9
+Claude 3.7 Sonnet,56.7,57.4,85.6,34.2,41.0,40.6,53.9,54.8,67.1,36.5
+GPT-4o,54.4,60.2,82.0,28.9,23.2,34.4,52.6,52.9,80.0,34.9
+o4-mini,48.5,64.5,80.2,13.2,33.9,43.8,48.7,47.1,57.1,22.2
+InternVL3-78B,43.5,47.2,84.7,31.6,32.1,31.3,30.3,47.1,24.3,25.4
+Qwen2.5-VL-72B,41.1,53.8,83.8,21.1,25.0,6.3,32.9,26.0,48.6,36.5
+LlaVa-OneVision-72B,38.9,44.6,76.6,26.3,23.2,15.6,21.1,42.3,42.9,17.5
+Llama-3.2-Vision-90B,34.9,37.8,76.6,13.2,26.8,21.8,28.9,21.2,35.7,17.5
+QvQ,19.8,22.9,11.7,15.8,1.79,0.00,27.6,36.5,28.6,15.9
+ChatTS,10.0,10.4,3.60,10.5,14.3,3.13,9.21,14.4,18.6,4.76