fix: Add Visualization column to main table (not just benchmark tables)
5f628a6
openhands committed on
feat: Add Visualization column for Laminar eval links
dfa8bfc
openhands committed on
Skip entries with hide_from_leaderboard=True
2123552
openhands committed on
Remove agenteval.json, use hardcoded mappings as single source of truth
24ccc42
openhands committed on
Add Evolution Over Time and Open Model Accuracy by Size visualizations
a4b9436
openhands committed on
Add runtime column and Cost/Performance + Runtime/Performance charts to all pages
2854ddd
openhands committed on
Move Download column to benchmark-specific tables only
4d0ae13
openhands committed on
Add Download column for trajectory archives and increase table font size
b5317d7
openhands committed on
Fix Pareto frontier calculation and display
1739efc
openhands committed on
Remove multi-swe-bench from OpenHands Index
b978a6b
openhands committed on
Update DeepSeek logo, tooltip format, and category names
5778893
openhands committed on
Replace total_cost with cost_per_instance (average cost per instance)
b1f3e49
openhands committed on
Fix TypeError when summing costs with None values
cdd40ba
openhands committed on
feat: Update leaderboard calculations and add incomplete entries toggle
5998027
openhands committed on
feat: Use pydantic schema models from openhands-index-results for validation
f0339f3
openhands committed on
Update fallback category mappings: place SWE-Bench Multimodal under 'Frontend Development' and Swt-Bench under 'Test Generation'.
Co-authored-by: openhands <openhands@all-hands.dev>
b42a4fe
openhands committed on
Move commit0 to 'App Creation' category in fallback mappings.
Co-authored-by: openhands <openhands@all-hands.dev>
b16f7da
openhands committed on
CRITICAL FIX: Add fallback category mappings for data without agenteval.json
b4ac443
openhands committed on
Add debug logging to track data loading on HuggingFace Space
044cdf4
openhands committed on
Fix score calculation to match AstaBench methodology and update categories
e734bf6
openhands committed on
Fix agent_version display and make Overall Score bold