Papers
- HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam
  Paper • 2602.13964 • Published • 11
- SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
  Paper • 2603.03823 • Published • 7
- DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
  Paper • 2602.16742 • Published • 12
- Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
  Paper • 2512.24873 • Published • 109
Datasets 6
skylenage-ai/SWE-CI-trajectory
Updated
skylenage-ai/RubricRM-Data
Viewer • Updated • 32.5k • 47
skylenage-ai/SWE-CI
Updated • 2.19k • 14
skylenage-ai/QwenClawBench
Viewer • Updated • 100 • 235 • 10
skylenage-ai/HLE-Verified
Viewer • Updated • 2.5k • 17.2k • 17
skylenage-ai/DeepVision-103K
Viewer • Updated • 104k • 1.65k • 35