OProver: A Unified Framework for Agentic Formal Theorem Proving Paper • 2605.17283 • Published 3 days ago • 28
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution Paper • 2605.15301 • Published 6 days ago • 18
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 13 days ago • 45
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios Paper • 2604.25914 • Published 22 days ago • 41
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models Paper • 2604.18224 • Published 30 days ago • 22
DR^{3}-Eval: Towards Realistic and Reproducible Deep Research Evaluation Paper • 2604.14683 • Published Apr 16 • 36
CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction Paper • 2603.00610 • Published Feb 28 • 35
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations Paper • 2602.03828 • Published Feb 3 • 20
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration Paper • 2602.04575 • Published Feb 4 • 17
YuE: Scaling Open Foundation Models for Long-Form Music Generation Paper • 2503.08638 • Published Mar 11, 2025 • 73
T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation Paper • 2512.21094 • Published Dec 24, 2025 • 25