MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills? Paper • 2606.01993 • Published 4 days ago • 13
Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth Paper • 2605.25052 • Published 13 days ago • 14
DCAgent3/dev_set_v2_rl__24GPU_base_excl_timeouts__exp_rpt_pymethods2test_large__GLM_4_7_c2148a8d Viewer • Updated 9 days ago • 296 • 62 • 1
SaaSBench: Exploring the Boundaries of Coding Agents in Long-Horizon Enterprise SaaS Engineering Paper • 2605.17526 • Published 20 days ago • 7
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 30 days ago • 233
Forge-UGC: FX optimization and register-graph engine for universal graph compiler Paper • 2604.16498 • Published Apr 14 • 5
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 327