Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders Paper • 2605.27354 • Published 5 days ago • 12
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents Paper • 2604.06132 • Published Apr 7 • 121
GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation Paper • 2512.17495 • Published Dec 19, 2025 • 20