AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation Paper • 2604.18240 • Published 4 days ago • 14
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 22 days ago • 96
V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts Paper • 2603.10848 • Published Mar 11 • 14
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning Paper • 2601.21468 • Published Jan 29 • 25
Examining False Positives under Inference Scaling for Mathematical Reasoning Paper • 2502.06217 • Published Feb 10, 2025 • 1