Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 12 days ago
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 4 days ago
Capturing LLM Capabilities via Evidence-Calibrated Query Clustering Paper • 2605.17110 • Published 8 days ago
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 10 days ago
Does Synthetic Layered Design Data Benefit Layered Design Decomposition? Paper • 2605.15167 • Published 10 days ago
BalCapRL: A Balanced Framework for RL-Based MLLM Image Captioning Paper • 2605.07394 • Published 16 days ago
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published 24 days ago
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning Paper • 2604.05404 • Published Apr 7
Brevity Constraints Reverse Performance Hierarchies in Language Models Paper • 2604.00025 • Published Mar 11
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30