ClinHallu: A Benchmark for Diagnosing Stage-Wise Hallucinations in Medical MLLM Reasoning • Paper • 2606.14697 • Published 17 days ago • 8
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models • Paper • 2606.11324 • Published 20 days ago • 170
LongAttnComp: Cross-Family Context Compression for Long-Context Reasoning • Paper • 2606.01336 • Published 29 days ago • 8
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence • Paper • 2605.26340 • Published May 25 • 36
HINT-SD: Targeted Hindsight Self-Distillation for Long-Horizon Agents • Paper • 2605.17873 • Published May 18 • 12
SOD: Step-wise On-policy Distillation for Small Language Model Agents • Paper • 2605.07725 • Published May 8 • 25
Visual Aesthetic Benchmark: Can Frontier Models Judge Beauty? • Paper • 2605.12684 • Published May 12 • 11
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling • Paper • 2605.13062 • Published May 13 • 33
DiPO: Disentangled Perplexity Policy Optimization for Fine-grained Exploration-Exploitation Trade-Off • Paper • 2604.13902 • Published Apr 15 • 62