Beyond Mode Collapse: Distribution Matching for Diverse Reasoning Paper • 2605.19461 • Published 4 days ago • 1
What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents Paper • 2605.19447 • Published 4 days ago • 2
Learning from Language Feedback via Variational Policy Distillation Paper • 2605.15113 • Published 5 days ago • 9
TL-GRPO: Turn-Level RL for Reasoning-Guided Iterative Optimization Paper • 2601.16480 • Published Jan 23 • 50
RISEBench Gallery 👀 A Gallery of Generation Results on RISEBench • Agents • 28
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? Paper • 2503.19990 • Published Mar 25, 2025 • 35