When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models Paper • 2606.27288 • Published 4 days ago
Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It Paper • 2606.26027 • Published 5 days ago
The Verification Horizon: No Silver Bullet for Coding Agent Rewards Paper • 2606.26300 • Published 5 days ago
Constraint Tax in Open-Weight LLMs: An Empirical Study of Tool Calling Suppression Under Structured Output Constraints Paper • 2606.25605 • Published 5 days ago
Escaping the Self-Confirmation Trap: An Execute-Distill-Verify Paradigm for Agentic Experience Learning Paper • 2606.24428 • Published 6 days ago
Deeper is Not Always Better: Mitigating the Alignment Tax via Confident Layer Decoding Paper • 2606.21906 • Published 9 days ago
Sleeping Agents FEST-Style Few-Shot RL for Reasoning 🧠 Solve math problems with step‑by‑step reasoning
Sleeping Agents Implicit Memory Conflict Validator 🧠 Evaluate LLM responses for outdated memory conflicts