Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions Paper • 2606.09076 • Published 3 days ago • 44
Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application Paper • 2606.12191 • Published 1 day ago • 53
Redesign Mixture-of-Experts Routers with Manifold Power Iteration Paper • 2606.12397 • Published 1 day ago • 62
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning Paper • 2606.03108 • Published 9 days ago • 7
Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization Paper • 2606.12373 • Published 1 day ago • 5
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models Paper • 2606.11324 • Published 2 days ago • 5
Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling Paper • 2606.12370 • Published 1 day ago • 11
ICA Lens: Interpreting Language Models Without Training Another Dictionary Paper • 2606.11722 • Published 1 day ago • 14
Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning Paper • 2606.11683 • Published 1 day ago • 25
DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch Paper • 2606.10728 • Published 2 days ago • 26
TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders Paper • 2606.09323 • Published 3 days ago • 34
SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning Paper • 2602.02472 • Published Feb 2 • 47
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published Dec 29, 2025 • 99