Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 4 days ago • 41
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning Paper • 2606.13106 • Published 2 days ago • 15
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It Paper • 2606.11052 • Published 4 days ago • 14
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short Paper • 2606.09380 • Published 4 days ago • 8
Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation Paper • 2606.06428 • Published 9 days ago • 25
Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs Paper • 2605.30501 • Published 16 days ago • 29
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 30 days ago • 145
OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 25 days ago • 40
ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions Paper • 2605.20087 • Published 25 days ago • 18
EnvFactory Collection These are the checkpoints and dataset for EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL. • 7 items • Updated 24 days ago • 1
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL Paper • 2605.18703 • Published 26 days ago • 50
Efficient RLVR Training via Weighted Mutual Information Data Selection Paper • 2603.01907 • Published Mar 2 • 14
ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall Paper • 2510.07896 • Published Oct 9, 2025 • 11
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 103
CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models Paper • 2602.17684 • Published Feb 4 • 22
Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning Paper • 2602.01745 • Published Feb 2 • 7
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 44
LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth Paper • 2602.07962 • Published Feb 8 • 24
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration Paper • 2602.01734 • Published Feb 2 • 32