In-Context Reinforcement Learning for Tool Use in Large Language Models • Paper • 2603.08068 • Published 4 days ago • 20
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams • Paper • 2603.07392 • Published 5 days ago • 13
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning • Paper • 2603.04918 • Published 8 days ago • 54
BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? • Paper • 2603.03194 • Published 10 days ago • 53
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale • Paper • 2602.23866 • Published 14 days ago • 83
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training • Paper • 2602.10693 • Published 30 days ago • 216
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts • Paper • 2602.13367 • Published 28 days ago • 31
SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training • Paper • 2602.03411 • Published Feb 3 • 37
SWE-World: Building Software Engineering Agents in Docker-Free Environments • Paper • 2602.03419 • Published Feb 3 • 40
RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents • Paper • 2602.02486 • Published Feb 2 • 19
SWE-Universe: Scale Real-World Verifiable Environments to Millions • Paper • 2602.02361 • Published Feb 2 • 60
RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System • Paper • 2602.02488 • Published Feb 2 • 33
DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents • Paper • 2601.20975 • Published Jan 28 • 10
Scaling Embeddings Outperforms Scaling Experts in Language Models • Paper • 2601.21204 • Published Jan 29 • 102
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation • Paper • 2601.20614 • Published Jan 28 • 120