DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 4 days ago • 173
Swift Sampling: Selecting Temporal Surprises via Taylor Series Paper • 2605.22678 • Published 3 days ago • 6
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 12 days ago • 189
Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models Paper • 2604.16902 • Published Apr 18 • 6
madhusudhan001/qwen2.5-0.5b-materials-science Text Generation • 0.5B • Updated 23 days ago • 283 • 1
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published Apr 8 • 325
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 291
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 629