A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization • Paper • 2606.16154 • Published 19 days ago • 8