Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR • Paper • 2602.05261 • Published 3 days ago • 45
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models • Paper • 2601.21639 • Published 9 days ago • 49
MSRL • Collection • Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation • 2 items • Updated Aug 26, 2025