Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published 5 days ago • 41
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published 11 days ago • 63
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents Paper • 2603.18815 • Published 12 days ago • 14
OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding Paper • 2601.09575 • Published Jan 14 • 26
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 54
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 229
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 30
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling Paper • 2512.23162 • Published Dec 29, 2025 • 14
Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting Paper • 2512.20927 • Published Dec 24, 2025 • 17
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published Dec 23, 2025 • 42
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in Paper • 2512.14273 • Published Dec 16, 2025 • 10
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 264