I found this article on HF and saved it to read.
- Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding
  Paper • 2605.02290 • Published • 37
- Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex
  Paper • 2605.06139 • Published • 65
- ZAYA1-8B Technical Report
  Paper • 2605.05365 • Published • 4
- Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
  Paper • 2511.06221 • Published • 134