Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization Paper • 2601.21358 • Published 8 days ago • 7
Article Activation Steering: A New Frontier in AI Control - But Does It Scale? Feb 2, 2025 • 4
Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 275
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1, 2025 • 36
LLMs for Engineering: Teaching Models to Design High Powered Rockets Paper • 2504.19394 • Published Apr 27, 2025 • 13
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization Paper • 2504.21659 • Published Apr 30, 2025 • 14
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29, 2025 • 94