reasoning_model
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper
• 2511.16334
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
Paper
• 2509.07980
ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Paper
• 2509.04475
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper
• 2512.01374
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
Paper
• 2511.22570
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Paper
• 2512.07843
Paper
• 2510.01141
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper
• 2504.11468
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
• 2501.09686
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Paper
• 2410.09671
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper
• 2512.16676
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience
Paper
• 2512.17260
Latent Implicit Visual Reasoning
Paper
• 2512.21218
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
Paper
• 2512.20605
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
Paper
• 2512.19995
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper
• 2511.13612
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper
• 2511.08567
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Paper
• 2511.06221
SafeGRPO: Self-Rewarded Multimodal Safety Alignment via Rule-Governed Policy Optimization
Paper
• 2511.12982
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Paper
• 2509.07894
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper
• 2512.24617
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
Paper
• 2601.02346
Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
Paper
• 2601.07226
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
Paper
• 2601.09088
SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning
Paper
• 2601.04809
Paper
• 2412.16720
Learning Adaptive Parallel Reasoning with Language Models
Paper
• 2504.15466
Training Large Language Models to Reason in a Continuous Latent Space
Paper
• 2412.06769
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Paper
• 2601.20614