Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook Paper • 2602.14299 • Published 3 days ago • 23
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis Paper • 2602.12395 • Published 6 days ago • 14
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published Dec 23, 2025 • 16
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction Paper • 2512.18880 • Published Dec 21, 2025 • 25
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions Paper • 2512.11995 • Published Dec 12, 2025 • 10
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs Paper • 2511.07419 • Published Nov 10, 2025 • 27
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory Paper • 2509.14662 • Published Sep 18, 2025 • 13
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published Jun 26, 2025 • 28
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 188
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors Paper • 2505.23001 • Published May 29, 2025 • 8
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Paper • 2505.22453 • Published May 28, 2025 • 46
Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Paper • 2505.22334 • Published May 28, 2025 • 36
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes Paper • 2504.11544 • Published Apr 15, 2025 • 44
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use Paper • 2310.03128 • Published Oct 4, 2023 • 1
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10, 2025 • 48