Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning Paper • 2510.01833 • Published Oct 2, 2025
QCBench: Evaluating Large Language Models on Domain-Specific Quantitative Chemistry Paper • 2508.01670 • Published Aug 3, 2025
$δ$-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 3 days ago • 102
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery Paper • 2602.08990 • Published Feb 9 • 77
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks Paper • 2602.06663 • Published Feb 6 • 5
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence Paper • 2512.22334 • Published Dec 26, 2025 • 36
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics Paper • 2508.18124 • Published Aug 25, 2025 • 49
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery Paper • 2508.08401 • Published Aug 11, 2025 • 42
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published Nov 27, 2024 • 40
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention Paper • 2506.23542 • Published Jun 30, 2025 • 13