CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era • Paper • 2602.23452 • Published 6 days ago
Enhancing Spatial Understanding in Image Generation via Reward Modeling • Paper • 2602.24233 • Published 6 days ago
Classroom Final Exam: An Instructor-Tested Reasoning Benchmark • Paper • 2602.19517 • Published 10 days ago
Reasoning Core: A Scalable Procedural Data Generation Suite for Symbolic Pre-training and Post-Training • Paper • 2603.02208 • Published 2 days ago