OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models Paper • 2402.06044 • Published Feb 8, 2024 • 2
Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives Paper • 2402.11051 • Published Feb 16, 2024 • 2
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published Oct 13, 2025 • 53
Pull Requests as a Training Signal for Repo-Level Code Editing Paper • 2602.07457 • Published Feb 7 • 2
Linear Ensembles Wash Away Watermarks: On the Fragility of Distributional Perturbations in LLMs Paper • 2605.30501 • Published 7 days ago • 26