Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 7 days ago • 152
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety Paper • 2604.12710 • Published Apr 13 • 5
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety Paper • 2604.12710 • Published Apr 13 • 5
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety Paper • 2604.12710 • Published Apr 13 • 5
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27 • 98
HaluMem: Evaluating Hallucinations in Memory Systems of Agents Paper • 2511.03506 • Published Nov 5, 2025 • 95
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 47
DeepAgent: A General Reasoning Agent with Scalable Toolsets Paper • 2510.21618 • Published Oct 24, 2025 • 103
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11, 2025 • 37
view article Article DABStep: Data Agent Benchmark for Multi-step Reasoning +5 eggie5, martinigoyanes, frisokingma, andreumora, lvwerra, thomwolf, m-ric • Feb 4, 2025 • 130
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20, 2025 • 69