Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? Paper • 2602.07055 • Published 24 days ago • 22
AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking Paper • 2601.17645 • Published Jan 25 • 23
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 34
mlx-community/XortronCriminalComputingConfig-mlx-8Bit Text Generation • Updated Jun 19, 2025 • 35 • 3