The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Article • Dec 17, 2025 • 47
Qwen/Qwen3-Coder-30B-A3B-Instruct Text Generation • 31B • Updated Dec 3, 2025 • 652k • 943
cerebras/GLM-4.5-Air-REAP-82B-A12B Text Generation • 82B • Updated Oct 21, 2025 • 5.62k • 108
Artificial Hippocampus Networks for Efficient Long-Context Modeling Paper • 2510.07318 • Published Oct 8, 2025 • 31
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution Paper • 2505.20286 • Published May 26, 2025 • 8 • 4
GAIA Leaderboard 🦾 Running on CPU Upgrade • 584 • Submit your model's answers and view the GAIA benchmark leaderboard