PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research Paper • 2604.15411 • Published 11 days ago • 4
Sleeping Agents 1 Molmo2-SGCoT: Visual Entity Tracking Demo 🎯 1 Track objects in shell games with SGCoT
Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks Paper • 2305.14201 • Published May 23, 2023 • 6