LLM Evaluation Framework Demo
Benchmark LLMs on accuracy, cost, and hallucination.