CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
Paper • 2407.10499 • Published
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM