feat: add evaluation datasets (HumanEval 50, MBPP 100, Tool scenarios 50) 20a06fb walidsobhie-code committed 20 days ago