Evaluation datasets of the ConTEB benchmark. Use "test" split where available, otherwise "validation", otherwise "train".