# Zero-Shot ClipBench Evaluation

Download the supported datasets directly from the dataset host and update the paths in `clip_benchmark/datasets/builder.py`. Then run:

```bash
# CHECKPOINT must already hold the value to pass to --pretrained.
model='PE-Core-G14-448'
DATASETS=./clip_benchmark/tasks/wds_benchmarks.txt
DATA_ROOT=DATA_ROOT/

python -m clip_benchmark.cli eval \
    --model $model \
    --pretrained $CHECKPOINT \
    --dataset "$DATASETS" \
    --dataset_root $DATA_ROOT \
    --output "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json" \
    --force-preprocess-cfg resize_mode=squash
```

This script runs the zero-shot classification and retrieval benchmarks defined in `clip_benchmark/tasks/wds_benchmarks.txt`. The example above includes the following tasks:

- ImageNet 1K classification
- ImageNet v2 classification
- ImageNet Adversarial classification
- MS-COCO retrieval
- Flickr30K retrieval
- Kinetics 400 video classification
- MSR-VTT video retrieval

# Zero-Shot Retrieval for PE-AudioVisual

```bash
python -m clip_benchmark.cli eval \
    --model pe-av-large \
    --reweight-scale 10 \
    --dataset audiocaps-audio-video audiocaps-audio-text audiocaps-video-text clotho-v2 \
    --dataset_root $DATASETS \
    --output "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json" \
    --batch_size 4 --no_amp
```

This runs zero-shot retrieval for the following tasks:

- Audiocaps Audio-Video
- Audiocaps Audio-Text
- Audiocaps Video-Text
- Clotho-V2 Audio-Text

Clotho-V2 is downloaded from its original source and unpacked automatically. Because Audiocaps is a YouTube dataset, you need to provide the audio and video files yourself under `$DATASETS/audiocaps/audio` and `$DATASETS/audiocaps/video` respectively.
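
Since the Audiocaps files have to be placed by hand, it can help to check the directory layout before starting a long evaluation. The snippet below is a minimal sketch, not part of `clip_benchmark`: the function name `check_audiocaps_layout` is hypothetical, and it only assumes the two directories named above should exist and be non-empty. It does not validate file names or formats.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of clip_benchmark).
# Checks that <root>/audiocaps/audio and <root>/audiocaps/video
# exist and contain at least one entry.
check_audiocaps_layout() {
    local root="$1"
    local status=0
    local sub dir
    for sub in audio video; do
        dir="$root/audiocaps/$sub"
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            status=1
        elif [ -z "$(ls -A "$dir")" ]; then
            echo "empty directory: $dir" >&2
            status=1
        fi
    done
    # 0 when both directories are present and populated, 1 otherwise.
    return "$status"
}
```

One way to use it is to call `check_audiocaps_layout "$DATASETS"` before the evaluation command and only proceed when it returns success.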