Zero-Shot ClipBench Evaluation
Download the supported datasets directly from the datasets host and update the dataset paths in clip_benchmark/datasets/builder.py. Then, with CHECKPOINT set to the checkpoint you want to evaluate, run:
model='PE-Core-G14-448'
DATASETS=./clip_benchmark/tasks/wds_benchmarks.txt
DATA_ROOT=DATA_ROOT/
python -m clip_benchmark.cli eval \
--model $model \
--pretrained $CHECKPOINT \
--dataset "$DATASETS" \
--dataset_root $DATA_ROOT \
--output "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json" \
--force-preprocess-cfg resize_mode=squash
This script runs the zero-shot classification and retrieval benchmarks defined in clip_benchmark/tasks/wds_benchmarks.txt. The example above covers the following tasks:
- ImageNet 1K classification
- ImageNet v2 classification
- ImageNet Adversarial classification
- MS-COCO retrieval
- Flickr30K retrieval
- Kinetics 400 video classification
- MSR-VTT video retrieval
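The --output argument in the command above is a template with brace placeholders, so each dataset and task combination is written to its own JSON file. The sketch below shows how such a template could expand. It assumes plain Python str.format-style substitution; the actual tool may normalize some values (for example, a checkpoint path) before filling them in. The helper expand_output_path and every field value except the model name are hypothetical, chosen only for illustration.

```python
# Sketch only: assumes the --output template is filled with str.format-style
# substitution. The helper name and the example values are hypothetical.
OUTPUT_TEMPLATE = (
    "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json"
)


def expand_output_path(template: str, **fields: object) -> str:
    """Fill the brace placeholders of an --output template."""
    return template.format(**fields)


example = expand_output_path(
    OUTPUT_TEMPLATE,
    pretrained="ckpt",               # hypothetical checkpoint label
    dataset="imagenet1k",            # hypothetical dataset name
    num_frames=8,                    # hypothetical frame count
    model="PE-Core-G14-448",         # model name from the command above
    language="en",                   # hypothetical language code
    task="zeroshot_classification",  # hypothetical task name
)
print(example)
# ./benchmark_ckpt_imagenet1k_8_PE-Core-G14-448_en_zeroshot_classification.json
```

Keeping every placeholder in the template avoids one run overwriting the results of another when several datasets are evaluated in a single invocation.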
Zero-Shot Retrieval for PE-AudioVisual
python -m clip_benchmark.cli eval \
--model pe-av-large \
--reweight-scale 10 \
--dataset audiocaps-audio-video audiocaps-audio-text audiocaps-video-text clotho-v2 \
--dataset_root $DATASETS \
--output "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json" \
--batch_size 4 --no_amp
This will run zero-shot retrieval for the following tasks:
- AudioCaps Audio-Video
- AudioCaps Audio-Text
- AudioCaps Video-Text
- Clotho-V2 Audio-Text
Clotho-V2 is downloaded from its original source and unpacked automatically. Because AudioCaps is sourced from YouTube, you must supply its audio and video files yourself, under $DATASETS/audiocaps/audio and $DATASETS/audiocaps/video respectively. Here $DATASETS is the directory passed to --dataset_root.
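Since the AudioCaps files have to be placed by hand, it can help to confirm the expected layout before starting a long evaluation run. The following is a minimal pre-flight sketch, not part of clip_benchmark: the function name missing_audiocaps_dirs is hypothetical, and it only checks that the two directories named above exist, not that their contents are complete or correctly named.

```python
# Pre-flight sketch (hypothetical helper, not part of clip_benchmark):
# checks that the AudioCaps directories described above are present.
from __future__ import annotations

import os
from pathlib import Path


def missing_audiocaps_dirs(dataset_root: str) -> list[str]:
    """Return the required AudioCaps directories that do not exist yet."""
    root = Path(dataset_root) / "audiocaps"
    required = [root / "audio", root / "video"]
    return [str(path) for path in required if not path.is_dir()]


if __name__ == "__main__":
    # Reads the same DATASETS variable used in the command above;
    # falls back to the current directory if it is unset.
    missing = missing_audiocaps_dirs(os.environ.get("DATASETS", "."))
    if missing:
        print("Missing AudioCaps directories:")
        for path in missing:
            print(f"  {path}")
    else:
        print("AudioCaps layout looks complete.")
```

Run it with DATASETS exported in the environment; an empty result means both directories were found.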