# Zero-Shot ClipBench Evaluation
Please download the supported datasets directly from the datasets host and update paths in clip_benchmark/datasets/builder.py. And run
```bash
model='PE-Core-G14-448'
DATASETS=./clip_benchmark/tasks/wds_benchmarks.txt
DATA_ROOT=DATA_ROOT/

python -m clip_benchmark.cli eval \
    --model $model \
    --pretrained $CHECKPOINT \
    --dataset "$DATASETS" \
    --dataset_root $DATA_ROOT \
    --output "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json" \
    --force-preprocess-cfg resize_mode=squash

```
This script will perform zero-shot classification abd retireval benchmarks defined in clip_benchmark/tasks/wds_benchmarks.txt. Examples above includes the following tasks:
- ImageNet 1K classification
- ImageNet v2 classification
- ImageNet Adversial classification
- MS-COCO retrieval
- Flickr30K retrieval
- Kinetics 400 video classification
- MSR-VTT video retrieval


# Zero-Shot Retrieval for PE-AudioVisual

```bash
python -m clip_benchmark.cli eval \
    --model pe-av-large \
    --reweight-scale 10 \
    --dataset audiocaps-audio-video audiocaps-audio-text audiocaps-video-text clotho-v2 \
    --dataset_root $DATASETS \
    --output "./benchmark_{pretrained}_{dataset}_{num_frames}_{model}_{language}_{task}.json" \
    --batch_size 4 --no_amp
```

This will run zero-shot retrieval for the following tasks:
- Audiocaps Audio-Video
- Audiocaps Audio-Text
- Audiocaps Video-Text
- Clotho-V2 Audio-Text

Clotho-V2 will be downloaded from its original source and unpacked, but due to Audiocaps being a Youtube dataset, the user will need to provide the audio and video paths under `$DATASETS/audiocaps/audio` and `$DATASETS/audiocaps/video` respectively.