Data Module

Related code:

core/data/augments.py
core/data/dataloader.py
core/data/dataset.py

Dataset file format

In LibContinual, datasets must follow a fixed format. Data is read according to the layout used by most continual learning settings, such as CIFAR-10 and CIFAR-100, so a dataset in that layout only needs to be downloaded and decompressed before use. If you want to use a new dataset whose format differs from these, you need to convert it to the same format yourself.

As with CIFAR-10, the dataset files should be organized as in the following example:

dataset_folder/
├── train/
│   ├── class_1/
│   │   ├── image_1.png
│   │   ├── ...
│   │   └── image_5000.png
│   ├── ...
│   └── class_10/
│       ├── image_1.png
│       ├── ...
│       └── image_5000.png
└── test/
    ├── class_1/
    │   ├── image_1.png
    │   ├── ...
    │   └── image_5000.png
    ├── ...
    └── class_10/
        ├── image_1.png
        ├── ...
        └── image_5000.png

The training images and test images must be placed in the train and test folders respectively. Within each, all images of the same category go in a folder named after that category, such as cat, dog, etc.
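
Before pointing LibContinual at a converted dataset, it can help to check that the folder follows this layout. The following is a minimal sketch and not part of LibContinual: the function names `summarize_split` and `check_layout`, the set of accepted image extensions, and the requirement that train and test contain the same class folders are all assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def summarize_split(split_dir: Path) -> dict[str, int]:
    """Map each class folder name inside split_dir to its image count."""
    counts = {}
    for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def check_layout(dataset_root: str) -> dict[str, dict[str, int]]:
    """Verify that dataset_root holds train/ and test/ with class subfolders."""
    root = Path(dataset_root)
    summary = {}
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing folder: {split_dir}")
        summary[split] = summarize_split(split_dir)
    # Assumption: both splits are expected to contain the same categories.
    if set(summary["train"]) != set(summary["test"]):
        raise ValueError("train and test contain different class folders")
    return summary
```

Calling `check_layout("dataset_folder")` returns a per-split dictionary of class names and image counts, which makes empty or misnamed class folders easy to spot.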

Configure Datasets

After downloading or organizing the dataset according to the above file format, simply modify the data_root field in the configuration file. Note that LibContinual prints the dataset folder name as the dataset name in the log.
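
To illustrate how the folder name ends up as the dataset name, here is a small sketch. It assumes the name is simply the last path component of data_root; the dictionary `config`, the example path, and the helper `dataset_name_from_root` are hypothetical and do not reflect LibContinual's actual implementation.

```python
import os

# Hypothetical stand-in for the parsed configuration file. Only the
# data_root field comes from the tutorial; the path itself is made up.
config = {"data_root": "/data/cifar10/"}


def dataset_name_from_root(data_root: str) -> str:
    """Return the last path component of data_root (assumed naming rule)."""
    # normpath drops any trailing separator so basename is never empty.
    return os.path.basename(os.path.normpath(data_root))


print(dataset_name_from_root(config["data_root"]))  # cifar10
```

Under this assumption, choosing a descriptive folder name for the dataset keeps the logs easy to read.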