Release AI-ModelZoo-4.0.0
README.md (changed)
@@ -15,7 +15,7 @@ The authors proposed a method that uniformly scales all dimensions depth/width/r
 Using neural architecture search, the authors created the EfficientNet topology and, starting from B0, derived a few variants B1...B7 ordered by increasing complexity.
 Its main building blocks are a mobile inverted bottleneck MBConv (Sandler et al., 2018; Tan et al., 2019) and a squeeze-and-excitation optimization (Hu et al., 2018).

-EfficientNet provides state-of-the art accuracy on
+EfficientNet provides state-of-the-art accuracy on ImageNet and CIFAR, for example, while being much smaller and faster
 than comparable models (ResNet, DenseNet, Inception...).
 However, for STM32 platforms, B0 is already too large. That is why we internally derived a custom version tailored for STM32
 and modified it to be quantization-friendly (not discussed in the initial paper). This custom model is then quantized to int8 using the TensorFlow Lite converter.
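The int8 conversion mentioned above corresponds to standard TensorFlow Lite post-training quantization. The sketch below is illustrative only: the Keras model path, the 224x224x3 input shape, and the random representative data are placeholders, not the model zoo's actual scripts.

```python
# Minimal sketch of int8 post-training quantization with the TensorFlow Lite converter.
# Model path, input shape and calibration data are placeholders.
import numpy as np
import tensorflow as tf

def representative_data_gen():
    # In practice, a few hundred preprocessed training images drive the calibration;
    # random tensors stand in here so the sketch stays self-contained.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

model = tf.keras.models.load_model("st_efficientnet_lc_v1.h5")  # placeholder path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Request full-integer kernels so both weights and activations end up in int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("st_efficientnet_lc_v1_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

A file produced by a flow like this is the kind of int8 artifact deployed on the STM32 targets benchmarked below.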
@@ -23,7 +23,7 @@ In the following, the resulting model is called ST EfficientNet LC v1 (LC standi

 ST EfficientNet LC v1 was obtained after fine-tuning of the original topology. Our goal was to reach around 500 kBytes for RAM and weights.
 To achieve this, we decided to replace the original 'swish' with a simple 'relu6' and to search for good expansion factor, depth
-and width coefficients. Of course, many models could meet the requirement. We selected the one which was better performing on
+and width coefficients. Of course, many models could meet the requirement. We selected the one that performed best on the food101 dataset.
 We made several attempts to quantize the EfficientNet topology and discovered some issues when quantizing activations.
 The problem was fixed mainly by adding a clipping lambda layer before the sigmoid.
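To make those two changes concrete, here is a sketch of a quantization-friendly squeeze-and-excitation gate with 'relu6' in place of 'swish' and a clipping Lambda layer ahead of the sigmoid. The clip range, layer names and exact arrangement are assumptions for illustration, not the actual layer stack of ST EfficientNet LC v1.

```python
# Illustrative quantization-friendly squeeze-and-excitation gate:
# 'relu6' replaces 'swish', and a clipping Lambda layer bounds the tensor that
# feeds the sigmoid, keeping its int8 quantization range free of rare outliers.
# The clip range (-6, 6) is an assumed value, not the model zoo's exact one.
import tensorflow as tf
from tensorflow.keras import layers

def quant_friendly_se_block(inputs, reduced_filters, filters):
    se = layers.GlobalAveragePooling2D()(inputs)                       # squeeze: HxWxC -> C
    se = layers.Reshape((1, 1, filters))(se)
    se = layers.Conv2D(reduced_filters, 1, activation=tf.nn.relu6)(se)
    se = layers.Conv2D(filters, 1)(se)
    se = layers.Lambda(lambda t: tf.clip_by_value(t, -6.0, 6.0))(se)   # clip before sigmoid
    se = layers.Activation("sigmoid")(se)
    return layers.Multiply()([inputs, se])                             # excite: channel gate
```

Bounding the activations (relu6 plus the clip) keeps their dynamic range narrow, which is what makes the int8 quantization of the gate well behaved.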
@@ -69,33 +69,34 @@ For an image resolution of NxM and P classes :

 * `tfs` stands for "training from scratch", meaning that the model weights were randomly initialized before training.

-### Reference **NPU** memory footprint on
-|Model | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) |
-| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/
-| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/
+### Reference **NPU** memory footprint on food101 dataset (see Accuracy for details on dataset)
+| Model | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) | STEdgeAI Core version |
+|----------|--------|-------------|------------------|------------------|---------------------|----------------------|-------------------------|
+| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food101/st_efficientnetlcv1_128_tfs/st_efficientnetlcv1_128_tfs_qdq_int8.onnx) | Int8 | 128x128x3 | STM32N6 | 176 | 0 | 540.28 | 3.0.0 |
+| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food101/st_efficientnetlcv1_224_tfs/st_efficientnetlcv1_224_tfs_qdq_int8.onnx) | Int8 | 224x224x3 | STM32N6 | 588.02 | 0 | 550.39 | 3.0.0 |
+| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food101/st_efficientnetlcv1_224_tfs/st_efficientnetlcv1_224_tfs_qdq_w4_26.1%_w8_73.9%_a8_100%_acc_73.12.onnx) | Int8/Int4 | 224x224x3 | STM32N6 | 588.02 | 0 | 481.49 | 3.0.0 |

-### Reference **NPU** inference time on food-101 dataset (see Accuracy for details on dataset)
-| Model | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STM32Cube.AI version | STEdgeAI Core version |
-| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food-101/st_efficientnet_lc_v1_128_tfs/st_efficientnet_lc_v1_128_tfs_int8.tflite)| Int8 | 128x128x3 | STM32N6570-DK | NPU/MCU | 6.88 | 145.34 | 10.2.0 | 2.2.0 |
-| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food-101/st_efficientnet_lc_v1_224_tfs/st_efficientnet_lc_v1_224_tfs_int8.tflite) | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 15.76 | 63.45 | 10.2.0 | 2.2.0 |
+### Reference **NPU** inference time on food101 dataset (see Accuracy for details on dataset)
+| Model | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STEdgeAI Core version |
+|--------|--------|-------------|------------------|------------------|---------------------|-----------|--------------------------|
+| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food101/st_efficientnetlcv1_128_tfs/st_efficientnetlcv1_128_tfs_qdq_int8.onnx) | Int8 | 128x128x3 | STM32N6570-DK | NPU/MCU | 7.12 | 140.45 | 3.0.0 |
+| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food101/st_efficientnetlcv1_224_tfs/st_efficientnetlcv1_224_tfs_qdq_int8.onnx) | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 17.31 | 57.77 | 3.0.0 |
+| [ST EfficientNet LC v1 tfs](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/image_classification/efficientnet/ST_pretrainedmodel_public_dataset/food101/st_efficientnetlcv1_224_tfs/st_efficientnetlcv1_224_tfs_qdq_w4_26.1%_w8_73.9%_a8_100%_acc_73.12.onnx) | Int8/Int4 | 224x224x3 | STM32N6570-DK | NPU/MCU | 17.22 | 58.07 | 3.0.0 |

 ### Reference **MCU** memory footprints based on Flowers dataset (see Accuracy for details on dataset)
-| Model | Format | Resolution | Series | Activation RAM | Runtime RAM | Weights Flash | Code Flash | Total RAM | Total Flash |
-| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32H7 |
-| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32H7 |
+| Model | Format | Resolution | Series | Activation RAM | Runtime RAM | Weights Flash | Code Flash | Total RAM | Total Flash | STEdgeAI Core version |
+|---------------------------|--------|--------------|---------|----------------|-------------|---------------|------------|------------|-------------|-----------------------|
+| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32H7 | 466.01 KiB | 15.6 KiB | 505.29 KiB | 100.99 KiB | 481.61 KiB | 606.28 KiB | 3.0.0 |
+| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32H7 | 181.01 KiB | 15.6 KiB | 505.29 KiB | 100.62 KiB | 196.61 KiB | 605.91 KiB | 3.0.0 |

 ### Reference **MCU** inference time based on Flowers dataset (see Accuracy for details on dataset)
-| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) |
-| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz |
-| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32H747I-DISCO | 1 CPU | 400 MHz |
-| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32F769I-DISCO | 1 CPU | 216 MHz | 871.7 ms |
-| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32F769I-DISCO | 1 CPU | 216 MHz | 259.5 ms |
+| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) | STEdgeAI Core version |
+|---------------------------|--------|------------|-------------------|------------------|-----------|---------------------|-----------------------|
+| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 459.99 ms | 3.0.0 |
+| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 155.22 ms | 3.0.0 |
+| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | STM32F769I-DISCO | 1 CPU | 216 MHz | 871.7 ms | 3.0.0 |
+| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | STM32F769I-DISCO | 1 CPU | 216 MHz | 259.5 ms | 3.0.0 |
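As a complement to the board figures above, the int8 models can also be exercised on a host PC with the TensorFlow Lite interpreter. The sketch below only shows how a quantized model is driven (scaled int8 input, argmax on the output); the file name and the random input are placeholders, and it does not reproduce the on-target timings or footprints.

```python
# Host-side sketch: run one of the int8 .tflite models with the TFLite interpreter.
# The model file name and the random input image are placeholders.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="st_efficientnet_lc_v1_128_tfs_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Map a float image into the int8 input domain using the model's scale/zero-point.
image = np.random.rand(1, 128, 128, 3).astype(np.float32)   # stand-in for a real picture
scale, zero_point = inp["quantization"]
int8_image = np.clip(image / scale + zero_point, -128, 127).astype(inp["dtype"])

interpreter.set_tensor(inp["index"], int8_image)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]
print("top-1 class index:", int(np.argmax(scores)))
```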
 ### Reference **MPU** inference time based on Flowers dataset (see Accuracy for details on dataset)
@@ -140,10 +141,11 @@ Number of classes: 101, number of files: 101000

 | Model | Format | Resolution | Top 1 Accuracy (%) |
 |---------------------------|--------|------------|--------------------|
-| ST EfficientNet LC v1 tfs | Float | 224x224x3 | 74.
-| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | 74.
-| ST EfficientNet LC v1 tfs | Float | 128x128x3 |
-| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | 63.
+| ST EfficientNet LC v1 tfs | Float | 224x224x3 | 74.59 |
+| ST EfficientNet LC v1 tfs | Int8 | 224x224x3 | 74.02 |
+| ST EfficientNet LC v1 tfs | Float | 128x128x3 | 64.11 |
+| ST EfficientNet LC v1 tfs | Int8 | 128x128x3 | 63.21 |
+| ST EfficientNet LC v1 tfs | Int8/Int4 | 224x224x3 | 73.12 |

 ## Retraining and Integration in a simple example: