The ResNet family is a well-known architecture that uses skip connections to enable stronger gradient flow in much deeper networks. This variant has 50 layers.
The model is quantized to int8 using the TensorFlow Lite converter. A mixed-precision version is also provided using ONNX Runtime and our own quantization scripts.
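As a rough sketch of the int8 quantization flow described above, the snippet below converts a tiny stand-in Keras classifier with the TensorFlow Lite converter; the model architecture, image size, and calibration loop are illustrative assumptions (the zoo uses the real ResNet50 model and a subset of the training set for calibration):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in classifier so the sketch is self-contained; the real flow
# converts the ResNet50 model instead.
inputs = tf.keras.Input(shape=(64, 64, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(101, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

def representative_dataset():
    # Calibration data: in practice, a subset of real training images;
    # random tensors are used here only to keep the sketch runnable.
    for _ in range(10):
        yield [np.random.rand(1, 64, 64, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer (int8) quantization of weights and activations,
# with a UINT8 image input as described in the I/O table below.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
tflite_model = converter.convert()  # serialized .tflite flatbuffer (bytes)
```

Leaving `inference_output_type` at its default keeps a float32 output, matching the per-class FLOAT32 confidences listed below.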
Network inputs / outputs

For an image resolution of NxM and P classes:

| Input Shape  | Description                                              |
|--------------|----------------------------------------------------------|
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape | Description                                   |
|--------------|-----------------------------------------------|
| (1, P)       | Per-class confidence for P classes in FLOAT32 |
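The I/O contract above can be exercised with the standard `tf.lite.Interpreter` API. The sketch below builds and converts a tiny stand-in float model so it is self-contained; with the real quantized model you would pass `model_path="<model>.tflite"` instead, and the input would be UINT8 rather than FLOAT32:

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in classifier (5 classes, 32x32 input) converted in memory.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(inputs)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
tflite_model = tf.lite.TFLiteConverter.from_keras_model(
    tf.keras.Model(inputs, outputs)).convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]    # shape (1, N, M, 3)
out = interpreter.get_output_details()[0]   # shape (1, P)

# Single NxM RGB image; a random tensor stands in for a real picture.
_, n, m, _ = inp["shape"]
image = np.random.rand(1, n, m, 3).astype(inp["dtype"])

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])[0]  # per-class confidences
predicted_class = int(np.argmax(scores))
```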
Recommended platforms

| Platform | Supported | Recommended |
|----------|-----------|-------------|
| STM32L0  | []        | []          |
| STM32L4  | []        | []          |
| STM32U5  | []        | []          |
| STM32H7  | [x]       | []          |
| STM32MP1 | [x]       | []          |
| STM32MP2 | [x]       | [x]         |
| STM32N6  | [x]       | [x]         |
Performances
Metrics
Measurements are done with the default STM32Cube.AI configuration with the input/output allocated option enabled.
tfs stands for "training from scratch", meaning that the model weights were randomly initialized before training.
tl stands for "transfer learning", meaning that the model backbone weights were initialized from a pre-trained model, and only the last layer was unfrozen during training.
fft stands for "full fine-tuning", meaning that the full model weights were initialized from a transfer-learning checkpoint, and all layers were unfrozen during training.
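The tl and fft recipes above can be sketched with Keras, using the `trainable` flag to freeze and unfreeze the backbone. The learning rates and class count are illustrative assumptions, and `weights=None` is used here only to keep the sketch offline (in practice `weights="imagenet"` initializes the backbone from a pre-trained model):

```python
import tensorflow as tf

# ResNet50 backbone without its classification head.
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, pooling="avg",
    input_shape=(224, 224, 3))

inputs = tf.keras.Input(shape=(224, 224, 3))
features = backbone(inputs)
outputs = tf.keras.layers.Dense(101, activation="softmax")(features)
model = tf.keras.Model(inputs, outputs)

# tl: freeze the backbone so only the last (classification) layer trains.
backbone.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
# ... model.fit(...) on the target dataset ...

# fft: starting from the transfer-learning checkpoint, unfreeze every
# layer and train end to end, typically with a smaller learning rate.
backbone.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# ... model.fit(...) again ...
```

Recompiling after toggling `trainable` is required for the change to take effect in training.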
Reference NPU memory footprint on the food101 and imagenet datasets (see Accuracy for details on the datasets)
Dataset details: link, citation [4].
Number of classes: 1000.
To perform the quantization, we calibrated the activations with a random subset of the training set.
For the sake of simplicity, the accuracy reported here was estimated on the 50000 labelled images of the validation set.
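The accuracy estimate above is a plain top-1 computation over the validation images. A minimal sketch, where `predict` stands in for running the quantized model on one image:

```python
import numpy as np

def top1_accuracy(predict, images, labels):
    # Count images whose highest-confidence class matches the label.
    correct = sum(int(np.argmax(predict(img))) == int(lab)
                  for img, lab in zip(images, labels))
    return correct / len(labels)

# Toy usage with a fake 3-class "model" that always returns fixed scores
# (argmax is class 1), so exactly one of the two labels below matches.
fake_scores = np.array([0.1, 0.7, 0.2])
acc = top1_accuracy(lambda img: fake_scores,
                    images=[None, None], labels=[1, 0])  # acc == 0.5
```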
Please refer to the stm32ai-modelzoo-services GitHub repository here.
References
[1]
L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests." European Conference on Computer Vision, 2014.