Abstract:
This project addresses the optimization of Deep Neural Networks for efficient embedded inference. Network pruning and quantization techniques are implemented in the PyTorch environment and benchmarked on ResNet50. The obtained results, reported as compression and speed-up rates, validate the feasibility and effectiveness of the concept. To demonstrate their practical potential, the two schemes are also applied to the RetinaNet object detector. Additionally, this work shows that inference can be performed at the edge by reducing the model's memory footprint and processing time, resulting in lower latency and energy consumption as well as improved data security. Hence, new horizons of applications in embedded systems are opened up.
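As a minimal sketch of the two techniques named above, the following PyTorch snippet applies L1 unstructured pruning to the convolutional layers of ResNet50 and post-training dynamic quantization to its linear layer. The 50% sparsity level and the specific pruning and quantization choices are illustrative assumptions, not the project's exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune
    from torchvision.models import resnet50

    # Untrained ResNet50 for illustration (weights=None per recent torchvision).
    model = resnet50(weights=None)

    # Zero out 50% of each conv layer's weights, lowest L1 magnitude first
    # (illustrative sparsity level).
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=0.5)
            prune.remove(module, "weight")  # make the pruning permanent

    # Quantize the linear layer to int8 for faster CPU inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    # Sanity check: one forward pass on a dummy image.
    with torch.no_grad():
        out = quantized(torch.randn(1, 3, 224, 224))
    print(out.shape)  # torch.Size([1, 1000])

Comparing the saved size and per-image latency of `model` and `quantized` gives the kind of compression and speed-up figures the abstract refers to.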