
Deep Model Compression and Architecture Optimization for Embedded Systems: A Survey

Published in: Journal of Signal Processing Systems

Abstract

Over the past years, deep neural networks have proved to be an essential element for developing intelligent solutions. They have achieved remarkable performance, at the cost of deeper layers and millions of parameters. Therefore, deploying these networks on resource-limited platforms such as smart cameras is a challenging task. In this context, models need to be (i) accelerated and (ii) made memory efficient without significantly compromising performance. Numerous works have been proposed to obtain smaller, faster and more accurate models. This paper presents a survey of methods suitable for porting deep neural networks to resource-limited devices, especially smart cameras. These methods can be roughly divided into two main parts. In the first part, we present compression techniques, categorized into knowledge distillation, pruning, quantization, hashing, reduction of numerical precision and binarization. In the second part, we focus on architecture optimization. We introduce methods to enhance network structures as well as neural architecture search techniques. In each part, we describe and analyse the different methods. Finally, we conclude with a discussion of these methods.
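
As a concrete illustration of the pruning category mentioned in the abstract, the short sketch below applies unstructured magnitude pruning to a single weight matrix: the smallest-magnitude entries are zeroed out until a target sparsity is reached. This is a minimal NumPy sketch for illustration only; the function name, the random toy layer and the 80% sparsity level are assumptions of this example, not code from the survey.

    import numpy as np

    def magnitude_prune(weights, sparsity):
        """Zero out the smallest-magnitude entries of a weight matrix
        until the requested fraction of weights is removed
        (unstructured magnitude pruning)."""
        assert 0.0 <= sparsity < 1.0
        magnitudes = np.abs(weights).ravel()
        k = int(sparsity * magnitudes.size)
        if k == 0:
            return weights.copy()
        # The k-th smallest magnitude acts as the pruning threshold.
        threshold = np.partition(magnitudes, k - 1)[k - 1]
        mask = np.abs(weights) > threshold
        return weights * mask

    # Toy usage on a random fully connected "layer": prune 80% of the weights.
    w = np.random.randn(256, 256).astype(np.float32)
    w_pruned = magnitude_prune(w, sparsity=0.8)
    print("fraction of weights kept:", np.count_nonzero(w_pruned) / w.size)

In practice such a mask is usually applied iteratively, alternating pruning with fine-tuning, and the surviving sparse weights are stored in a compressed format to realise the memory savings.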



Acknowledgments

This work has been sponsored by the Auvergne Regional Council and the European Regional Development Fund (FEDER).

Author information


Corresponding author

Correspondence to Anthony Berthelier.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Berthelier, A., Chateau, T., Duffner, S. et al. Deep Model Compression and Architecture Optimization for Embedded Systems: A Survey. J Sign Process Syst 93, 863–878 (2021). https://doi.org/10.1007/s11265-020-01596-1
