
Multi-objective Magnitude-Based Pruning for Latency-Aware Deep Neural Network Compression

  • Conference paper
  • In: Parallel Problem Solving from Nature – PPSN XVI (PPSN 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12269)

Abstract

Layer-wise magnitude-based pruning (LMP) is a popular method for Deep Neural Network (DNN) compression. By pruning connections in the network, it has the potential to reduce the latency of DNN inference, which facilitates applying DNNs to tasks with real-time requirements, such as self-driving vehicles and video detection and tracking. However, previous methods mainly use the compression rate as a proxy for latency, without explicitly accounting for latency when training the compressed network. This paper presents a new LMP method, namely Multi-objective Magnitude-based Latency-Aware Pruning (MMLAP). MMLAP measures latency directly and incorporates a novel multi-objective evolutionary algorithm to optimize both the accuracy and the latency of a DNN when designing compressed networks, i.e., when tuning the hyper-parameters of LMP. Empirical studies show the competitiveness of MMLAP compared to well-established LMP methods and demonstrate the value of multi-objective optimization in yielding compressed networks that are Pareto-optimal in terms of accuracy and latency.
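To make the setup concrete, below is a minimal Python sketch of the two ingredients the abstract refers to: LMP with one magnitude threshold per layer, and Pareto-based selection over the two objectives. This is our own illustration, not the paper's implementation: the density-based latency proxy and the synthetic error model in `evaluate` are placeholders (MMLAP measures latency directly on the target platform and evaluates real validation accuracy), and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def lmp_prune(weights, thresholds):
    """Layer-wise magnitude-based pruning: zero out every connection in
    layer l whose absolute weight falls below that layer's threshold."""
    return [np.where(np.abs(w) >= t, w, 0.0) for w, t in zip(weights, thresholds)]

def evaluate(weights, thresholds):
    """Score one per-layer threshold vector on the two objectives.
    Placeholder objectives: latency is approximated by the density of
    surviving connections, and error by a synthetic model that worsens
    as more connections are pruned (plus noise)."""
    pruned = lmp_prune(weights, thresholds)
    density = sum(np.count_nonzero(w) for w in pruned) / sum(w.size for w in weights)
    error = 0.1 + 0.8 * (1.0 - density) + rng.normal(0.0, 0.05)
    return {"thresholds": thresholds, "error": error, "latency": density}

def pareto_front(population):
    """Keep the non-dominated candidates: no other candidate is at least
    as good on both objectives and strictly better on one (both minimized)."""
    def dominates(a, b):
        return (a["error"] <= b["error"] and a["latency"] <= b["latency"] and
                (a["error"] < b["error"] or a["latency"] < b["latency"]))
    return [c for c in population if not any(dominates(d, c) for d in population)]

# Toy "network": three random weight matrices. Each candidate solution is a
# per-layer threshold vector, as in LMP hyper-parameter tuning.
weights = [rng.standard_normal(shape) for shape in [(64, 32), (32, 32), (32, 10)]]
population = [evaluate(weights, rng.uniform(0.0, 1.5, size=3)) for _ in range(10)]
for cand in pareto_front(population):
    print(f"error={cand['error']:.3f}  latency-proxy={cand['latency']:.3f}")
```

In MMLAP the loop around this evaluation is evolutionary: candidate threshold vectors are varied, scored on measured latency and accuracy, and selected by Pareto dominance, so the final population approximates the accuracy-latency trade-off front rather than a single compressed network.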

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB1003102, the Natural Science Foundation of China under Grant 61672478 and Grant 61806090, the Guangdong Provincial Key Laboratory under Grant 2020B121201001, the Shenzhen Peacock Plan under Grant KQTD2016112514355531, the Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence Fund (No. 2019028), and the National Leading Youth Talent Support Program of China.


References

  1. Beume, N., Naujoks, B., Emmerich, M.T.M.: SMS-EMOA: multiobjective selection based on dominated hypervolume. Eur. J. Oper. Res. 181(3), 1653–1669 (2007)

    Article  Google Scholar 

  2. Chen, C., Tung, F., Vedula, N., Mori, G.: Constraint-aware deep neural network compression. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 409–424. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_25

    Chapter  Google Scholar 

  3. Ciaparrone, G., Sánchez, F.L., Tabik, S., Troiano, L., Tagliaferri, R., Herrera, F.: Deep learning in video multi-object tracking: a survey. Neurocomputing 381, 61–88 (2020)

    Article  Google Scholar 

  4. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)

    Article  Google Scholar 

  5. Dong, J.-D., Cheng, A.-C., Juan, D.-C., Wei, W., Sun, M.: DPP-Net: device-aware progressive search for pareto-optimal neural architectures. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 540–555. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_32

    Chapter  Google Scholar 

  6. Dong, X., Chen, S., Pan, S.J.: Learning to prune deep neural networks via layer-wise optimal brain surgeon. In: Advances in Neural Information Processing Systems 30, Long Beach, CA, pp. 4857–4867 (2017)

    Google Scholar 

  7. Esteva, A., et al.: A guide to deep learning in healthcare. Nat. Med. 25(1), 24–29 (2019)

    Article  Google Scholar 

  8. Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient DNNs. In: Advances in Neural Information Processing Systems 29, Barcelona, Spain, pp. 1379–1387 (2016)

    Google Scholar 

  9. Han, S., Mao, H., Dally, W.J.: Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding (2015). arXiv preprint arXiv:1510.00149

  10. Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems 28, Montreal, Quebec, Canada, pp. 1135–1143 (2015)

    Google Scholar 

  11. Hong, W., Tang, K.: Convex hull-based multi-objective evolutionary computation for maximizing receiver operating characteristics performance. Memetic Comput. 8(1), 35–44 (2015). https://doi.org/10.1007/s12293-015-0176-8

    Article  Google Scholar 

  12. Hong, W., Tang, K., Zhou, A., Ishibuchi, H., Yao, X.: A scalable indicator-based evolutionary algorithm for large-scale multiobjective optimization. IEEE Trans. Evol. Comput. 23(3), 525–537 (2019)

    Article  Google Scholar 

  13. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, pp. 2261–2269 (2017)

    Google Scholar 

  14. Huang, P., He, X., Gao, J., Deng, L., Acero, A., Heck, L.P.: Learning deep structured semantic models for web search using clickthrough data. In: 22nd ACM International Conference on Information and Knowledge Management, San Francisco, CA, pp. 2333–2338 (2013)

    Google Scholar 

  15. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, Orlando, FL, pp. 675–678 (2014)

    Google Scholar 

  16. Kim, J., Misu, T., Chen, Y., Tawari, A., Canny, J.F.: Grounding human-to-vehicle advice for self-driving vehicles. In: IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, pp. 10591–10599 (2019)

    Google Scholar 

  17. LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Advances in Neural Information Processing Systems 2, Colorado, USA, pp. 598–605 (1989)

    Google Scholar 

  18. Li, G., Qian, C., Jiang, C., Lu, X., Tang, K.: Optimization based layer-wise magnitude-based pruning for DNN compression. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp. 2383–2389 (2018)

    Google Scholar 

  19. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient ConvNets. In: 5th International Conference on Learning Representations, Toulon, France (2017)

    Google Scholar 

  20. Marculescu, D., Stamoulis, D., Cai, E.: Hardware-aware machine learning: modeling and optimization. In: Proceedings of the International Conference on Computer-Aided Design, San Diego, CA, p. 137 (2018)

    Google Scholar 

  21. Molchanov, D., Ashukha, A., Vetrov, D.P.: Variational dropout sparsifies deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 2498–2507 (2017)

    Google Scholar 

  22. Qi, H., Sparks, E.R., Talwalkar, A.: Paleo: a performance model for deep neural networks. In: 5th International Conference on Learning Representations, Toulon, France (2017)

    Google Scholar 

  23. Rakshit, P., Konar, A., Das, S.: Noisy evolutionary optimization algorithms - a comprehensive survey. Swarm Evol. Comput. 33, 18–45 (2017)

    Article  Google Scholar 

  24. Real, E., et al.: Large-scale evolution of image classifiers. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 2902–2911 (2017)

    Google Scholar 

  25. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 4510–4520 (2018)

    Google Scholar 

  26. See, A., Luong, M., Manning, C.D.: Compression of neural machine translation models via pruning. In: Proceedings of the 20th Conference on Computational Natural Language Learning, Berlin, Germany, pp. 291–301 (2016)

    Google Scholar 

  27. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484 (2016)

    Google Scholar 

  28. Sun, Y., Wang, X., Tang, X.: Sparsifying neural network connections for face recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 4856–4864 (2016)

    Google Scholar 

  29. Tang, K., Yang, P., Yao, X.: Negatively correlated search. IEEE J. Sel. Areas Commun. 34(3), 542–550 (2016)

    Article  Google Scholar 

  30. Ullrich, K., Meeds, E., Welling, M.: Soft weight-sharing for neural network compression. In: 5th International Conference on Learning Representations, Toulon, France (2017)

    Google Scholar 

  31. Wang, E., et al.: Deep neural network approximation for custom hardware: where we’ve been, where we’re going. ACM Comput. Surv. 52(2), 40:1–40:39 (2019)

    Google Scholar 

  32. Yu, R., et al.: NISP: pruning networks using neuron importance score propagation. In: IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 9194–9203 (2018)

    Google Scholar 

  33. Zhang, H., Sun, J., Liu, T., Zhang, K., Zhang, Q.: Balancing exploration and exploitation in multiobjective evolutionary optimization. Inf. Sci. 497, 129–148 (2019)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Correspondence to Ke Tang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Hong, W., Yang, P., Wang, Y., Tang, K. (2020). Multi-objective Magnitude-Based Pruning for Latency-Aware Deep Neural Network Compression. In: Bäck, T., et al. (eds.) Parallel Problem Solving from Nature – PPSN XVI. PPSN 2020. Lecture Notes in Computer Science, vol. 12269. Springer, Cham. https://doi.org/10.1007/978-3-030-58112-1_32


  • DOI: https://doi.org/10.1007/978-3-030-58112-1_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58111-4

  • Online ISBN: 978-3-030-58112-1

  • eBook Packages: Computer Science, Computer Science (R0)
