Abstract
Layer-wise magnitude-based pruning (LMP) is a popular method for Deep Neural Network (DNN) compression. By pruning connections in the network, it can reduce the latency of DNN inference, which facilitates the application of DNNs to tasks with real-time operation requirements, such as self-driving vehicles and video detection and tracking. However, previous methods mainly use the compression rate as a proxy for latency, without explicitly accounting for latency when training the compressed network. This paper presents a new LMP method, namely Multi-objective Magnitude-based Latency-Aware Pruning (MMLAP). MMLAP measures latency directly and incorporates a novel multi-objective evolutionary algorithm to optimize both the accuracy and the latency of a DNN when designing compressed networks, i.e., when tuning the hyper-parameters of LMP. Empirical studies show that MMLAP is competitive with well-established LMP methods and demonstrate the value of multi-objective optimization in yielding compressed networks that are Pareto-optimal in terms of accuracy and latency.
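As a rough illustration of the setting described above, the sketch below shows how per-layer pruning rates (the LMP hyper-parameters that a multi-objective evolutionary algorithm such as NSGA-II would tune) induce a bi-objective fitness of (error, latency). This is a minimal sketch assuming NumPy arrays as stand-in weight matrices; the function names and the toy accuracy/latency proxies are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def prune_layerwise(weights, rates):
    """Layer-wise magnitude-based pruning: in layer l, zero out the
    fraction rates[l] of weights with the smallest absolute values."""
    pruned = []
    for w, r in zip(weights, rates):
        k = int(r * w.size)  # number of connections to prune in this layer
        if k == 0:
            pruned.append(w.copy())
            continue
        # The k-th smallest magnitude becomes the pruning threshold
        # (ties at the threshold may prune slightly more than k weights).
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        pruned.append(np.where(np.abs(w) <= thresh, 0.0, w))
    return pruned

def fitness(rates, weights, accuracy_fn, latency_fn):
    """Bi-objective fitness of one candidate: (error, latency),
    both to be minimized by the multi-objective optimizer."""
    pruned = prune_layerwise(weights, rates)
    return 1.0 - accuracy_fn(pruned), latency_fn(pruned)

# Toy stand-ins: a real setup would evaluate the pruned DNN on a
# validation set and measure (or model) its actual inference latency.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)) for _ in range(3)]
acc = lambda ws: 0.95 - 0.1 * np.mean([np.mean(w == 0.0) for w in ws]) ** 2
lat = lambda ws: 1e-6 * sum(np.count_nonzero(w) for w in ws)
print(fitness([0.5, 0.7, 0.9], weights, acc, lat))
```

In this framing, the vector of per-layer rates is the decision variable of the evolutionary search, and the non-dominated candidates it finds trace out the accuracy-latency Pareto front.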
This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB1003102, the Natural Science Foundation of China under Grant 61672478 and Grant 61806090, the Guangdong Provincial Key Laboratory under Grant 2020B121201001, the Shenzhen Peacock Plan under Grant KQTD2016112514355531, the Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence Fund (No. 2019028), and the National Leading Youth Talent Support Program of China.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Hong, W., Yang, P., Wang, Y., Tang, K. (2020). Multi-objective Magnitude-Based Pruning for Latency-Aware Deep Neural Network Compression. In: Bäck, T., et al. Parallel Problem Solving from Nature – PPSN XVI. PPSN 2020. Lecture Notes in Computer Science, vol. 12269. Springer, Cham. https://doi.org/10.1007/978-3-030-58112-1_32
DOI: https://doi.org/10.1007/978-3-030-58112-1_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58111-4
Online ISBN: 978-3-030-58112-1
eBook Packages: Computer Science, Computer Science (R0)