
Multi-objective Magnitude-Based Pruning for Latency-Aware Deep Neural Network Compression

  • Conference paper
  • In: Parallel Problem Solving from Nature – PPSN XVI (PPSN 2020)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12269)

Abstract

Layer-wise magnitude-based pruning (LMP) is a popular method for Deep Neural Network (DNN) compression. By pruning connections in the network, it has the potential to reduce the latency of DNN inference, which facilitates applying DNNs to tasks with real-time requirements, such as self-driving vehicles and video detection and tracking. However, previous methods mainly use the compression rate as a proxy for latency, without explicitly accounting for latency when training the compressed network. This paper presents a new LMP method, namely Multi-objective Magnitude-based Latency-Aware Pruning (MMLAP). MMLAP measures latency directly and incorporates a novel multi-objective evolutionary algorithm to optimize both the accuracy and the latency of a DNN when designing compressed networks, i.e., when tuning the hyper-parameters of LMP. Empirical studies show the competitiveness of MMLAP compared to well-established LMP methods and demonstrate the value of multi-objective optimization in yielding compressed networks that are Pareto-optimal in terms of accuracy and latency.
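To make the setup concrete, below is a minimal Python sketch of the two ingredients the abstract refers to: LMP with one magnitude threshold per layer, and Pareto-based selection over the two objectives. This is our own illustration, not the paper's implementation: the density-based latency proxy and the synthetic error model in `evaluate` are placeholders (MMLAP measures latency directly on the target platform and evaluates real validation accuracy), and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def lmp_prune(weights, thresholds):
    """Layer-wise magnitude-based pruning: zero out every connection in
    layer l whose absolute weight falls below that layer's threshold."""
    return [np.where(np.abs(w) >= t, w, 0.0) for w, t in zip(weights, thresholds)]

def evaluate(weights, thresholds):
    """Score one per-layer threshold vector on the two objectives.
    Placeholder objectives: latency is approximated by the density of
    surviving connections, and error by a synthetic model that worsens
    as more connections are pruned (plus noise)."""
    pruned = lmp_prune(weights, thresholds)
    density = sum(np.count_nonzero(w) for w in pruned) / sum(w.size for w in weights)
    error = 0.1 + 0.8 * (1.0 - density) + rng.normal(0.0, 0.05)
    return {"thresholds": thresholds, "error": error, "latency": density}

def pareto_front(population):
    """Keep the non-dominated candidates: no other candidate is at least
    as good on both objectives and strictly better on one (both minimized)."""
    def dominates(a, b):
        return (a["error"] <= b["error"] and a["latency"] <= b["latency"] and
                (a["error"] < b["error"] or a["latency"] < b["latency"]))
    return [c for c in population if not any(dominates(d, c) for d in population)]

# Toy "network": three random weight matrices. Each candidate solution is a
# per-layer threshold vector, as in LMP hyper-parameter tuning.
weights = [rng.standard_normal(shape) for shape in [(64, 32), (32, 32), (32, 10)]]
population = [evaluate(weights, rng.uniform(0.0, 1.5, size=3)) for _ in range(10)]
for cand in pareto_front(population):
    print(f"error={cand['error']:.3f}  latency-proxy={cand['latency']:.3f}")
```

In MMLAP the loop around this evaluation is evolutionary: candidate threshold vectors are varied, scored on measured latency and accuracy, and selected by Pareto dominance, so the final population approximates the accuracy-latency trade-off front rather than a single compressed network.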

This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFB1003102, the Natural Science Foundation of China under Grant 61672478 and Grant 61806090, the Guangdong Provincial Key Laboratory under Grant 2020B121201001, the Shenzhen Peacock Plan under Grant KQTD2016112514355531, the Guangdong-Hong Kong-Macao Greater Bay Area Center for Brain Science and Brain-Inspired Intelligence Fund (No. 2019028), and the National Leading Youth Talent Support Program of China.


References

  1. Beume, N., Naujoks, B., Emmerich, M.T.M.: SMS-EMOA: multiobjective selection based on dominated hypervolume. Eur. J. Oper. Res. 181(3), 1653–1669 (2007)

    Article  Google Scholar 

  2. Chen, C., Tung, F., Vedula, N., Mori, G.: Constraint-aware deep neural network compression. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11212, pp. 409–424. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01237-3_25

    Chapter  Google Scholar 

  3. Ciaparrone, G., Sánchez, F.L., Tabik, S., Troiano, L., Tagliaferri, R., Herrera, F.: Deep learning in video multi-object tracking: a survey. Neurocomputing 381, 61–88 (2020)

    Article  Google Scholar 

  4. Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)

    Article  Google Scholar 

  5. Dong, J.-D., Cheng, A.-C., Juan, D.-C., Wei, W., Sun, M.: DPP-Net: device-aware progressive search for pareto-optimal neural architectures. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 540–555. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_32

    Chapter  Google Scholar 

  6. Dong, X., Chen, S., Pan, S.J.: Learning to prune deep neural networks via layer-wise optimal brain surgeon. In: Advances in Neural Information Processing Systems 30, Long Beach, CA, pp. 4857–4867 (2017)

    Google Scholar 

  7. Esteva, A., et al.: A guide to deep learning in healthcare. Nat. Med. 25(1), 24–29 (2019)

    Article  Google Scholar 

  8. Guo, Y., Yao, A., Chen, Y.: Dynamic network surgery for efficient DNNs. In: Advances in Neural Information Processing Systems 29, Barcelona, Spain, pp. 1379–1387 (2016)

    Google Scholar 

  9. Han, S., Mao, H., Dally, W.J.: Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding (2015). arXiv preprint arXiv:1510.00149

  10. Han, S., Pool, J., Tran, J., Dally, W.J.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems 28, Montreal, Quebec, Canada, pp. 1135–1143 (2015)

    Google Scholar 

  11. Hong, W., Tang, K.: Convex hull-based multi-objective evolutionary computation for maximizing receiver operating characteristics performance. Memetic Comput. 8(1), 35–44 (2015). https://doi.org/10.1007/s12293-015-0176-8

    Article  Google Scholar 

  12. Hong, W., Tang, K., Zhou, A., Ishibuchi, H., Yao, X.: A scalable indicator-based evolutionary algorithm for large-scale multiobjective optimization. IEEE Trans. Evol. Comput. 23(3), 525–537 (2019)

    Article  Google Scholar 

  13. Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, pp. 2261–2269 (2017)

    Google Scholar 

  14. Huang, P., He, X., Gao, J., Deng, L., Acero, A., Heck, L.P.: Learning deep structured semantic models for web search using clickthrough data. In: 22nd ACM International Conference on Information and Knowledge Management, San Francisco, CA, pp. 2333–2338 (2013)

    Google Scholar 

  15. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the ACM International Conference on Multimedia, Orlando, FL, pp. 675–678 (2014)

    Google Scholar 

  16. Kim, J., Misu, T., Chen, Y., Tawari, A., Canny, J.F.: Grounding human-to-vehicle advice for self-driving vehicles. In: IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, pp. 10591–10599 (2019)

    Google Scholar 

  17. LeCun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: Advances in Neural Information Processing Systems 2, Colorado, USA, pp. 598–605 (1989)

    Google Scholar 

  18. Li, G., Qian, C., Jiang, C., Lu, X., Tang, K.: Optimization based layer-wise magnitude-based pruning for DNN compression. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, pp. 2383–2389 (2018)

    Google Scholar 

  19. Li, H., Kadav, A., Durdanovic, I., Samet, H., Graf, H.P.: Pruning filters for efficient ConvNets. In: 5th International Conference on Learning Representations, Toulon, France (2017)

    Google Scholar 

  20. Marculescu, D., Stamoulis, D., Cai, E.: Hardware-aware machine learning: modeling and optimization. In: Proceedings of the International Conference on Computer-Aided Design, San Diego, CA, p. 137 (2018)

    Google Scholar 

  21. Molchanov, D., Ashukha, A., Vetrov, D.P.: Variational dropout sparsifies deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 2498–2507 (2017)

    Google Scholar 

  22. Qi, H., Sparks, E.R., Talwalkar, A.: Paleo: a performance model for deep neural networks. In: 5th International Conference on Learning Representations, Toulon, France (2017)

    Google Scholar 

  23. Rakshit, P., Konar, A., Das, S.: Noisy evolutionary optimization algorithms - a comprehensive survey. Swarm Evol. Comput. 33, 18–45 (2017)

    Article  Google Scholar 

  24. Real, E., et al.: Large-scale evolution of image classifiers. In: Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, pp. 2902–2911 (2017)

    Google Scholar 

  25. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals and linear bottlenecks. In: IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 4510–4520 (2018)

    Google Scholar 

  26. See, A., Luong, M., Manning, C.D.: Compression of neural machine translation models via pruning. In: Proceedings of the 20th Conference on Computational Natural Language Learning, Berlin, Germany, pp. 291–301 (2016)

    Google Scholar 

  27. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484 (2016)

    Google Scholar 

  28. Sun, Y., Wang, X., Tang, X.: Sparsifying neural network connections for face recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 4856–4864 (2016)

    Google Scholar 

  29. Tang, K., Yang, P., Yao, X.: Negatively correlated search. IEEE J. Sel. Areas Commun. 34(3), 542–550 (2016)

    Article  Google Scholar 

  30. Ullrich, K., Meeds, E., Welling, M.: Soft weight-sharing for neural network compression. In: 5th International Conference on Learning Representations, Toulon, France (2017)

    Google Scholar 

  31. Wang, E., et al.: Deep neural network approximation for custom hardware: where we’ve been, where we’re going. ACM Comput. Surv. 52(2), 40:1–40:39 (2019)

    Google Scholar 

  32. Yu, R., et al.: NISP: pruning networks using neuron importance score propagation. In: IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, pp. 9194–9203 (2018)

    Google Scholar 

  33. Zhang, H., Sun, J., Liu, T., Zhang, K., Zhang, Q.: Balancing exploration and exploitation in multiobjective evolutionary optimization. Inf. Sci. 497, 129–148 (2019)

    Article  MathSciNet  Google Scholar 

Download references

Author information

Correspondence to Ke Tang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Hong, W., Yang, P., Wang, Y., Tang, K. (2020). Multi-objective Magnitude-Based Pruning for Latency-Aware Deep Neural Network Compression. In: Bäck, T., et al. (eds.) Parallel Problem Solving from Nature – PPSN XVI. PPSN 2020. Lecture Notes in Computer Science, vol. 12269. Springer, Cham. https://doi.org/10.1007/978-3-030-58112-1_32


  • DOI: https://doi.org/10.1007/978-3-030-58112-1_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58111-4

  • Online ISBN: 978-3-030-58112-1

  • eBook Packages: Computer Science, Computer Science (R0)
