Abstract
Deep learning is one of the most active research directions in artificial intelligence, and it has achieved results that surpass those of traditional methods. However, the demands it places on the computing power of the hardware platform keep increasing. Academia and industry mainly rely on heterogeneous GPUs to accelerate computation, while the ARM ecosystem is comparatively more open than that of GPUs. The purpose of this paper is to study the performance and related acceleration techniques of ThunderX high-performance many-core ARM chips under large-scale inference tasks. To evaluate the computational performance of the target platform objectively, several deep models are adapted for acceleration. Through the selection of computational libraries, the adjustment of parallel strategies, and the application of various performance optimization techniques, we thoroughly exploit the computing capability of the many-core ARM platform. The final experimental results show that a single ThunderX chip performs on par with an Intel i7 7700K, and the dual-chip configuration reaches 1.77 times the performance of the latter. In terms of energy efficiency, the former is inferior to the latter; a stronger cooling system or poor power management may lead to higher power consumption. Overall, high-performance ARM chips can be deployed in the cloud to handle large-scale deep learning inference tasks that require high throughput.
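The abstract mentions adjusting parallel strategies to raise inference throughput on a many-core chip, but the paper's code is not reproduced here. The following is only a minimal illustrative sketch, under assumptions: it statically partitions a batch of inference requests across one worker thread per hardware core and reports samples per second. The function run_inference is a hypothetical placeholder for a single forward pass through whichever compute library is selected; it is not an API from the paper.

```cpp
// Minimal sketch (not the paper's code): throughput-oriented batched inference
// on a many-core CPU using a static data-parallel split across threads.
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical stand-in for one forward pass; a real version would call into
// the chosen inference library (e.g. an ARM-optimized BLAS backend).
static void run_inference(int /*sample_id*/) {
    volatile double acc = 0.0;              // simulate per-sample compute work
    for (int i = 0; i < 1000000; ++i) acc += i * 1e-9;
}

int main() {
    const int num_threads = std::thread::hardware_concurrency(); // cores available
    const int num_samples = 4096;                                 // inference batch size

    auto start = std::chrono::steady_clock::now();

    // Data-parallel strategy: each thread handles every num_threads-th sample,
    // so throughput scales with core count when samples are independent.
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([=] {
            for (int s = t; s < num_samples; s += num_threads) {
                run_inference(s);
            }
        });
    }
    for (auto& w : workers) w.join();

    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::printf("throughput: %.1f samples/s with %d threads\n",
                num_samples / elapsed.count(), num_threads);
    return 0;
}
```

A static split like this favors throughput over latency, which matches the cloud inference scenario the abstract targets; a work-stealing queue would be preferable if per-sample cost varied widely.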
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhu, K., Jiang, J. (2018). Research on Parallel Acceleration for Deep Learning Inference Based on Many-Core ARM Platform. In: Li, C., Wu, J. (eds) Advanced Computer Architecture. ACA 2018. Communications in Computer and Information Science, vol 908. Springer, Singapore. https://doi.org/10.1007/978-981-13-2423-9_3
DOI: https://doi.org/10.1007/978-981-13-2423-9_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2422-2
Online ISBN: 978-981-13-2423-9