Abstract
In recent years, accelerating the inference stage of deep convolutional neural networks (DNNs) has become increasingly important for their efficient deployment. However, with the proliferation of heterogeneous computing devices, today's popular deep learning inference tools support only specific devices and therefore cannot effectively exploit different GPUs to accelerate DNN inference. To address this issue, we propose an OpenCL-based parallel inference algorithm for deep convolutional neural networks. First, we design and implement parallel OpenCL kernel code to accelerate depthwise separable convolution, and implement parallel matrix multiplication combined with clBLAS to accelerate conventional convolution; we also design OpenCL kernels for the remaining operations in the inference stage. Second, we further improve inference performance through kernel fusion and by increasing the workload per core. Finally, we run the OpenCL implementations of the MobileNet v1 network and a 21-layer residual network on an AMD Radeon Vega Frontier GPU and an Nvidia GeForce GTX 1070 GPU. Compared with the Caffe implementation, speedups of 40.16x and 1.67x are achieved on the AMD GPU, and 14.95x and 1.11x on the Nvidia GPU.
References
Guo, P.: Multi-institutional collaborations for improving deep learning-based magnetic resonance image reconstruction using federated learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2423–2432. IEEE, Piscataway, NJ (2021)
Wang, J.: End-to-end object detection with fully convolutional network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 15849–15858. IEEE, Piscataway, NJ (2021)
Das, A.: Enabling on-device smartphone GPU based training: lessons learned. In: 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 533–538. IEEE, Piscataway, NJ (2022)
Kim, S.: Performance evaluation of INT8 quantized inference on mobile GPUs. IEEE Access 9, 164245–164255 (2021)
Wai, Y.J.: Fixed point implementation of Tiny-Yolo-v2 using OpenCL on FPGA. Int. J. Adv. Comput. Sci. Appl. 9(10), 506–512 (2018)
Mu, J.: Optimizing OpenCL-based CNN design on FPGA with comprehensive design space exploration and collaborative performance modeling. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 13(3), 1–28 (2020)
Koo, Y., Kim, S., Ha, Y.-G.: OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework. World Wide Web 24(4), 1299–1319 (2020). https://doi.org/10.1007/s11280-020-00778-y
Dagli, R., Eken, S.: Deploying a smart queuing system on edge with Intel OpenVINO toolkit. Soft. Comput. 25(15), 10103–10115 (2021). https://doi.org/10.1007/s00500-021-05891-2
Marco, V.S.: Optimizing deep learning inference on embedded systems through adaptive model selection. ACM Trans. Embed. Comput. Syst. 19(1), 1–28 (2020)
Dua, A.: Systolic-CNN: an OpenCL-defined scalable run-time-flexible FPGA accelerator architecture for accelerating convolutional neural network inference in cloud/edge computing. In: Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), p. 231. IEEE, Piscataway, NJ (2020)
Lin, D.L.: Accelerating large sparse neural network inference using GPU task graph parallelism. IEEE Trans. Parallel Distrib. Syst. 33(11), 3041–3052 (2021)
He, S.: An efficient GPU-accelerated inference engine for binary neural network on mobile phones. J. Syst. Architect. 117, 102156 (2021)
Chen, J.: Split convolutional neural networks for distributed inference on concurrent IoT sensors. In: International Conference on Parallel and Distributed Systems (ICPADS), pp. 66–73. IEEE, Piscataway, NJ (2021)
Howard, A.G.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
He, K.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE, Piscataway, NJ (2016)
Qin, Z.: Diagonalwise refactorization: an efficient training method for depthwise convolutions. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 770–778. IEEE, Piscataway, NJ (2018)
Funding
This work is funded in part by the Key Research and Development Program of Shaanxi (Program No. 2022ZDLGY01-09), GHfund A (No. 202107014474), GHfund C (No. 202202036165), the Wuhu and Xidian University special fund for industry-university-research cooperation (Project No. XWYCXY-012021013), and the Cloud Computing Key Laboratory of Gansu Province.
Copyright information
© 2022 IFIP International Federation for Information Processing
Cite this paper
Wu, Y., Zhu, H., Zhang, L., Hou, B., Jiao, L. (2022). Accelerating Deep Convolutional Neural Network Inference Based on OpenCL. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_11
DOI: https://doi.org/10.1007/978-3-031-14903-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14902-3
Online ISBN: 978-3-031-14903-0
eBook Packages: Computer Science, Computer Science (R0)