
Accelerating Deep Convolutional Neural Network Inference Based on OpenCL

  • Conference paper
  • In: Intelligence Science IV (ICIS 2022)

Part of the book series: IFIP Advances in Information and Communication Technology (IFIP AICT, volume 659)

Abstract

In recent years, accelerating the inference stage of deep convolutional neural networks (DNNs) has become increasingly important for their efficient deployment. However, with the proliferation of heterogeneous computing devices, today's popular deep learning inference tools support only specific devices and therefore cannot effectively exploit different GPUs to accelerate DNN inference. To address this issue, we propose an OpenCL-based parallel inference algorithm for deep convolutional neural networks. First, we design and implement parallel OpenCL kernels to accelerate depthwise separable convolution, and we accelerate traditional convolution with parallel matrix multiplication built on clBLAS. We also design parallel OpenCL kernels for the other operations in the inference stage. Second, we further improve inference performance through kernel fusion and by increasing the workload per core. Finally, we run OpenCL implementations of the MobileNet v1 network and a 21-layer residual network on an AMD Radeon Vega Frontier GPU and an Nvidia GeForce GTX 1070 GPU. Compared with the Caffe implementation, they achieve speedups of 40.16x and 1.67x on the AMD GPU and 14.95x and 1.11x on the Nvidia GPU.
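The two convolution strategies described above can be illustrated outside OpenCL. The following is a minimal NumPy sketch, not the authors' OpenCL kernels: a depthwise separable convolution split into a per-channel depthwise pass and a 1x1 pointwise pass (which is exactly a matrix multiply over channels), and a traditional convolution lowered via im2col to a single GEMM, the same lowering that a tuned BLAS such as clBLAS accelerates on the GPU. All function names here are hypothetical.

```python
import numpy as np

def depthwise_conv(x, w):
    """Per-channel convolution: x (C, H, W), w (C, k, k); stride 1, no padding."""
    C, H, W = x.shape
    k = w.shape[1]
    out = np.empty((C, H - k + 1, W - k + 1))
    for c in range(C):
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                out[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * w[c])
    return out

def pointwise_conv(x, w):
    """1x1 convolution: x (C, H, W), w (M, C). A 1x1 convolution is a plain
    GEMM over the channel dimension, so a BLAS routine can execute it."""
    C, H, W = x.shape
    return (w @ x.reshape(C, H * W)).reshape(w.shape[0], H, W)

def separable_conv(x, dw, pw):
    """Depthwise separable convolution = depthwise pass then pointwise pass."""
    return pointwise_conv(depthwise_conv(x, dw), pw)

def im2col_conv(x, w):
    """Traditional convolution lowered to one GEMM via im2col:
    x (C, H, W), w (M, C, k, k); stride 1, no padding."""
    M, C, k, _ = w.shape
    Ho, Wo = x.shape[1] - k + 1, x.shape[2] - k + 1
    # Each output position becomes one column of the patch matrix.
    cols = np.empty((C * k * k, Ho * Wo))
    for i in range(Ho):
        for j in range(Wo):
            cols[:, i * Wo + j] = x[:, i:i + k, j:j + k].ravel()
    return (w.reshape(M, -1) @ cols).reshape(M, Ho, Wo)
```

In this picture, the paper's kernel fusion corresponds to merging adjacent stages (for example, convolution followed by an activation) into one OpenCL kernel so the intermediate tensor never round-trips through global memory; in the sketch that would mean computing `separable_conv` without materializing the depthwise result.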


References

  1. Guo, P.: Multi-institutional collaborations for improving deep learning-based magnetic resonance image reconstruction using federated learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2423–2432. IEEE, Piscataway, NJ (2021)

  2. Wang, J.: End-to-end object detection with fully convolutional network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 15849–15858. IEEE, Piscataway, NJ (2021)

  3. Das, A.: Enabling on-device smartphone GPU based training: lessons learned. In: 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), pp. 533–538. IEEE, Piscataway, NJ (2022)

  4. Kim, S.: Performance evaluation of INT8 quantized inference on mobile GPUs. IEEE Access 9, 164245–164255 (2021)

  5. Wai, Y.J.: Fixed point implementation of Tiny-Yolo-v2 using OpenCL on FPGA. Int. J. Adv. Comput. Sci. Appl. 9(10), 506–512 (2018)

  6. Mu, J.: Optimizing OpenCL-based CNN design on FPGA with comprehensive design space exploration and collaborative performance modeling. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 13(3), 1–28 (2020)

  7. Koo, Y., Kim, S., Ha, Y.-G.: OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework. World Wide Web 24(4), 1299–1319 (2020). https://doi.org/10.1007/s11280-020-00778-y

  8. Dagli, R., Eken, S.: Deploying a smart queuing system on edge with Intel OpenVINO toolkit. Soft. Comput. 25(15), 10103–10115 (2021). https://doi.org/10.1007/s00500-021-05891-2

  9. Marco, V.S.: Optimizing deep learning inference on embedded systems through adaptive model selection. ACM Trans. Embed. Comput. Syst. 19(1), 1–28 (2020)

  10. Dua, A.: Systolic-CNN: an OpenCL-defined scalable run-time-flexible FPGA accelerator architecture for accelerating convolutional neural network inference in cloud/edge computing. In: Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), p. 231. IEEE, Piscataway, NJ (2020)

  11. Lin, D.L.: Accelerating large sparse neural network inference using GPU task graph parallelism. IEEE Trans. Parallel Distrib. Syst. 33(11), 3041–3052 (2021)

  12. He, S.: An efficient GPU-accelerated inference engine for binary neural network on mobile phones. J. Syst. Architect. 117, 102156 (2021)

  13. Chen, J.: Split convolutional neural networks for distributed inference on concurrent IoT sensors. In: International Conference on Parallel and Distributed Systems (ICPADS), pp. 66–73. IEEE, Piscataway, NJ (2021)

  14. Howard, A.G.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)

  15. He, K.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE, Piscataway, NJ (2016)

  16. Qin, Z.: Diagonalwise refactorization: an efficient training method for depthwise convolutions. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 770–778. IEEE, Piscataway, NJ (2018)


Funding

This work is funded in part by the Key Research and Development Program of Shaanxi (Program No. 2022ZDLGY01-09), GHfund A (No. 202107014474), GHfund C (No. 202202036165), the Wuhu and Xidian University special fund for industry-university-research cooperation (Project No. XWYCXY-012021013), and the Cloud Computing Key Laboratory of Gansu Province.

Author information

Correspondence to Huming Zhu.


Copyright information

© 2022 IFIP International Federation for Information Processing

About this paper

Cite this paper

Wu, Y., Zhu, H., Zhang, L., Hou, B., Jiao, L. (2022). Accelerating Deep Convolutional Neural Network Inference Based on OpenCL. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_11

  • DOI: https://doi.org/10.1007/978-3-031-14903-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14902-3

  • Online ISBN: 978-3-031-14903-0

  • eBook Packages: Computer Science, Computer Science (R0)
