OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework

Koo, Yongbon; Kim, Sunghoon; Ha, Young-guk

doi:10.1007/s11280-020-00778-y

Published: 07 February 2020

World Wide Web, Volume 24, pages 1299–1319 (2021)

Abstract

Object detection is the task of recognizing the classes of objects and their locations. It is used in many areas, such as face detection systems [16, 34, 37], surveillance tools [9], human-machine interfaces [17], and self-driving cars [18, 23, 25, 26, 30]. Deep learning approaches to object detection now perform significantly better than classical feature-based algorithms. Darknet [31] is a deep learning object detection framework well known for its speed and simple structure. Unfortunately, Darknet can accelerate its deep learning computations only with Nvidia CUDA [6], which limits users' choice of graphics cards. Open Computing Language (OpenCL) [35], an open standard for cross-platform parallel programming of heterogeneous systems, is available on general hardware accelerators. However, many deep learning frameworks, including Darknet, do not support OpenCL.

In our previous paper, we presented OpenCL-Darknet [19], which ported the CUDA-based Darknet to an open-standard OpenCL backend. The original OpenCL-Darknet demonstrated that it could run on general graphics processing unit (GPU) hardware. However, it could not match the performance of the CUDA version, and it supported only a limited range of platforms. In this study, we improved the performance of OpenCL-Darknet with several optimization techniques and added support for various architectures. We evaluated OpenCL-Darknet not only on an AMD R7 accelerated processing unit (APU) with OpenCL 2.0, but also on an Nvidia GPU and an ARM Mali embedded GPU with OpenCL 1.2 Profile. The evaluation on standard object detection datasets showed that the improved OpenCL-Darknet reduced processing time by up to 50% on average across various deep learning object detection networks compared with our original implementation. We also showed that our OpenCL deep learning framework is competitive with the CUDA-based one.

References

1. Badía, J., Belloch, J., Cobos, M., Igual, F., Quintana-Ortí, E.: Accelerating the SRP-PHAT algorithm on multi and many-core platforms using OpenCL. J. Supercomput. 75(3), 1284–1297 (2019)
2. D. Barry, M. Shah, M. Keijsers, H. Khan, and B. Hopman, “xYOLO: A Model For Real-Time Object Detection In Humanoid Soccer On Low-End Hardware,” arXiv preprint, 2019
3. Beck, K.: Test Driven Development: by Example. Addison-Wesley Longman Publishing Co., Inc., Boston, MA (2002)
4. clBLAS, Advanced Micro Devices, Inc., Phoenix (n.d.), [Online]. Available: https://github.com/clMathLibraries/clBLAS
5. clRNG, Advanced Micro Devices, Inc., Phoenix (n.d.), [Online]. Available: https://github.com/clMathLibraries/clRNG
6. Cook, S.: CUDA Programming: a Developer's Guide to Parallel Computing with GPUs. Morgan Kaufmann Publishers Inc., San Francisco (2013)
7. cuBLAS, Nvidia Corporation, Santa Clara (n.d.), [Online]. Available: https://developer.nvidia.com/cublas
8. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, 2005, pp. 886–893
9. N. Dalal, B. Triggs, and C. Schmid, “Human Detection Using Oriented Histograms of Flow and Appearance,” in Computer Vision (ECCV 2006), Springer Berlin Heidelberg, 2006, pp. 428–441
10. M. Everingham, L. Van-Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results,” [Online]. Available: http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
11. A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), 2012
12. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)
13. J. Gu, Y. Liu, Y. Gao, and M. Zhu, “OpenCL Caffe: Accelerating and Enabling a Cross Platform Machine Learning Framework,” in Proc. The 4th International Workshop on OpenCL, New York, 2016, pp. 8:1–8:5
14. H. Haseljic, E. Cogo, I. Prazina, R. Turcinhodzic, E. Buza, and A. Akagic, “OpenCL Superpixel Implementation on a General Purpose Multi-core CPU,” in Proc. of 2018 IEEE International Conference on Imaging Systems and Techniques (IST), Krakow, Poland, 2018
15. Hendry, Chern, R.: Automatic License Plate Recognition via sliding-window darknet-YOLO deep learning. Image Vis. Comput. 87, 47–56 (2019)
16. Ji, Y., Kim, S., Kim, Y., Lee, K.: Human-like sign-language learning method using deep learning. ETRI J. 40, 435–445 (2018)
17. Kim, J., Ryu, J.H., Han, T.M.: Multimodal Interface based on novel HMI UI/UX for in-vehicle infotainment system. ETRI J. 37(4), 793–803 (2015)
18. Y. Koo, J. Kim, and W. Han, “A method for driving control authority transition for cooperative autonomous vehicle,” in Proc. 2015 IEEE Intelligent Vehicles Symposium, Seoul, 2015, pp. 394–399
19. Y. Koo, C. You, and S. Kim, “OpenCL-Darknet: An OpenCL Implementation for Object Detection,” in Proc. The 1st International Workshop on Driving Computing Platform for Autonomous Vehicles, Shanghai, 2018
20. W. Lee, and W. Loh, “G-OPTICS: fast ordering density-based cluster objects using graphics processing units,” in Int. J. Web Grid Serv., vol. 14(3), 2018
21. L. Liao, K. Li, K. Li, C. Yang, and Q. Tian, “UHCL-Darknet: An OpenCL-based Deep Neural Network Framework for Heterogeneous Multi-/Many-core Clusters,” in Proc. of the 47th International Conference on Parallel Processing, Eugene, OR, USA, 2018
22. T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in European conference on computer vision, pp. 740–755, 2014
23. Montemerlo, M., Becker, J., Bhat, S., Dahlkamp, H., Dolgov, D., Ettinger, S., Haehnel, D., Hilden, T., Hoffmann, G., Huhnke, B., Johnston, D., Klumpp, S., Langer, D., Levandowski, A., Levinson, J., Marcil, J., Orenstein, D., Paefgen, J., Penny, I., Petrovskaya, A., Pflueger, M., Stanek, G., Stavens, D., Vogt, A., Thrun, S.: Junior: the Stanford entry in the urban challenge. J. Field Rob. 25(9), 569–597 (2008)
24. A. Neubeck and L. Van Gool, “Efficient Non-Maximum Suppression,” in Proc. The 18th International Conference on Pattern Recognition, Washington, 2006, pp. 850–855
25. Noh, S., An, K.: Decision-making framework for automated driving in highway environments. IEEE Trans. Intell. Transp. Syst. 19(1), 58–71 (2018)
26. Noh, S., Park, B., An, K., Koo, Y., Han, W.: Co-pilot agent for vehicle/driver cooperative and autonomous driving. ETRI J. 37(5), 1032–1043 (2015)
27. C. Nugteren, “CLBlast: A Tuned OpenCL BLAS Library,” arXiv preprint, 2017
28. C. Nugteren, “CLTune: A Generic Auto-Tuner for OpenCL Kernels,” arXiv preprint, 2017
29. C. P. Papageorgiou, M. Oren, and T. Poggio, “A general framework for object detection,” in Proc. 6th International Conference on Computer Vision, Bombay, 1998, pp. 555–562
30. Park, M., Lee, S., Han, W.: Development of steering control system for autonomous vehicle using geometry-based path tracking algorithm. ETRI J. 37(3), 617–625 (2015)
31. J. Redmon, “Darknet: Open Source Neural Networks in C (n.d.),” [Online]. Available: http://pjreddie.com/darknet
32. J. Redmon and A. Farhadi, “YOLO9000: Better, Faster, Stronger,” arXiv preprint, 2016
33. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” in Proc. Advances in Neural Information Processing Systems, Montréal, 2015, pp. 91–99
34. Rowley, H.A., Baluja, S., Kanade, T.: Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20, 23–38 (1998)
35. Stone, J.E., Gohara, D., Shi, G.: OpenCL: a parallel programming standard for heterogeneous computing systems. Comput. Sci. Eng. 12(3), 66–73 (2010)
36. B. Su and K. Keutzer, “clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs,” in Proc. The 26th ACM International Conference on Supercomputing, New York, 2012, pp. 353–364
37. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vis. 57, 137–154 (2004)

Funding

This work was supported by the Technology Innovation Program (20000946, Development of artificial intelligent computing platform technology for service robots capable of real-time processing of large-capacity, high-performance sensor fusion processing and deep learning) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

Author information

Authors and Affiliations

Electronics and Telecommunications Research Institute, 218 Gajeong-ro, Yuseong-gu, Daejeon, Republic of Korea
Yongbon Koo & Sunghoon Kim
Konkuk University, 120 Neungdong-ro, Gwangjin-gu, Seoul, Republic of Korea
Young-guk Ha

Corresponding author

Correspondence to Young-guk Ha.

Additional information

This article belongs to the Topical Collection: Special Issue on Artificial Intelligence and Big Data Computing

Guest Editors: Wookey Lee and Hiroyuki Kitagawa

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Koo, Y., Kim, S. & Ha, Yg. OpenCL-Darknet: implementation and optimization of OpenCL-based deep learning object detection framework. World Wide Web 24, 1299–1319 (2021). https://doi.org/10.1007/s11280-020-00778-y

Received: 11 May 2019
Revised: 19 October 2019
Accepted: 02 January 2020
Published: 07 February 2020
Issue Date: July 2021
DOI: https://doi.org/10.1007/s11280-020-00778-y
