Abstract
YOLOv2 is an object detection algorithm based on the Darknet neural network and widely used in advanced driver assistance systems. However, YOLOv2 must be accelerated on a high-performance computing platform before it can be put to practical use. Different computing platforms have different characteristics, and developers find it hard to weigh the merits and drawbacks of each acceleration platform and choose the right one for their actual requirements. This paper analyzes the pros and cons of embedded GPUs and FPGAs for accelerating YOLOv2 in terms of development speed, power efficiency, and computing performance. The analysis gives developers insight into choosing hardware for optimizing YOLOv2. The experimental data show that a heavily optimized FPGA implementation exceeds an embedded GPU in both power efficiency and speed. However, FPGA development is difficult and takes developers much more time than GPU development. Finally, we propose a balanced method that takes advantage of the GPU's development speed and the FPGA's high performance.
Data availability
Some or all data, models, or code generated or used during the study are available from the corresponding author by request.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Liu, C. YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method. Evol. Intel. 15, 2581–2587 (2022). https://doi.org/10.1007/s12065-021-00612-y
DOI: https://doi.org/10.1007/s12065-021-00612-y