
YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method

  • Special Issue
  • Published in Evolutionary Intelligence

Abstract

YOLOv2 is an object detection algorithm based on the Darknet neural network framework and is widely applied in advanced driver assistance systems. However, the YOLOv2 algorithm must be accelerated on a high-performance computing platform before it can be put to practical use. Computing platforms differ considerably in their characteristics, and it is difficult for developers to recognize the merits and drawbacks of each platform and choose the right one for their actual requirements. This paper analyzes the pros and cons of embedded GPUs and FPGAs for accelerating the YOLOv2 algorithm in terms of development speed, power efficiency, and computing performance. The analysis gives developers insight into choosing the hardware on which to optimize YOLOv2. According to the experimental data, a deeply optimized FPGA implementation exceeds the embedded GPU in both power efficiency and speed. However, FPGA development is considerably harder and more time-consuming for developers than GPU development. Finally, we propose a hybrid method that combines the GPU's development speed with the FPGA's high performance.
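To make the comparison criteria in the abstract concrete, the sketch below shows one way to tabulate throughput, power efficiency (frames per second per watt), and development effort for the two platforms. It is a minimal Python illustration only: the platform names follow the paper, but the numeric values are hypothetical placeholders, not the paper's measured results.

```python
# Illustrative sketch (not from the paper): comparing accelerator platforms by
# throughput, power efficiency, and development effort.
# All numbers below are hypothetical placeholders, not measured results.
from dataclasses import dataclass


@dataclass
class PlatformResult:
    name: str
    frames_per_second: float  # YOLOv2 inference throughput
    power_watts: float        # average board power during inference
    dev_weeks: float          # rough development effort

    @property
    def fps_per_watt(self) -> float:
        """Power efficiency: detection throughput per watt."""
        return self.frames_per_second / self.power_watts


def compare(platforms):
    # Rank platforms by power efficiency, the metric emphasized in the paper.
    for p in sorted(platforms, key=lambda p: p.fps_per_watt, reverse=True):
        print(f"{p.name:>12}: {p.frames_per_second:5.1f} FPS, "
              f"{p.power_watts:4.1f} W, {p.fps_per_watt:5.2f} FPS/W, "
              f"~{p.dev_weeks:.0f} weeks of development")


if __name__ == "__main__":
    compare([
        PlatformResult("embedded GPU", frames_per_second=25.0, power_watts=10.0, dev_weeks=2),
        PlatformResult("FPGA", frames_per_second=30.0, power_watts=5.0, dev_weeks=12),
    ])
```

Such a table makes the trade-off explicit: the FPGA column can win on FPS/W while losing badly on development time, which is the gap the proposed hybrid method aims to close.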


Data availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.


Author information


Corresponding author

Correspondence to Congjun Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, C. YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method. Evol. Intel. 15, 2581–2587 (2022). https://doi.org/10.1007/s12065-021-00612-y

