
YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method

  • Special Issue
  • Published in Evolutionary Intelligence

Abstract

YOLOv2 is an object detection algorithm based on the Darknet neural network framework and is widely applied in advanced driver assistance systems. However, the YOLOv2 algorithm must be accelerated on a high-performance computing platform before it can be put to practical use. Computing platforms differ considerably in their characteristics, and it is difficult for developers to recognize the merits and drawbacks of each platform and choose the right one for their actual requirements. This paper analyzes the pros and cons of embedded GPUs and FPGAs for accelerating the YOLOv2 algorithm in terms of development speed, power efficiency, and computing performance. The analysis gives developers insight into choosing the hardware on which to optimize YOLOv2. According to the experimental data, a deeply optimized FPGA implementation exceeds the embedded GPU in both power efficiency and speed. However, FPGA development is considerably harder and more time-consuming for developers than GPU development. Finally, we propose a hybrid method that combines the GPU's development speed with the FPGA's high performance.
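To make the comparison criteria in the abstract concrete, the sketch below shows one way to tabulate throughput, power efficiency (frames per second per watt), and development effort for the two platforms. It is a minimal Python illustration only: the platform names follow the paper, but the numeric values are hypothetical placeholders, not the paper's measured results.

```python
# Illustrative sketch (not from the paper): comparing accelerator platforms by
# throughput, power efficiency, and development effort.
# All numbers below are hypothetical placeholders, not measured results.
from dataclasses import dataclass


@dataclass
class PlatformResult:
    name: str
    frames_per_second: float  # YOLOv2 inference throughput
    power_watts: float        # average board power during inference
    dev_weeks: float          # rough development effort

    @property
    def fps_per_watt(self) -> float:
        """Power efficiency: detection throughput per watt."""
        return self.frames_per_second / self.power_watts


def compare(platforms):
    # Rank platforms by power efficiency, the metric emphasized in the paper.
    for p in sorted(platforms, key=lambda p: p.fps_per_watt, reverse=True):
        print(f"{p.name:>12}: {p.frames_per_second:5.1f} FPS, "
              f"{p.power_watts:4.1f} W, {p.fps_per_watt:5.2f} FPS/W, "
              f"~{p.dev_weeks:.0f} weeks of development")


if __name__ == "__main__":
    compare([
        PlatformResult("embedded GPU", frames_per_second=25.0, power_watts=10.0, dev_weeks=2),
        PlatformResult("FPGA", frames_per_second=30.0, power_watts=5.0, dev_weeks=12),
    ])
```

Such a table makes the trade-off explicit: the FPGA column can win on FPS/W while losing badly on development time, which is the gap the proposed hybrid method aims to close.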


Data availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.


Author information


Corresponding author

Correspondence to Congjun Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, C. YOLOv2 acceleration using embedded GPU and FPGAs: pros, cons, and a hybrid method. Evol. Intel. 15, 2581–2587 (2022). https://doi.org/10.1007/s12065-021-00612-y

