High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13019)

Abstract

Field Programmable Gate Array (FPGA) accelerators for CNN-based object detection have been attracting widespread attention in computer vision. In most existing FPGA accelerators, inference accuracy and speed suffer from low power efficiency and low performance density. To address this problem, we propose a software and hardware co-designed FPGA accelerator that achieves accurate and fast object detection with high power efficiency and performance density. To develop the accelerator on CPU+FPGA heterogeneous platforms, we design a resource-sensitive and energy-aware FPGA accelerator framework. In hardware, a hardware-sensitive neural network quantization method called Dynamic Fixed-point Data Quantization (DFDQ) is proposed to improve power efficiency. In software, an algorithm-level convolution (CONV) optimization scheme is further proposed to improve performance density by parallelizing the block execution of CONV cores. To validate the proposed accelerator, a Zynq FPGA is used to build an acceleration platform for the You Only Look Once (YOLO) network. Results demonstrate that the proposed accelerator outperforms state-of-the-art methods in power efficiency and performance density. In addition, object detection speed increases by up to 16.5 times with less than 1.5% accuracy degradation.
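The abstract names dynamic fixed-point quantization but does not give its details. As a rough illustration of the general idea only, the sketch below picks a separate fractional bit length for each tensor from its largest magnitude, then rounds values to signed integer codes. The 8-bit width, the function names, and the sample values are all assumptions made for this example; it is not the authors' DFDQ procedure.

```python
import math


def choose_fraction_bits(values, total_bits=8):
    """Pick the fractional length so the largest magnitude still fits.

    Hypothetical helper: one sign bit, then as many integer bits as the
    largest value needs, and the remainder goes to the fraction.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, total_bits=8):
    """Round values to signed fixed-point codes with a per-tensor scale."""
    frac_bits = choose_fraction_bits(values, total_bits)
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    # Clamp after rounding so values near the upper limit cannot overflow.
    codes = [min(qmax, max(qmin, round(v * scale))) for v in values]
    return codes, frac_bits


def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    return [c / (2.0 ** frac_bits) for c in codes]


# Made-up sample tensors: small weights and larger activations end up
# with different fractional lengths, which is the "dynamic" part.
weights = [0.5, -0.25, 0.1]
activations = [3.0, -6.5, 1.2]

w_codes, w_frac = quantize(weights)
a_codes, a_frac = quantize(activations)

print(w_frac, w_codes)  # fractional bits and codes for the weights
print(a_frac, a_codes)  # fractional bits and codes for the activations
```

Because each tensor gets its own binary point, small-valued weights keep more fractional precision than a single global format would allow, while large activations avoid overflow. A hardware design along these lines would typically realign values with bit shifts between layers, though whether the paper does so is not stated here.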

This work was supported by the Science and Technology Major Program of Anhui Province of China under Grant 202003a05020020, the Joint Fund of the Science & Technology Department of Liaoning Province and the State Key Laboratory of Robotics, China under Grant 2020-KF-22-16, the Special Foundation of the President of the Hefei Institutes of Physical Science under Grant YZJJ2020QN36, the Anhui Provincial Key R&D Program under Grant 202104a05020043, and the University Synergy Innovation Program of Anhui Province under Grant GXXT-2019-003.

Author information

Corresponding author

Correspondence to Chaofan Zhang.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, G. et al. (2021). High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_10

  • DOI: https://doi.org/10.1007/978-3-030-88004-0_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88003-3

  • Online ISBN: 978-3-030-88004-0

  • eBook Packages: Computer Science, Computer Science (R0)
