High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection

Zhang, Gang; Zhang, Chaofan; Wang, Fan; Tang, Fulin; Wu, Yihong; Yang, Xuezhi; Liu, Yong

doi:10.1007/978-3-030-88004-0_10

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13019)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Field Programmable Gate Array (FPGA) accelerators for CNN-based object detection have attracted widespread attention in computer vision. In most existing FPGA accelerators, inference accuracy and speed suffer from low power efficiency and performance density. To address this problem, we propose a software and hardware co-designed FPGA accelerator that achieves accurate and fast object detection with high power efficiency and performance density. To develop the accelerator on CPU+FPGA heterogeneous platforms, we design a resource-sensitive and energy-aware accelerator framework. In hardware, we propose a hardware-sensitive neural network quantization scheme, Dynamic Fixed-point Data Quantization (DFDQ), to improve power efficiency. In software, we further propose an algorithm-level convolution (CONV) optimization scheme that improves performance density by parallelizing the block execution of CONV cores. To validate the accelerator, we build an acceleration platform for the You Only Look Once (YOLO) network on a Zynq FPGA. Results demonstrate that the proposed accelerator outperforms state-of-the-art methods in power efficiency and performance density. In addition, object detection speed increases by up to 16.5 times with less than 1.5% accuracy degradation.
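The abstract does not spell out how DFDQ works, so the following is only a minimal sketch of generic dynamic fixed-point quantization, not the authors' method. It assumes a signed fixed-point format with a shared bit width and a per-tensor fractional length chosen to minimize the total absolute rounding error. The names `quantize` and `best_frac_len`, the search range, and the sample values are all hypothetical.

```python
def quantize(values, bit_width, frac_len):
    """Round values to signed fixed-point with frac_len fractional bits.

    Values outside the representable range are saturated (assumption:
    saturation rather than wrap-around on overflow).
    """
    scale = 2.0 ** frac_len
    q_min = -(1 << (bit_width - 1))
    q_max = (1 << (bit_width - 1)) - 1
    out = []
    for v in values:
        q = int(round(v * scale))          # nearest integer code
        q = max(q_min, min(q_max, q))      # saturate to the bit width
        out.append(q / scale)              # back to a real value
    return out


def best_frac_len(values, bit_width, candidates=range(-8, 17)):
    """Pick the fractional length with the smallest total absolute error.

    The candidate range is an illustrative assumption.
    """
    def error(frac_len):
        quantized = quantize(values, bit_width, frac_len)
        return sum(abs(v - q) for v, q in zip(values, quantized))
    return min(candidates, key=error)


# Hypothetical tensors: small-magnitude weights, larger activations.
weights = [0.75, -0.5, 0.125, 0.3]
activations = [12.5, -3.0, 40.25, 7.0]

w_fl = best_frac_len(weights, 8)
a_fl = best_frac_len(activations, 8)
print("weight fractional bits:", w_fl)
print("activation fractional bits:", a_fl)
```

The fractional length is "dynamic" in the sense that each tensor gets its own binary point at the same bit width: the small weights end up with more fractional bits than the larger activations, which keeps integer arithmetic narrow without a single global scale.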

This work was supported by the Science and Technology Major Program of Anhui Province of China under Grant 202003a05020020, the Joint Fund of the Science & Technology Department of Liaoning Province and the State Key Laboratory of Robotics, China under Grant 2020-KF-22-16, the Special Foundation of the President of the Hefei Institutes of Physical Science under Grant YZJJ2020QN36, the Anhui Provincial Key R&D Program under Grant 202104a05020043, and the University Synergy Innovation Program of Anhui Province under Grant GXXT-2019-003.

Author information

Authors and Affiliations

Hefei University of Technology, Hefei, Anhui, China
Gang Zhang & Xuezhi Yang
Anhui Institute of Optics and Fine Mechanics, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, China
Chaofan Zhang, Fan Wang & Yong Liu
University of Science and Technology of China, Hefei, Anhui, China
Fan Wang
The National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Fulin Tang & Yihong Wu

Corresponding author

Correspondence to Chaofan Zhang.

Editor information

Editors and Affiliations

University of Science and Technology Beijing, Beijing, China
Huimin Ma
Chinese Academy of Sciences, Beijing, China
Liang Wang
Tsinghua University, Beijing, China
Changshui Zhang
Zhejiang University, Hangzhou, China
Fei Wu
Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hunan University, Changsha, China
Yaonan Wang
Sun Yat-Sen University, Guangzhou, Guangdong, China
Jianhuang Lai
Beijing Jiaotong University, Beijing, China
Yao Zhao

About this paper

Cite this paper

Zhang, G. et al. (2021). High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_10

DOI: https://doi.org/10.1007/978-3-030-88004-0_10
Published: 22 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science, Computer Science (R0)
