Abstract
Field Programmable Gate Array (FPGA) accelerators for CNN-based object detection have attracted widespread attention in computer vision. In most existing FPGA accelerators, inference accuracy and speed are limited by low power efficiency and low performance density. To address this problem, we propose a software and hardware co-designed FPGA accelerator for accurate and fast object detection with high power efficiency and high performance density. To develop the accelerator on CPU+FPGA heterogeneous platforms, a resource-sensitive and energy-aware FPGA accelerator framework is designed. In hardware, a hardware-sensitive neural network quantization method called Dynamic Fixed-point Data Quantization (DFDQ) is proposed to improve power efficiency. In software, an algorithm-level convolution (CONV) optimization scheme is further proposed to improve performance density by parallelizing the block execution of CONV cores. To validate the proposed accelerator, a Zynq FPGA is used to build an acceleration platform for the You Only Look Once (YOLO) network. Results demonstrate that the proposed FPGA accelerator outperforms state-of-the-art methods in power efficiency and performance density. In addition, object detection speed is increased by up to 16.5 times with less than 1.5% accuracy degradation.
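The abstract does not give the details of DFDQ, so the following is only a minimal sketch of generic dynamic fixed-point quantization, not the authors' method. It assumes a signed 8-bit word and one fractional length per tensor, chosen from the largest magnitude in that tensor; the function names `choose_frac_bits`, `quantize`, and `dequantize` are hypothetical and introduced purely for illustration.

```python
import math


def choose_frac_bits(values, total_bits=8):
    """Pick a per-tensor fractional length (assumed policy, not from the paper).

    The integer part gets just enough bits to hold the largest magnitude;
    every remaining bit (after the sign bit) goes to the fraction.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    # Bits needed for the integer part, excluding the sign bit.
    # This is negative for small values, which yields extra fractional bits.
    int_bits = math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Scale by 2**frac_bits, round, and saturate to the signed word range."""
    scale = 2.0 ** frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to real values for error inspection."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


# Two tensors with different dynamic ranges get different fractional lengths.
weights = [0.5, -0.25, 0.9]
activations = [3.7, -1.2, 0.05]

w_frac = choose_frac_bits(weights)
a_frac = choose_frac_bits(activations)

w_codes = quantize(weights, w_frac)
a_codes = quantize(activations, a_frac)

w_restored = dequantize(w_codes, w_frac)
a_restored = dequantize(a_codes, a_frac)
```

The point of the per-tensor fractional length is that small-valued tensors keep more fractional precision while large-valued tensors avoid overflow, all within the same narrow integer word that maps efficiently onto FPGA arithmetic resources.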
This work was supported by the Science and Technology Major Program of Anhui Province of China under Grant 202003a05020020, the Joint Fund of the Science & Technology Department of Liaoning Province and the State Key Laboratory of Robotics, China under Grant 2020-KF-22-16, the Special Foundation of President of the Hefei Institutes of Physical Science under Grant YZJJ2020QN36, the Anhui Provincial Key R&D Program under Grant 202104a05020043, and the University Synergy Innovation Program of Anhui Province under Grant GXXT-2019-003.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, G. et al. (2021). High Power-Efficient and Performance-Density FPGA Accelerator for CNN-Based Object Detection. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13019. Springer, Cham. https://doi.org/10.1007/978-3-030-88004-0_10
DOI: https://doi.org/10.1007/978-3-030-88004-0_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88003-3
Online ISBN: 978-3-030-88004-0
eBook Packages: Computer Science (R0)