Abstract
The You Only Look Once (YOLO) algorithm offers a good trade-off between accuracy and execution speed in object detection. The main bottleneck for execution speed in YOLO is how efficiently the Convolutional Neural Network (CNN) is implemented. Reducing the resources used by each convolution core, so that more cores can run in parallel, can significantly increase the execution speed of the algorithm. This paper presents a new ASIC Processing Element (PE) that reduces power consumption and increases speed while using fewer resources. A multiplier-less convolution core is proposed by replacing multipliers with multiplexer circuits and designing a 19-input adder. The weight word length is reduced to five bits, and a new quantization scheme compensates for the resulting accuracy loss, keeping the accuracy of the new architecture competitive with previous works. Compared with a traditional convolution core, the best proposed core improves power consumption, area, and delay by 4.44×, 4.9×, and 32%, respectively. With the proposed core placed in the PE, power consumption, frame rate, and accuracy are 1.76 W, 55.8 FPS, and 78%, respectively. Although the proposed 3 × 3 convolution core was evaluated using YOLOv2 and YOLOv4-tiny, it is also applicable to YOLOv7 and YOLOv8.
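The abstract does not give the weight encoding or the multiplexer wiring, so the following Python sketch is only one possible reading of a multiplier-less 3 × 3 core, not the authors' design. It assumes a hypothetical 5-bit weight code made of one sign bit and two 2-bit shift selectors, so that each weight equals ±(2^s1 + 2^s2) and each product becomes two multiplexer-selected shifted copies of the input. Under that assumption, nine weights contribute 18 operands, and a bias term brings the total to 19, which matches the adder width mentioned above. All names (`mux4`, `decode_weight`, `conv3x3_muxed`) are illustrative.

```python
def mux4(sel, x):
    """4-to-1 multiplexer: select one of four pre-wired shifted copies of x."""
    candidates = (x, x << 1, x << 2, x << 3)  # shifts are wiring, not multipliers
    return candidates[sel]


def decode_weight(code):
    """Split a hypothetical 5-bit code into (sign, s1, s2). Assumed layout: S AA BB."""
    sign = -1 if (code >> 4) & 1 else 1
    s1 = (code >> 2) & 0b11
    s2 = code & 0b11
    return sign, s1, s2


def weight_value(code):
    """Integer weight represented by the code under the assumed encoding."""
    sign, s1, s2 = decode_weight(code)
    return sign * ((1 << s1) + (1 << s2))


def conv3x3_muxed(pixels, codes, bias):
    """3x3 convolution using only multiplexers and one 19-operand addition."""
    operands = [bias]
    for x, code in zip(pixels, codes):
        sign, s1, s2 = decode_weight(code)
        operands.append(sign * mux4(s1, x))  # first partial term of the product
        operands.append(sign * mux4(s2, x))  # second partial term of the product
    assert len(operands) == 19  # 9 weights x 2 terms + bias
    return sum(operands)


def conv3x3_reference(pixels, codes, bias):
    """Reference result computed with ordinary multiplications."""
    return bias + sum(x * weight_value(c) for x, c in zip(pixels, codes))


if __name__ == "__main__":
    pixels = [12, 7, 0, 255, 34, 90, 1, 128, 64]
    codes = [0b00110, 0b10001, 0b01111, 0b00000, 0b11010,
             0b00101, 0b10111, 0b01000, 0b00011]
    print(conv3x3_muxed(pixels, codes, bias=5))
    print(conv3x3_reference(pixels, codes, bias=5))
```

The two functions agree for every input because a shift by s is exactly a multiplication by 2^s. In hardware, the choice of representable weight values would determine how much accuracy the quantization has to recover, which this sketch does not model.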
Data availability
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Author information
Contributions
H. Daryanavard and S. Bagherzadeh carried out the designs and simulations. H. Daryanavard wrote the main manuscript text. M.R. Semati participated in editing the manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our research work.
About this article
Cite this article
Bagherzadeh, S., Daryanavard, H. & Semati, M.R. A novel multiplier-less convolution core for YOLO CNN ASIC implementation. J Real-Time Image Proc 21, 45 (2024). https://doi.org/10.1007/s11554-024-01419-7
DOI: https://doi.org/10.1007/s11554-024-01419-7