A novel multiplier-less convolution core for YOLO CNN ASIC implementation

Bagherzadeh, Shoorangiz; Daryanavard, Hassan; Semati, Mohammad Reza

doi:10.1007/s11554-024-01419-7

Research
Published: 04 March 2024

Volume 21, article number 45, (2024)

Journal of Real-Time Image Processing

Abstract

The You Only Look Once (YOLO) algorithm offers a good trade-off between accuracy and execution speed in object detection. The main bottleneck for execution speed in YOLO is the implementation of the Convolutional Neural Network (CNN). Reducing the resources used by each convolution core allows more cores to run in parallel, which can significantly increase the execution speed of the algorithm. This paper presents a new ASIC Processing Element (PE) that reduces power consumption and increases speed while using fewer resources. A multiplier-less convolution core is proposed by replacing multipliers with multiplexer circuits and designing a 19-input adder. The weight word length is reduced to five bits, and a new quantization compensates for the resulting loss of accuracy, keeping the accuracy of the new architecture competitive with previous works. Compared with the traditional convolution core, the best proposed core improves power consumption, area, and delay by 4.44×, 4.9×, and 32%, respectively. With the proposed core placed in the PE, the power consumption, frame rate, and accuracy were 1.76 W, 55.8 FPS, and 78%, respectively. Although the proposed 3 × 3 convolution core was evaluated using YOLOv2 and YOLOv4-tiny, it is also applicable to YOLOv7 and YOLOv8.
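The abstract does not spell out how the multiplexers replace the multipliers, so the following Python sketch is only one possible reading and not the authors' circuit. It assumes a 5-bit weight stored as a sign bit plus a 4-bit magnitude, and assumes each magnitude is restricted to at most two set bits, so that every product becomes two multiplexer-selected shifts of the input pixel. Under that assumption a 3 × 3 window yields 18 partial terms, which together with a bias give the 19 operands of a multi-input adder. The names `mux_select`, `encode_weight`, and `conv3x3_multiplierless` are hypothetical.

```python
def mux_select(x, sel):
    """Model a multiplexer over pre-wired shifts of x.

    sel == 0 selects zero; sel == k (1..4) selects x << (k - 1).
    Shifts are free wiring in hardware, so no multiplier is involved.
    """
    options = [0, x, x << 1, x << 2, x << 3]
    return options[sel]


def encode_weight(w):
    """Encode an integer weight as (negative, sel_a, sel_b).

    Assumption: sign + 4-bit magnitude, with at most two set bits,
    so |w| == 2**(sel_a - 1) + 2**(sel_b - 1) (a sel of 0 adds nothing).
    """
    mag = abs(w)
    bits = [k for k in range(4) if (mag >> k) & 1]
    if mag > 15 or len(bits) > 2:
        raise ValueError(f"weight {w} is not representable in this sketch")
    sels = [b + 1 for b in bits] + [0, 0]  # pad unused selectors with zero
    return w < 0, sels[0], sels[1]


def conv3x3_multiplierless(window, weights, bias):
    """Compute one 3x3 convolution output using only muxes and additions.

    window and weights are flat lists of 9 integers each.
    """
    terms = []
    for x, w in zip(window, weights):
        negative, sel_a, sel_b = encode_weight(w)
        for sel in (sel_a, sel_b):
            t = mux_select(x, sel)
            terms.append(-t if negative else t)  # conditional negation, not a multiply
    terms.append(bias)
    assert len(terms) == 19  # 9 weights x 2 terms + bias
    return sum(terms)  # stands in for the 19-input adder


window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
weights = [1, -2, 3, 0, 5, -6, 4, 8, -1]
result = conv3x3_multiplierless(window, weights, bias=10)
```

Here `result` matches the ordinary multiply-accumulate value for the same window, weights, and bias. A weight such as 7, which has three set bits, is rejected by this sketch; a real quantization scheme would have to map such values onto representable ones, which is presumably part of what the paper's new quantization addresses.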

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran
Shoorangiz Bagherzadeh, Hassan Daryanavard & Mohammad Reza Semati

Contributions

H. Daryanavard and S. Bagherzadeh carried out the designs and simulations. H. Daryanavard wrote the main manuscript text. M. R. Semati participated in editing the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hassan Daryanavard.

Ethics declarations

Competing interests

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bagherzadeh, S., Daryanavard, H. & Semati, M.R. A novel multiplier-less convolution core for YOLO CNN ASIC implementation. J Real-Time Image Proc 21, 45 (2024). https://doi.org/10.1007/s11554-024-01419-7

Received: 31 December 2022
Accepted: 09 January 2024
Published: 04 March 2024
DOI: https://doi.org/10.1007/s11554-024-01419-7
