A novel multiplier-less convolution core for YOLO CNN ASIC implementation

  • Research
  • Journal of Real-Time Image Processing

Abstract

The You Only Look Once (YOLO) algorithm offers a good trade-off between accuracy and execution speed in object detection. The main bottleneck for execution speed in YOLO is how efficiently the Convolutional Neural Network (CNN) is implemented. Reducing the resources used by each convolution core allows more cores to run in parallel, which can significantly increase the execution speed of the algorithm. This paper presents a new ASIC Processing Element (PE) that reduces power consumption and increases speed while using fewer resources. A multiplier-less convolution core is proposed by replacing multipliers with multiplexer circuits and designing a 19-input adder. Reducing the weight word length to five bits, and compensating for the resulting accuracy loss with a new quantization scheme, keeps the accuracy of the new architecture competitive with previous works. Compared with the traditional convolution core, the best proposed core improves power consumption, area, and delay by 4.44×, 4.9×, and 32%, respectively. With the proposed core placed in the PE, the power consumption, frame rate, and accuracy are 1.76 W, 55.8 FPS, and 78%, respectively. Although the proposed 3 × 3 convolution core was evaluated using YOLOv2 and YOLOv4-tiny, it is also applicable to YOLOv7 and YOLOv8.
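
The abstract does not say how the multiplexers stand in for multipliers, so the following Python sketch is only one hypothetical reading, not the authors' circuit. It assumes a 5-bit sign-magnitude weight (range -15 to 15) whose 4-bit magnitude is split into two radix-4 digits; each digit drives a 4:1 multiplexer that selects 0, x, 2x, or 3x of the input pixel. Under that assumption each of the nine products becomes two addends, and with one bias term the sum has 19 inputs. The names `mux4`, `partial_terms`, and `conv3x3`, the weight format, and the bias input are all illustrative assumptions.

```python
# Sketch only: one hypothetical way to form a 3x3 convolution without multipliers.
# The 5-bit sign-magnitude weight format and the radix-4 split are assumptions,
# not details taken from the paper.

def mux4(select, candidates):
    """4:1 multiplexer: return the candidate picked by the low 2 bits of select."""
    return candidates[select & 0b11]


def partial_terms(pixel, weight):
    """Return two addends whose sum equals pixel * weight for -15 <= weight <= 15."""
    if not -15 <= weight <= 15:
        raise ValueError("weight must fit in 5-bit sign-magnitude")
    magnitude = abs(weight)
    # Multiples of the pixel that could be prepared once and shared by all weights.
    candidates = (0, pixel, pixel << 1, (pixel << 1) + pixel)
    low = mux4(magnitude, candidates)              # magnitude bits 1..0
    high = mux4(magnitude >> 2, candidates) << 2   # magnitude bits 3..2, shifted by wiring
    if weight < 0:
        return -low, -high
    return low, high


def conv3x3(window, weights, bias=0):
    """Sum 18 multiplexer outputs plus a bias: 19 addends in total.

    window and weights are flat sequences of 9 integers each.
    """
    addends = [bias]
    for pixel, weight in zip(window, weights):
        addends.extend(partial_terms(pixel, weight))
    assert len(addends) == 19
    return sum(addends)
```

In this sketch the only arithmetic besides the final sum is the single addition that forms 3x, which does not depend on the weight; whether the proposed core is organized this way cannot be confirmed from the abstract alone.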

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Contributions

H. Daryanavard and S. Bagherzadeh carried out the designs and simulations. H. Daryanavard wrote the main manuscript text. M.R. Semati participated in editing the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hassan Daryanavard.

Ethics declarations

Competing interests

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bagherzadeh, S., Daryanavard, H. & Semati, M.R. A novel multiplier-less convolution core for YOLO CNN ASIC implementation. J Real-Time Image Proc 21, 45 (2024). https://doi.org/10.1007/s11554-024-01419-7

  • DOI: https://doi.org/10.1007/s11554-024-01419-7
