
Hardware acceleration of YOLOv7-tiny using high-level synthesis tools

  • Research
  • Published in: Journal of Real-Time Image Processing

Abstract

FPGAs have emerged as a promising platform for implementing neural networks due to their reconfigurability, parallelism, and low power consumption. Nonetheless, designing and optimizing FPGA-based neural network accelerators at the register transfer level (RTL) is a complex and time-consuming task. High-level synthesis (HLS) tools provide a higher level of abstraction for FPGA design, enabling designers to concentrate on top-level aspects, such as algorithms, rather than low-level hardware implementation details. Among the state-of-the-art object detection networks is the you only look once (YOLO) series, which combines neural network techniques such as cross-stage connections with feature-extraction structures like feature pyramid networks. In this paper, we propose a method for implementing the YOLOv7-tiny network on FPGAs using HLS tools, and we present a comprehensive analysis of the performance and resource utilization of the resulting FPGA-based accelerator. Our design meets real-time application requirements: it reduces the usage of digital signal processing (DSP) units by 90% and saves up to 60% of flip-flops compared to state-of-the-art designs, while achieving competitive usage of block RAM and look-up tables, and its latency of 15 ms is well suited to real-time applications. We also propose a scheme for efficient BRAM utilization and off-chip memory access.
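The abstract describes writing accelerator kernels in a high-level language and letting the HLS tool generate the hardware. As a hedged illustration only (this is not the authors' kernel, whose source is not reproduced here), a 3×3 fixed-point convolution in the C++ dialect accepted by HLS tools such as Vitis HLS might look like the sketch below; the `#pragma HLS` directives steer pipelining and unrolling during synthesis and are ignored by an ordinary C++ compiler:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical HLS-style kernel (illustrative only): a 3x3 convolution
// over a small fixed-point feature map. The loop labels (ROW, COL, MAC)
// follow the common HLS convention so synthesis reports can refer to
// each loop; the pragmas are no-ops for a plain C++ compiler.
constexpr int H = 6, W = 6, K = 3;

void conv3x3(const int16_t in[H][W], const int16_t ker[K][K],
             int32_t out[H - K + 1][W - K + 1]) {
ROW:
    for (int r = 0; r <= H - K; ++r) {
    COL:
        for (int c = 0; c <= W - K; ++c) {
#pragma HLS PIPELINE II = 1
            int32_t acc = 0;
        MAC:
            for (int i = 0; i < K; ++i) {
                for (int j = 0; j < K; ++j) {
#pragma HLS UNROLL
                    // Multiply-accumulate; with UNROLL the tool may map
                    // these products to parallel DSP slices or LUT logic.
                    acc += static_cast<int32_t>(in[r + i][c + j]) * ker[i][j];
                }
            }
            out[r][c] = acc;
        }
    }
}
```

With an all-ones input and kernel, each output element is the sum of a 3×3 window, i.e. 9. Design-wise, `PIPELINE II = 1` asks the tool to start one output pixel per clock cycle, while fully unrolling the inner MAC loops trades area (more multipliers) for throughput, which is the kind of resource/latency trade-off the paper's DSP and flip-flop figures quantify.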



Data availability

The datasets generated and/or analyzed during the present study are available from the corresponding author on reasonable request.


Funding

The authors did not receive support from any organization for the submitted work.

Author information


Contributions

AH: methodology, data curation, writing—original draft, visualization, software, validation. HJ: supervision, conceptualization, methodology, writing—review and editing, software.

Corresponding author

Correspondence to Hadi Jahanirad.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Hosseiny, A., Jahanirad, H. Hardware acceleration of YOLOv7-tiny using high-level synthesis tools. J Real-Time Image Proc 20, 75 (2023). https://doi.org/10.1007/s11554-023-01324-5

