Abstract
FPGAs have emerged as a promising platform for implementing neural networks due to their reconfigurability, parallelism, and low power consumption. Nonetheless, designing and optimizing FPGA-based neural network accelerators at the register transfer level (RTL) is a complex and time-consuming task. High-level synthesis (HLS) tools provide a higher level of abstraction for FPGA design, enabling designers to concentrate on top-level design aspects, such as algorithms, rather than low-level hardware implementation details. Among the state-of-the-art object detection networks is the you only look once (YOLO) series, which is constructed from several neural network techniques, including cross-stage connections and feature-extraction structures such as pyramid networks. In this paper, we propose a method for implementing the YOLOv7-tiny network on FPGAs using HLS tools, and we present a comprehensive analysis of the performance and resource utilization of the resulting FPGA-based neural network accelerator. Our design meets real-time application requirements: it reduces the usage of digital signal processing (DSP) units by 90% and saves up to 60% of flip-flops compared with state-of-the-art designs, while achieving competitive block-RAM (BRAM) and look-up-table usage. Moreover, the achieved design latency of 15 ms is well suited to real-time applications. We also propose methods for efficient BRAM utilization and off-chip memory access.
Data availability
The datasets generated and/or analyzed during the present study are available from the corresponding author on reasonable request.
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
AH: methodology, data curation, writing—original draft, visualization, software, validation. HJ: supervision, conceptualization, methodology, writing—review and editing, software.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hosseiny, A., Jahanirad, H. Hardware acceleration of YOLOv7-tiny using high-level synthesis tools. J Real-Time Image Proc 20, 75 (2023). https://doi.org/10.1007/s11554-023-01324-5