Deep CNN Co-design for HEVC CU Partition Prediction on FPGA–SoC

Neural Processing Letters

Abstract

Convolutional neural networks (CNNs) are widely used, owing to their excellent performance, in computer vision and related applications such as facial recognition, image classification, speech recognition, and video gaming. However, CNNs are computationally intensive and require large amounts of memory. Field Programmable Gate Arrays (FPGAs), especially the newer FPGA–SoC devices, are considered among the most promising platforms for accelerating CNNs because of their high performance, energy efficiency, and reconfigurability. This paper proposes an accelerated CNN model for a video compression application, based on a hardware-software architecture. We first accelerate the CNN layers to build Intellectual Property (IP) cores using Vivado High Level Synthesis (HLS). We then create a hardware-software architecture in which the CNN IP cores are integrated in the programmable logic (PL), which is connected to the Xilinx Processing System (PS) that manages all processing tasks on the FPGA–SoC board. The experimental results demonstrate that the proposed co-design achieves an on-chip power consumption of 1.69 W at a PL frequency of 142 MHz and a PS frequency of 525 MHz. A comparison with existing methods shows that the proposed design offers advantages in power consumption and hardware cost.
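To make the idea of accelerating individual CNN layers more concrete, the following is a minimal software sketch of one convolution layer followed by ReLU, of the kind that could serve as a golden reference when checking a hardware IP core. It is not the authors' network or their HLS code: the function name `conv2d_relu`, the 4×4 input, the 2×2 kernel, the bias, and the integer arithmetic are all illustrative assumptions.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_relu(image: Matrix, kernel: Matrix, bias: int = 0) -> Matrix:
    """Valid (no padding), stride-1 2D convolution followed by ReLU.

    As is usual in CNNs, the kernel is not flipped (cross-correlation).
    Integer arithmetic stands in for a fixed-point datapath (assumption).
    """
    k = len(kernel)                      # square k x k kernel assumed
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):               # output row
        for c in range(out_w):           # output column
            acc = bias
            for i in range(k):           # multiply-accumulate over the window
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = max(acc, 0)      # ReLU activation
    return out


# Hypothetical 4x4 input block and 2x2 kernel, chosen only for illustration.
IMAGE: Matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
KERNEL: Matrix = [
    [1, 0],
    [0, -1],
]

if __name__ == "__main__":
    for row in conv2d_relu(IMAGE, KERNEL, bias=6):
        print(row)
```

The nested multiply-accumulate loops are the part that a hardware implementation would typically pipeline or unroll. In a co-design flow, a reference of this kind is usually run on the processor side and compared element by element with the output of the accelerator.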



Author information

Corresponding author

Correspondence to Soulef Bouaafia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bouaafia, S., Khemiri, R., Messaoud, S. et al. Deep CNN Co-design for HEVC CU Partition Prediction on FPGA–SoC. Neural Process Lett 54, 3283–3301 (2022). https://doi.org/10.1007/s11063-022-10765-1

  • DOI: https://doi.org/10.1007/s11063-022-10765-1
