Abstract
Convolutional neural networks (CNNs) are widely used for their excellent performance in many applications, including computer vision tasks such as facial recognition and image classification, as well as speech recognition and video gaming. However, CNNs are computationally intensive and require substantial memory resources. Field Programmable Gate Arrays (FPGAs), and in particular the newer FPGA–SoC devices, are considered among the most promising platforms for accelerating CNNs because of their high performance, energy efficiency, and reconfigurability. This paper proposes an accelerated CNN model for a video compression application, based on a hardware-software architecture. We first accelerate the CNN layers to build Intellectual Property (IP) cores using Vivado High Level Synthesis (HLS). We then create a hardware-software architecture in which the CNN IP cores are integrated in the programmable logic (PL), which is connected to the Xilinx Processing System (PS) that manages all processing tasks on the FPGA–SoC board. The experimental results show that the proposed co-design achieves an on-chip power consumption of 1.69 W at a PL frequency of 142 MHz and a PS frequency of 525 MHz. A comparative study with existing methods shows that the proposed design offers clear advantages in power consumption and hardware cost.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Bouaafia, S., Khemiri, R., Messaoud, S. et al. Deep CNN Co-design for HEVC CU Partition Prediction on FPGA–SoC. Neural Process Lett 54, 3283–3301 (2022). https://doi.org/10.1007/s11063-022-10765-1