Deep CNN Co-design for HEVC CU Partition Prediction on FPGA–SoC

Bouaafia, Soulef; Khemiri, Randa; Messaoud, Seifeddine; Sayadi, Fatma Ezahra

doi:10.1007/s11063-022-10765-1

Published: 25 February 2022

Neural Processing Letters, volume 54, pages 3283–3301 (2022)

Soulef Bouaafia¹ (ORCID: orcid.org/0000-0003-0657-6900),
Randa Khemiri¹˒²,
Seifeddine Messaoud¹ &
Fatma Ezahra Sayadi³

Abstract

Convolutional neural networks (CNNs) are widely used, owing to their excellent performance, in computer vision and related applications such as facial recognition, image classification, speech recognition, and video gaming. However, CNNs are computationally intensive and require large amounts of memory. Field Programmable Gate Arrays (FPGAs), and in particular the newer FPGA–SoC devices, are considered the most promising platforms for accelerating CNNs because of their high performance, energy efficiency, and reconfigurability. This paper proposes an accelerated CNN model for a video compression application, based on a hardware-software architecture. We first accelerate the CNN layers and build Intellectual Property (IP) cores from them using Vivado High Level Synthesis (HLS). We then create a hardware-software architecture in which the CNN IP cores are integrated in the programmable logic (PL) zone, which is connected to the Xilinx Processing System (PS) that manages all processing tasks on the FPGA–SoC board. Experimental results show that the proposed co-design achieves an on-chip power consumption of 1.69 W at a PL frequency of 142 MHz and a PS frequency of 525 MHz. A comparative study with existing methods shows that the proposed design has clear advantages in power consumption and hardware cost.
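
The abstract does not reproduce the layer code, so the following is only a rough illustration of the kind of fixed loop nest that an HLS tool would typically turn into a convolution IP core. It is a plain-Python reference model, not the authors' HLS source: the function name `conv2d_valid_relu`, the square-kernel restriction, the stride of 1, and the fused ReLU are all assumptions made for this sketch.

```python
def conv2d_valid_relu(image, kernel, bias=0.0):
    """Hypothetical reference model of one convolution layer.

    Computes a 'valid', stride-1 sliding-window sum (cross-correlation,
    as is usual in CNNs) over a single-channel image with a square
    kernel, then applies ReLU. Inputs are lists of lists of numbers.
    """
    height, width = len(image), len(image[0])
    k = len(kernel)  # assumption: kernel is k x k
    out_h, out_w = height - k + 1, width - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]

    # Output-pixel loops: in an HLS C/C++ version these are the loops
    # one would typically consider pipelining.
    for r in range(out_h):
        for c in range(out_w):
            acc = float(bias)
            # Kernel-window loops: small, fixed trip counts, so they are
            # natural candidates for unrolling in hardware.
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
            out[r][c] = max(acc, 0.0)  # ReLU activation
    return out


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    ker = [[1, 0],
           [0, 1]]
    print(conv2d_valid_relu(img, ker))
```

A software model like this is mainly useful as a golden reference: outputs of a hardware core running in the PL can be compared against it from the PS side before measuring power or timing.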

Author information

Authors and Affiliations

Laboratory of Electronics and Microelectronics, Faculty of Sciences of Monastir, University of Monastir, Monastir, Tunisia
Soulef Bouaafia, Randa Khemiri & Seifeddine Messaoud
Higher Institute of Computer Science and Multimedia of Gabes, University of Gabes, Gabès, Tunisia
Randa Khemiri
Laboratory of Networked Objects Control and Communication Systems, National Engineering School of Sousse, University of Sousse, Sousse, Tunisia
Fatma Ezahra Sayadi

Corresponding author

Correspondence to Soulef Bouaafia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bouaafia, S., Khemiri, R., Messaoud, S. et al. Deep CNN Co-design for HEVC CU Partition Prediction on FPGA–SoC. Neural Process Lett 54, 3283–3301 (2022). https://doi.org/10.1007/s11063-022-10765-1

Accepted: 04 February 2022
Published: 25 February 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s11063-022-10765-1
