SVG-CNN: A shallow CNN based on VGGNet applied to intra prediction partition block in HEVC

Linck, Iris; Gómez, Arthur Tórgo; Alaghband, Gita

doi:10.1007/s11042-024-18412-8


Published: 14 February 2024

Multimedia Tools and Applications (2024)

Abstract

High Efficiency Video Coding (HEVC) offers superior compression rates, but this comes at the cost of increased coding complexity, largely because HEVC relies on a recursive quad-tree to partition frames into blocks of varying sizes. Quad-tree partitioning is also a central feature of upcoming video coding standards. Our paper presents SVG-CNN, a novel framework that integrates three shallow Convolutional Neural Networks (CNNs) inspired by VGGNet. Each CNN is designed for one quad-tree level and predicts the Coding Unit (CU) partition in HEVC, reducing intra-frame coding time. SVG-CNN supports early termination by feeding the CNNs sequentially according to quad-tree level probabilities, which halts the process when further refinement seems unlikely. To improve the model's efficacy, we built three specialized datasets, each covering a distinct quad-tree level and quantization parameter (QP) context, so that each CNN in the framework receives targeted training; this establishes a novel, level-specific training methodology. Our study shows that performance, in terms of accuracy and F1 score, depends strongly on the QP setting: lower QPs yield better results, whereas higher QPs degrade performance, possibly because critical features are lost. To tune the model, we addressed hyperparameter selection and the determination of the CU split threshold for HEVC prediction, using Grid Search Cross-Validation for the former and evaluating multiple thresholds on selected videos for the latter. The model has moderate complexity, with over 328,000 parameters across 18 layers, which keeps its memory footprint small. Its prediction time is 0.05 ms, and it reduces HEVC encoding time by 61.64% while slightly improving rate-distortion performance (a BDBR of -0.24%), indicating better compression without notable PSNR loss.
Notably, our approach outperforms other CNN-based quad-tree partitioning methods that reduce HEVC coding complexity but sacrifice compression performance.
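The early-termination mechanism described above can be pictured as a top-down walk over the HEVC quad-tree, in which a 64x64 CTU can be split at most three times (64 to 32 to 16 to 8). The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the three CNNs are replaced by hypothetical callables (`SplitPredictor`) that return a split probability for a CU, and `partition_ctu` and the per-level `thresholds` are illustrative names and placeholder values rather than the ones selected in the paper. A CU whose probability falls below its level's threshold is kept whole, so the deeper networks are never queried for its sub-blocks.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for one level's CNN: given a CU's top-left corner
# (x, y) and size inside the CTU, return the predicted probability of a split.
SplitPredictor = Callable[[int, int, int], float]
CU = Tuple[int, int, int]  # (x, y, size) in luma samples


def partition_ctu(predictors: Sequence[SplitPredictor],
                  thresholds: Sequence[float],
                  ctu_size: int = 64) -> List[CU]:
    """Predict the CU partition of one CTU with one predictor per quad-tree level.

    The predictor for level d is only called on CUs whose parent at level d-1
    was predicted to split (early termination).
    """
    if len(predictors) != len(thresholds):
        raise ValueError("need one threshold per quad-tree level")
    leaves: List[CU] = []

    def visit(x: int, y: int, size: int, level: int) -> None:
        # Past the last decision level (8x8 CUs in HEVC) no further split is possible.
        if level == len(predictors):
            leaves.append((x, y, size))
            return
        if predictors[level](x, y, size) < thresholds[level]:
            # Early termination: keep this CU whole and skip its whole subtree.
            leaves.append((x, y, size))
            return
        half = size // 2
        for dy in (0, half):        # split into four sub-CUs,
            for dx in (0, half):    # visited in z-scan order
                visit(x + dx, y + dy, half, level + 1)

    visit(0, 0, ctu_size, 0)
    return leaves


if __name__ == "__main__":
    # Toy predictors: split the CTU, split only its top-left 32x32, stop at 16x16.
    toy = [
        lambda x, y, s: 0.9,
        lambda x, y, s: 0.9 if (x, y) == (0, 0) else 0.1,
        lambda x, y, s: 0.0,
    ]
    for cu in partition_ctu(toy, thresholds=[0.5, 0.5, 0.5]):
        print(cu)
```

In this toy run only five level-1 and level-2 predictor calls follow the first split instead of the twenty a full evaluation would need, which is the kind of saving early termination targets. The sketch only produces the partition decision; in a real encoder the chosen CUs would still go through HM's rate-distortion optimized mode decision.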
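The BDBR figure quoted in the abstract is a Bjontegaard delta bitrate. As a hedged illustration of how that metric is conventionally computed, not the authors' evaluation script, the sketch below fits a cubic polynomial of natural-log bitrate against PSNR for the reference and the tested encoder, integrates both over their overlapping PSNR range, and converts the average log difference into a percentage. It assumes exactly four rate-PSNR points per encoder (as with the four QPs commonly used in HEVC common test conditions) and uses only the standard library; `bd_rate` and its helper functions are illustrative names.

```python
import math
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_through(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Coefficients [c0, c1, c2, c3] of the cubic passing through four points."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("this sketch expects exactly four rate-PSNR points")
    return _solve([[x ** k for k in range(4)] for x in xs], list(ys))


def _integrate(c: Sequence[float], lo: float, hi: float) -> float:
    """Exact integral of sum(c[k] * x**k) over [lo, hi]."""
    def antiderivative(x: float) -> float:
        return sum(ck * x ** (k + 1) / (k + 1) for k, ck in enumerate(c))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(ref_rate: Sequence[float], ref_psnr: Sequence[float],
            test_rate: Sequence[float], test_psnr: Sequence[float]) -> float:
    """Bjontegaard delta bitrate in percent; negative means the test encoder
    needs less bitrate than the reference for the same PSNR."""
    c_ref = _cubic_through(ref_psnr, [math.log(r) for r in ref_rate])
    c_test = _cubic_through(test_psnr, [math.log(r) for r in test_rate])
    # Integrate only over the PSNR interval covered by both curves.
    lo = max(min(ref_psnr), min(test_psnr))
    hi = min(max(ref_psnr), max(test_psnr))
    avg_log_diff = (_integrate(c_test, lo, hi) - _integrate(c_ref, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0
```

For example, a test encoder that needs 10% less bitrate than the reference at every PSNR point yields a BD-rate of -10%. With four points, the least-squares cubic used in common implementations reduces to exact interpolation, which is why a 4x4 solve is sufficient here.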


Data availability

The data supporting the findings originate from [21] (Xu et al. 2018), whose authors make the dataset freely available to researchers without requiring permission, provided their paper is cited. The dataset is hosted in the GitHub repository https://github.com/HEVC-Projects/CPH, which links to the Dropbox repository from which the data were downloaded: https://www.dropbox.com/sh/eo5dc3h27t41etl/AAADvFKoc5nYcZw6KO9XNycZa?dl=0

References

Adnan M, Alarood AAS, Uddin MI, Rehman I (2022) Utilizing grid search cross-validation with adaptive boosting for augmenting performance of machine learning models. PeerJ Comput Sci 8:e803. https://doi.org/10.7717/peerj-cs.803
Alghamdi T, Alaghband G (2022) Facial Expressions Based Automatic Pain Assessment System. Appl Sci 12(13):6423. https://doi.org/10.3390/app12136423
Bjontegaard G (2001) Calculation of average PSNR differences between RD curves. ITU-T VCEG document VCEG-M33. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc (accessed 9.3.22)
Corrêa G, Assunção P, Agostini L, Cruz LADS (2016) Complexity-Aware High Efficiency Video Coding. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-25778-5
Dang-Nguyen D-T, Pasquini C, Conotter V, Boato G (2015) RAISE: A raw images dataset for digital image forensics. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp 219–224
Fan J, Song L (2023) Fast intra-frame prediction algorithm for HEVC based on neural networks and adaptive threshold. In: Proceedings of the 2022 6th International Conference on Video and Image Processing (ICVIP ’22). Association for Computing Machinery, New York, NY, USA, pp 127–134. https://doi.org/10.1145/3579109.3579131
Feng A, Gao C, Li L, Liu D, Wu F (2021) CNN-based depth map prediction for fast block partitioning in HEVC intra coding. In: 2021 IEEE International Conference on Multimedia and Expo (ICME), pp 1–6. https://doi.org/10.1109/ICME51207.2021.9428069
Hssayni EH, Joudar N-E, Ettaouil M (2022) A deep learning framework for time series classification using normal cloud representation and convolutional neural network optimization. Comput Intell 38(6):2056–2074. https://doi.org/10.1111/coin.12556
Hssayni EH, Joudar N-E, Ettaouil M (2022) An adaptive Drop method for deep neural networks regularization: Estimation of DropConnect hyperparameter using generalization gap. Knowl-Based Syst 253:109567. https://doi.org/10.1016/j.knosys.2022.109567
Hu Q, Shi Z, Zhang X, Gao Z (2015) Early SKIP mode decision based on Bayesian model for HEVC. In: 2015 Visual Communications and Image Processing (VCIP), pp 1–4. https://doi.org/10.1109/VCIP.2015.7457828
Hu Q, Zhang X, Shi Z, Gao Z (2016) Neyman-Pearson-based early mode decision for HEVC encoding. IEEE Trans Multimed 18(3):379–391. https://doi.org/10.1109/TMM.2015.2512799
Kim H, Park R (2016) Fast CU partitioning algorithm for HEVC using an online-learning-based Bayesian decision rule. IEEE Trans Circ Syst Video Technol 26(1):130–138. https://doi.org/10.1109/TCSVT.2015.2444672
Kim M, Ling N, Song L, Gu Z (2014) Fast skip mode decision with rate-distortion optimization for High Efficiency Video Coding. In: 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp 1–6. https://doi.org/10.1109/ICMEW.2014.6890721
Kuanar S, Rao KR, Bilas M, Bredow J (2019) Adaptive CU mode selection in HEVC intra prediction: A deep learning approach. Circ Syst Signal Process 38(11):5081–5102. https://doi.org/10.1007/s00034-019-01110-4
Lee J, Kim S, Lim K, Lee S (2015) A fast CU size decision algorithm for HEVC. IEEE Trans Circ Syst Video Technol 25(3):411–421. https://doi.org/10.1109/TCSVT.2014.2339612
Li G, Liang S, Nie S, Liu W, Yang Z (2021) Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition. Neural Netw 141:225–237. https://doi.org/10.1016/j.neunet.2021.04.017
Rosewarne C, Naccari M, Bross B, Sharman K, Sullivan G (2015) High Efficiency Video Coding (HEVC) Test model 16 (HM16) improved encoder description update 3. JCT-VC, Warsaw, Poland
Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations (ICLR). https://doi.org/10.48550/ARXIV.1409.1556
Sullivan GJ, Ohm J-R, Han W-J, Wiegand T (2012) Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Trans Circ Syst Video Technol 22(12):1649–1668. https://doi.org/10.1109/TCSVT.2012.2221191
Sullivan GJ, Wiegand T (1998) Rate-distortion optimization for video compression. IEEE Signal Process Mag 15(6):74–90. https://doi.org/10.1109/79.733497
Xu M, Li T, Wang Z, Deng X, Yang R, Guan Z (2018) Reducing complexity of HEVC: A deep learning approach. IEEE Trans Image Process 27(10):5044–5059. https://doi.org/10.1109/TIP.2018.2847035
Zhang Y, Wang G, Tian R, Xu M, Kuo CCJ (2019) Texture-Classification Accelerated CNN Scheme for Fast Intra CU Partition in HEVC. In: 2019 Data Compression Conference (DCC), pp 241–249. https://doi.org/10.1109/DCC.2019.00032


Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Colorado Denver, Denver, CO, USA
Iris Linck & Gita Alaghband
National Council for Scientific and Technological Development - CNPq, Brasilia, DF, Brazil
Arthur Tórgo Gómez


Corresponding author

Correspondence to Iris Linck.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Linck, I., Gómez, A.T. & Alaghband, G. SVG-CNN: A shallow CNN based on VGGNet applied to intra prediction partition block in HEVC. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18412-8


Received: 02 June 2023
Revised: 05 November 2023
Accepted: 22 January 2024
Published: 14 February 2024
DOI: https://doi.org/10.1007/s11042-024-18412-8
