Abstract
High Efficiency Video Coding (HEVC) offers superior compression rates, but its adoption increases coding complexity because it relies on a recursive quad-tree to partition frames into blocks of varying sizes. This quad-tree process is also a central feature of upcoming video coding standards. Our paper presents a novel framework, SVG-CNN, which integrates three shallow Convolutional Neural Networks (CNNs) inspired by VGGNet. Each CNN is designed for one quad-tree level and predicts the Coding Unit (CU) partition in HEVC, which reduces intra-frame coding time. SVG-CNN supports early termination: the CNNs are fed sequentially, and each quad-tree level's split probability decides whether the next level is evaluated, so the process halts when further refinement seems unlikely. To improve the model's efficacy, we built three specialized datasets, each focusing on a distinct quad-tree level and quantization parameter (QP) context, so that each CNN in the framework receives targeted training. Our study shows that performance, in terms of accuracy and F1 score, is highly dependent on the QP setting: lower QPs yield better results, while higher QPs degrade performance, potentially because critical features are lost. To tune the model, we addressed hyperparameter selection and the choice of CU split thresholds for HEVC prediction. We used Grid Search Cross-Validation for the former and assessed multiple thresholds across selected videos for the latter. The model has moderate complexity, with over 328,000 parameters across 18 layers, which keeps its memory footprint small. It has a prediction time of 0.05 ms and reduces HEVC encoding time by 61.64%, while slightly improving rate-distortion performance with a BDBR of -0.24%, indicating better compression without notable PSNR loss.
Significantly, our approach outperforms other CNN-based quad-tree partitioning methods that reduce HEVC coding complexity but sacrifice compression performance.
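The sequential, level-by-level decision with early termination described in the abstract can be sketched as follows. This is a minimal illustration, not the SVG-CNN implementation: `variance_model` is a hypothetical stand-in for a trained per-level CNN, the threshold values are placeholders rather than the ones selected in the paper, and the function names are our own. The only structure taken from HEVC is the quad-tree descent from a 64×64 CU down to 8×8.

```python
from typing import Callable, List, Sequence, Union

Block = List[List[float]]               # square block of luma samples
SplitModel = Callable[[Block], float]   # returns an estimated P(split) in [0, 1]
Partition = Union[int, list]            # leaf CU size, or four sub-partitions


def quadrants(block: Block) -> List[Block]:
    """Split a square block into four equal sub-blocks in raster order."""
    half = len(block) // 2
    top, bottom = block[:half], block[half:]
    return [
        [row[:half] for row in top],
        [row[half:] for row in top],
        [row[:half] for row in bottom],
        [row[half:] for row in bottom],
    ]


def predict_partition(block: Block, models: Sequence[SplitModel],
                      thresholds: Sequence[float], level: int = 0) -> Partition:
    """Descend the quad-tree, calling one model per level.

    A level's model is only invoked if every ancestor decided to split,
    so a "no split" decision terminates that branch early.
    """
    size = len(block)
    if level >= len(models):
        return size  # smallest CU reached, nothing left to decide
    if models[level](block) < thresholds[level]:
        return size  # early termination: deeper models never run here
    return [predict_partition(q, models, thresholds, level + 1)
            for q in quadrants(block)]


def variance_model(block: Block) -> float:
    """Hypothetical stand-in for a CNN: maps pixel variance to a pseudo-probability."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return min(1.0, variance / 100.0)


# Usage: a flat 64x64 block is kept whole after a single model call.
flat_block = [[0.0] * 64 for _ in range(64)]
print(predict_partition(flat_block, [variance_model] * 3, [0.3, 0.5, 0.5]))  # prints 64
```

In a real encoder the three callables would be the trained networks for the 64×64, 32×32, and 16×16 levels, and the per-level thresholds would control the trade-off between time saving and rate-distortion loss.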
Data availability
The data that support the findings of this study originate from [21]. The authors of [21] make their dataset freely available to researchers, without requiring permission, provided that their paper is cited, as it is here. The dataset is published through their GitHub repository, https://github.com/HEVC-Projects/CPH, which links to the Dropbox folder from which the data were downloaded, https://www.dropbox.com/sh/eo5dc3h27t41etl/AAADvFKoc5nYcZw6KO9XNycZa?dl=0
References
Adnan M, Alarood AAS, Uddin MI, Rehman I (2022) Utilizing grid search cross-validation with adaptive boosting for augmenting performance of machine learning models. PeerJ Comput Sci 8:e803. https://doi.org/10.7717/peerj-cs.803
Alghamdi T, Alaghband G (2022) Facial Expressions Based Automatic Pain Assessment System. Appl Sci 12(13):6423. https://doi.org/10.3390/app12136423
Bjontegaard G (2001) Calculation of average PSNR differences between RD curves [WWW Document]. Proceedings of the VCEG-M33. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc (accessed 9.3.22)
Corrêa G, Assunção P, Agostini L, Cruz LADS (2016) Complexity-Aware High Efficiency Video Coding. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-25778-5
Dang-Nguyen D-T, Pasquini C, Conotter V, Boato G (2015) RAISE: A raw images dataset for digital image forensics. Presented at the Proceedings of the 6th ACM Multimedia Systems Conference. pp 219–224
Fan J, Song L (2023) Fast intra-frame prediction algorithm for HEVC based on neural networks and adaptive threshold. In: Proceedings of the 2022 6th International Conference on Video and Image Processing (ICVIP ’22). Association for Computing Machinery, New York, NY, USA, pp 127–134. https://doi.org/10.1145/3579109.3579131
Feng A, Gao C, Li L, Liu D, Wu F (2021) CNN-based depth map prediction for fast block partitioning in HEVC intra coding. Presented at the 2021 IEEE International Conference on Multimedia and Expo (ICME). pp 1–6. https://doi.org/10.1109/ICME51207.2021.9428069
Hssayni EH, Joudar N-E, Ettaouil M (2022) A deep learning framework for time series classification using normal cloud representation and convolutional neural network optimization. Comput Intell 38(6):2056–2074. https://doi.org/10.1111/coin.12556
Hssayni EH, Joudar N-E, Ettaouil M (2022) An adaptive Drop method for deep neural networks regularization: Estimation of DropConnect hyperparameter using generalization gap. Knowl-Based Syst 253:109567. https://doi.org/10.1016/j.knosys.2022.109567
Hu Q, Shi Z, Zhang X, Gao Z (2015) Early SKIP mode decision based on Bayesian model for HEVC. Presented at the 2015 Visual Communications and Image Processing (VCIP). pp 1–4. https://doi.org/10.1109/VCIP.2015.7457828
Hu Q, Zhang X, Shi Z, Gao Z (2016) Neyman-pearson-based early mode decision for HEVC encoding. IEEE Trans Multimed 18(3):379–391. https://doi.org/10.1109/TMM.2015.2512799
Kim H, Park R (2016) Fast CU partitioning algorithm for HEVC using an online-learning-based bayesian decision rule. IEEE Trans Circ Syst Video Technol 26(1):130–138. https://doi.org/10.1109/TCSVT.2015.2444672
Kim M, Ling N, Song L, Gu Z (2014) Fast skip mode decision with rate-distortion optimization for High Efficiency Video Coding. Presented at the 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW). pp 1–6. https://doi.org/10.1109/ICMEW.2014.6890721
Kuanar S, Rao KR, Bilas M, Bredow J (2019) Adaptive CU mode selection in HEVC intra prediction: A deep learning approach. Circ Syst Signal Process 38(11):5081–5102. https://doi.org/10.1007/s00034-019-01110-4
Lee J, Kim S, Lim K, Lee S (2015) A fast CU size decision algorithm for HEVC. IEEE Trans Circ Syst Video Technol 25(3):411–421. https://doi.org/10.1109/TCSVT.2014.2339612
Li G, Liang S, Nie S, Liu W, Yang Z (2021) Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition. Neural Netw 141:225–237. https://doi.org/10.1016/j.neunet.2021.04.017
Rosewarne C, Naccari M, Bross B, Sharman K, Sullivan G (2015) High Efficiency Video Coding (HEVC) Test model 16 (HM16) improved encoder description update 3. JCT-VC, Warsaw - Poland
Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. Presented at the International Conference on Learning Representations (ICLR). https://doi.org/10.48550/ARXIV.1409.1556
Sullivan GJ, Ohm J, Han W-J, Wiegand T (2012) Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Trans Circ Syst Video Technol 22(12):1649–1668. https://doi.org/10.1109/TCSVT.2012.2221191
Sullivan GJ, Wiegand T (1998) Rate-distortion optimization for video compression. IEEE Signal Process Mag 15(6):74–90. https://doi.org/10.1109/79.733497
Xu M, Li T, Wang Z, Deng X, Yang R, Guan Z (2018) Reducing complexity of HEVC: A deep learning approach. IEEE Trans Image Process 27(10):5044–5059. https://doi.org/10.1109/TIP.2018.2847035
Zhang Y, Wang G, Tian R, Xu M, Kuo CCJ (2019) Texture-Classification Accelerated CNN Scheme for Fast Intra CU Partition in HEVC. Presented at the 2019 Data Compression Conference (DCC). pp 241–249. https://doi.org/10.1109/DCC.2019.00032
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Linck, I., Gómez, A.T. & Alaghband, G. SVG-CNN: A shallow CNN based on VGGNet applied to intra prediction partition block in HEVC. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18412-8
DOI: https://doi.org/10.1007/s11042-024-18412-8