SVG-CNN: A shallow CNN based on VGGNet applied to intra prediction partition block in HEVC

Multimedia Tools and Applications

Abstract

High Efficiency Video Coding (HEVC) offers superior compression rates, but its adoption increases coding complexity because it relies on a recursive quad-tree to partition frames into blocks of varying sizes. This quad-tree process is a central feature in upcoming video coding standards. Our paper presents a novel framework, SVG-CNN, which integrates three shallow Convolutional Neural Networks (CNNs) inspired by VGGNet. Each CNN is designed for one quad-tree level and predicts the Coding Unit (CU) partition in HEVC, which reduces intra-frame coding time. SVG-CNN has an inherent capability for early termination: the CNNs are fed sequentially based on quad-tree level probabilities, which provides a mechanism to halt the process when further refinement seems unlikely. To enhance the model's efficacy, we built three specialized datasets, each focusing on a distinct quad-tree level and quantization parameter (QP) context, so that each CNN within our framework undergoes targeted training. Our study shows that performance, in terms of accuracy and F1 score, is highly dependent on QP settings: lower QPs yield better results, while higher QPs diminish performance due to the potential loss of critical features. To tune the model, we addressed hyperparameter selection and the determination of the CU split threshold for HEVC prediction, using Grid Search Cross-Validation for the former and assessing multiple thresholds across selected videos for the latter. The model has moderate complexity, with over 328,000 parameters across 18 layers, which keeps it memory efficient. It achieves a prediction time of 0.05 ms and reduces HEVC encoding time by 61.64%, while slightly improving rate-distortion performance with a BDBR of -0.24%, indicating better compression without notable PSNR loss. Significantly, our approach outperforms other CNN-based quad-tree partitioning methods that reduce HEVC coding complexity but sacrifice compression performance.
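The early-termination idea described in the abstract can be illustrated with a minimal Python sketch. It assumes one predictor per quad-tree level (64x64, 32x32, 16x16) that returns a split probability, and a per-level split threshold. The names `variance_predictor`, `predict_partition`, and `count_leaves` are hypothetical, and the variance-based score is only a stand-in for a trained CNN; it is not the model from the paper.

```python
import numpy as np


def variance_predictor(scale):
    """Hypothetical stand-in for a trained per-level CNN (NOT the paper's model).

    Maps a luma block to a pseudo split probability in [0, 1) from its variance.
    """
    def predict(block):
        return float(1.0 - np.exp(-block.var() / scale))
    return predict


def predict_partition(block, predictors, thresholds, depth=0):
    """Return None for a leaf CU, or a list of four sub-partitions."""
    if depth >= len(predictors):
        return None  # smallest CU size handled by this sketch
    p_split = predictors[depth](block)
    if p_split < thresholds[depth]:
        # Early termination: deeper predictors are never run for this block.
        return None
    half = block.shape[0] // 2
    quads = [block[r:r + half, c:c + half] for r in (0, half) for c in (0, half)]
    return [predict_partition(q, predictors, thresholds, depth + 1) for q in quads]


def count_leaves(tree):
    """Count the CUs in a predicted partition."""
    return 1 if tree is None else sum(count_leaves(t) for t in tree)


# Usage: a 64x64 block whose top-left quadrant is textured, the rest flat.
ctu = np.zeros((64, 64))
ctu[:32, :32] = (np.indices((32, 32)).sum(axis=0) % 2) * 255.0
predictors = [variance_predictor(100.0)] * 3   # one per quad-tree level
thresholds = [0.5, 0.5, 0.5]                   # assumed split thresholds
print(count_leaves(predict_partition(ctu, predictors, thresholds)))
```

In this sketch the flat quadrants stop at the second level and never reach the third predictor, which is where the time saving comes from. In practice the thresholds would be chosen per level, for example by evaluating several candidates on selected videos as the abstract describes.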

Data availability

The data that support the findings originate from [21]. The authors of [21] make their dataset fully available to researchers without requiring permission, provided that their paper is cited, as it is here. The dataset is hosted in the GitHub repository https://github.com/HEVC-Projects/CPH, which links to the Dropbox repository that was used to download the data, https://www.dropbox.com/sh/eo5dc3h27t41etl/AAADvFKoc5nYcZw6KO9XNycZa?dl=0

References

  1. Adnan M, Alarood AAS, Uddin MI, Rehman I (2022) Utilizing grid search cross-validation with adaptive boosting for augmenting performance of machine learning models. PeerJ Comput Sci 8:e803. https://doi.org/10.7717/peerj-cs.803

  2. Alghamdi T, Alaghband G (2022) Facial Expressions Based Automatic Pain Assessment System. Appl Sci 12(13):6423. https://doi.org/10.3390/app12136423

  3. Bjontegaard G (2001) Calculation of average PSNR differences between RD curves. ITU-T VCEG document VCEG-M33. https://www.itu.int/wftp3/av-arch/video-site/0104_Aus/VCEG-M33.doc (accessed 9.3.22)

  4. Corrêa G, Assunção P, Agostini L, Cruz LADS (2016) Complexity-Aware High Efficiency Video Coding. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-25778-5

  5. Dang-Nguyen D-T, Pasquini C, Conotter V, Boato G (2015) RAISE: A raw images dataset for digital image forensics. In: Proceedings of the 6th ACM Multimedia Systems Conference, pp 219–224

  6. Fan J, Song L (2023) Fast intra-frame prediction algorithm for HEVC based on neural networks and adaptive threshold. In: Proceedings of the 2022 6th International Conference on Video and Image Processing (ICVIP ’22). Association for Computing Machinery, New York, NY, USA, pp 127–134. https://doi.org/10.1145/3579109.3579131

  7. Feng A, Gao C, Li L, Liu D, Wu F (2021) CNN-based depth map prediction for fast block partitioning in HEVC intra coding. In: 2021 IEEE International Conference on Multimedia and Expo (ICME), pp 1–6. https://doi.org/10.1109/ICME51207.2021.9428069

  8. Hssayni EH, Joudar N-E, Ettaouil M (2022) A deep learning framework for time series classification using normal cloud representation and convolutional neural network optimization. Comput Intell 38(6):2056–2074. https://doi.org/10.1111/coin.12556

  9. Hssayni EH, Joudar N-E, Ettaouil M (2022) An adaptive Drop method for deep neural networks regularization: Estimation of DropConnect hyperparameter using generalization gap. Knowl-Based Syst 253:109567. https://doi.org/10.1016/j.knosys.2022.109567

  10. Hu Q, Shi Z, Zhang X, Gao Z (2015) Early SKIP mode decision based on Bayesian model for HEVC. In: 2015 Visual Communications and Image Processing (VCIP), pp 1–4. https://doi.org/10.1109/VCIP.2015.7457828

  11. Hu Q, Zhang X, Shi Z, Gao Z (2016) Neyman-pearson-based early mode decision for HEVC encoding. IEEE Trans Multimed 18(3):379–391. https://doi.org/10.1109/TMM.2015.2512799

  12. Kim H, Park R (2016) Fast CU partitioning algorithm for HEVC using an online-learning-based bayesian decision rule. IEEE Trans Circ Syst Video Technol 26(1):130–138. https://doi.org/10.1109/TCSVT.2015.2444672

  13. Kim M, Ling N, Song L, Gu Z (2014) Fast skip mode decision with rate-distortion optimization for High Efficiency Video Coding. In: 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), pp 1–6. https://doi.org/10.1109/ICMEW.2014.6890721

  14. Kuanar S, Rao KR, Bilas M, Bredow J (2019) Adaptive CU mode selection in HEVC intra prediction: A deep learning approach. Circ Syst Signal Process 38(11):5081–5102. https://doi.org/10.1007/s00034-019-01110-4

  15. Lee J, Kim S, Lim K, Lee S (2015) A fast CU size decision algorithm for HEVC. IEEE Trans Circ Syst Video Technol 25(3):411–421. https://doi.org/10.1109/TCSVT.2014.2339612

  16. Li G, Liang S, Nie S, Liu W, Yang Z (2021) Deep neural network-based generalized sidelobe canceller for dual-channel far-field speech recognition. Neural Netw 141:225–237. https://doi.org/10.1016/j.neunet.2021.04.017

  17. Rosewarne C, Naccari M, Bross B, Sharman K, Sullivan G (2015) High Efficiency Video Coding (HEVC) Test model 16 (HM16) improved encoder description update 3. JCT-VC, Warsaw - Poland

  18. Simonyan K, Zisserman A (2015) Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations (ICLR). https://doi.org/10.48550/ARXIV.1409.1556

  19. Sullivan GJ, Ohm J, Han W-J, Wiegand T (2012) Overview of the High Efficiency Video Coding (HEVC) standard. IEEE Trans Circ Syst Video Technol 22(12):1649–1668. https://doi.org/10.1109/TCSVT.2012.2221191

  20. Sullivan GJ, Wiegand T (1998) Rate-distortion optimization for video compression. IEEE Signal Process Mag 15(6):74–90. https://doi.org/10.1109/79.733497

  21. Xu M, Li T, Wang Z, Deng X, Yang R, Guan Z (2018) Reducing complexity of HEVC: A deep learning approach. IEEE Trans Image Process 27(10):5044–5059. https://doi.org/10.1109/TIP.2018.2847035

  22. Zhang Y, Wang G, Tian R, Xu M, Kuo CCJ (2019) Texture-Classification Accelerated CNN Scheme for Fast Intra CU Partition in HEVC. In: 2019 Data Compression Conference (DCC), pp 241–249. https://doi.org/10.1109/DCC.2019.00032

Author information

Corresponding author

Correspondence to Iris Linck.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Linck, I., Gómez, A.T. & Alaghband, G. SVG-CNN: A shallow CNN based on VGGNet applied to intra prediction partition block in HEVC. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18412-8

  • DOI: https://doi.org/10.1007/s11042-024-18412-8
