Abstract
Video compression plays an essential role in many multimedia applications, and compression efficiency depends strongly on video content. In this paper, we propose a content adaptive downsampling scheme that uses video content characteristics to improve coding efficiency at low bitrates. Specifically, we extract content-aware spatial-temporal features as perceptual features: normalized Spatial Information (SI), normalized Temporal Information (TI), and spatial masking. A Support Vector Machine (SVM) then predicts the optimal coding configuration for the video, i.e., direct encoding or downsample encoding. Experimental results show that the proposed method achieves considerable bitrate savings for video sequences at different resolutions with low computational complexity.
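As a rough illustration of the two features named above, the sketch below computes SI and TI in the way they are commonly defined in ITU-T Recommendation P.910: SI is the maximum over time of the spatial standard deviation of the Sobel-filtered luminance frame, and TI is the maximum over time of the spatial standard deviation of successive frame differences. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, and border pixels are simply dropped. The abstract does not say how SI and TI are normalized, how spatial masking is computed, or how the SVM is trained, so those steps are left out.

```python
import numpy as np


def sobel_magnitude(frame):
    """Sobel gradient magnitude of one luminance frame (valid region only).

    Assumption: border pixels are dropped rather than padded.
    """
    f = np.asarray(frame, dtype=np.float64)
    # Horizontal gradient: right neighbours minus left neighbours (1-2-1 weights).
    gx = (f[:-2, 2:] + 2 * f[1:-1, 2:] + f[2:, 2:]) - (
        f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2]
    )
    # Vertical gradient: lower neighbours minus upper neighbours (1-2-1 weights).
    gy = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) - (
        f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:]
    )
    return np.hypot(gx, gy)


def spatial_information(frames):
    """SI: maximum over frames of the std of the Sobel-filtered frame."""
    return max((float(np.std(sobel_magnitude(f))) for f in frames), default=0.0)


def temporal_information(frames):
    """TI: maximum over time of the std of consecutive frame differences."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    diffs = (cur - prev for prev, cur in zip(frames, frames[1:]))
    return max((float(np.std(d)) for d in diffs), default=0.0)


if __name__ == "__main__":
    # Synthetic luminance frames standing in for decoded video.
    rng = np.random.default_rng(0)
    video = [rng.integers(0, 256, size=(72, 128)) for _ in range(5)]
    print("SI:", spatial_information(video))
    print("TI:", temporal_information(video))
```

In a pipeline like the one described, these two values would form part of the feature vector passed to the classifier that chooses between direct encoding and downsample encoding.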
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the NSFC under Grants 61901252, 62171002, 62071287, and 62020106011, and by the Science and Technology Commission of Shanghai Municipality under Grants 22ZR1424300 and 20DZ2290100.
Ethics declarations
Conflicts of interest
All authors declare that there are no conflicts of interest.
About this article
Cite this article
Qin, S., Yang, C. & An, P. Content adaptive downsampling for low bitrate video coding. Multimed Tools Appl 83, 26547–26563 (2024). https://doi.org/10.1007/s11042-023-16532-1
DOI: https://doi.org/10.1007/s11042-023-16532-1