Blind consumer video quality assessment with spatial-temporal perception and fusion

Niu, Yuzhen; Zheng, Yuming; Wang, Zhenlong; Zhong, Mengzhen; Zhao, Tiesong

doi:10.1007/s11042-023-16242-8

Published: 25 July 2023

Multimedia Tools and Applications, Volume 83, pages 18969–18986 (2024)

Yuzhen Niu¹˒², Yuming Zheng¹, Zhenlong Wang¹, Mengzhen Zhong¹ & Tiesong Zhao²

Abstract

Blind quality assessment of user-generated content (UGC), or consumer videos, is a challenging problem in computer vision. Two issues remain open: how to effectively extract high-dimensional spatial-temporal features from consumer videos, and how to appropriately model the relationship between these features and user perception within a unified blind video quality assessment (BVQA) framework. To tackle these issues, we propose a novel BVQA model with spatial-temporal perception and fusion. First, we develop two perception modules that extract perceptual-distortion-related features separately from the spatial and temporal domains. In particular, the temporal-domain features are obtained by combining a 3D ConvNet with residual frames, a combination that is highly efficient at capturing motion-specific temporal features. Second, we propose a feature fusion module that adaptively combines the spatial and temporal features. Finally, we map the fused features onto perceptual quality. Experimental results demonstrate that our model outperforms other advanced methods in predicting subjective video quality.
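
The pipeline outlined in the abstract (residual frames for the temporal branch, adaptive fusion of spatial and temporal features, and a final mapping to a quality score) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax gating, and the linear regressor are assumptions made for illustration, and the feature vectors and gate scores are hand-picked numbers standing in for what learned networks (for example a 3D ConvNet) would produce.

```python
import math


def residual_frames(frames):
    """Subtract each frame from its successor; T frames yield T-1 residual frames.

    `frames` is a list of 2D grayscale frames (lists of rows of numbers).
    """
    return [
        [
            [cur_px - prev_px for cur_px, prev_px in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(cur, prev)
        ]
        for prev, cur in zip(frames, frames[1:])
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(spatial_feat, temporal_feat, gate_scores):
    """Weighted sum of two equal-length feature vectors.

    ASSUMPTION: the weights come from a softmax over two gate scores.
    In a trained model these scores would be predicted from the features.
    """
    w_spatial, w_temporal = softmax(gate_scores)
    return [w_spatial * s + w_temporal * t
            for s, t in zip(spatial_feat, temporal_feat)]


def predict_quality(fused_feat, weights, bias=0.0):
    """Hypothetical linear regressor mapping fused features to a quality score."""
    return sum(w * f for w, f in zip(weights, fused_feat)) + bias


# Three tiny 2x2 frames; only a few pixels change between frames.
frames = [
    [[10, 10], [10, 10]],
    [[12, 10], [10, 9]],
    [[12, 11], [10, 9]],
]
residuals = residual_frames(frames)   # motion-related differences
fused = adaptive_fusion([1.0, 2.0], [3.0, 4.0], gate_scores=[0.0, 0.0])
score = predict_quality(fused, weights=[0.5, 0.5])
print(residuals, fused, score)
```

Static regions cancel out in the residual frames, so the temporal branch sees mostly motion-related changes. With equal gate scores the fusion reduces to a plain average of the two feature vectors.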

Data Availability

Data availability is not applicable to this article.

The datasets analysed during the current study are available from the following repositories:

- KoNViD-1k http://database.mmsp-kn.de/konvid-1k-database.html

- LIVE VQC https://live.ece.utexas.edu/research/LIVEVQC/index.html

- YouTube-UGC https://media.withyoutube.com/

The project corresponding to this manuscript is available through the link https://github.com/790578527/STFN.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62072110, 61972097, and U21A20472, in part by the Major Science and Technology Project of Fujian Province (China) under Grant 2021HZ022007, in part by the Industry-Academy Cooperation Project under Grant 2021H6022, and in part by the Natural Science Foundation of Fujian Province under Grant 2020J01494.

Author information

Authors and Affiliations

College of Computer and Data Science, Fuzhou University, Fuzhou, 350116, China
Yuzhen Niu, Yuming Zheng, Zhenlong Wang & Mengzhen Zhong
College of Physics and Information Engineering, Fuzhou University, Fuzhou, 350108, China
Yuzhen Niu & Tiesong Zhao

Corresponding author

Correspondence to Yuzhen Niu.

Ethics declarations

Conflict of interest

The authors declare that they have a conflict of interest with all researchers at Fuzhou University, China.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Abbreviations List

Table 7 lists the abbreviations used in the paper and their expansions.

Table 7 Correspondence table of abbreviations used in the paper

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Niu, Y., Zheng, Y., Wang, Z. et al. Blind consumer video quality assessment with spatial-temporal perception and fusion. Multimed Tools Appl 83, 18969–18986 (2024). https://doi.org/10.1007/s11042-023-16242-8

Received: 05 October 2022
Revised: 05 June 2023
Accepted: 04 July 2023
Published: 25 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-16242-8
