
Frame importance and temporal memory effect-based fast video quality assessment for user-generated content

Applied Intelligence

Abstract

User-generated content (UGC) has become increasingly popular with the widespread use of social media and mobile devices. Consequently, instant and immersive UGC video quality assessment is urgently needed so that video reviewers can receive appropriate recommendations before distribution. However, existing methods are inefficient for UGC videos because of their expensive frame-by-frame processing, and they are unsuitable for deployment on devices with limited computational capabilities because they rely on sophisticated GPU-dependent computation. In this paper, we propose a fast UGC video quality assessment method, named FastVQA, that considers both key frame importance and human temporal memory effects. First, a novel key frame selection strategy based on feature entropy is developed to achieve efficient and accurate feature extraction. Then, inspired by human short-term and long-term memory effects, we design a temporal feature aggregation module that takes both local content details and global semantic information into account. Experimental results show that FastVQA can outperform state-of-the-art (SOTA) methods on many datasets while requiring significantly less CPU time, indicating that it achieves a better balance between complexity and accuracy.
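The abstract names the two stages of FastVQA but does not give their formulas. The Python sketch below illustrates one possible shape for them under stated assumptions; it is not the authors' implementation. Assumptions: per-frame feature vectors are already available (for example from a lightweight CNN backbone); "feature entropy" is taken to be the Shannon entropy of a histogram over each frame's feature values; a key frame is the highest-entropy frame in each of `k` equal-length temporal segments; and the memory module is simplified to pooling per-key-frame quality scores, whereas the paper aggregates features. The function names `feature_entropy`, `select_key_frames` and `memory_pool`, and the parameters `bins`, `k`, `window` and `alpha`, are all hypothetical.

```python
import math
from typing import List, Sequence


def feature_entropy(features: Sequence[float], bins: int = 16) -> float:
    """Shannon entropy (in bits) of a histogram over one frame's feature values.

    Hypothetical definition of "feature entropy"; the abstract does not give
    the paper's exact formula.
    """
    lo, hi = min(features), max(features)
    if hi == lo:
        return 0.0  # a constant feature vector carries no information
    counts = [0] * bins
    for v in features:
        # Map each value to one of `bins` equal-width bins; clamp the maximum value.
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(features)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def select_key_frames(frame_features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of k key frames: the highest-entropy frame in each of k
    equal-length temporal segments (hypothetical selection rule)."""
    n = len(frame_features)
    k = min(k, n)
    selected = []
    for s in range(k):
        start, end = s * n // k, (s + 1) * n // k
        best = max(range(start, end), key=lambda i: feature_entropy(frame_features[i]))
        selected.append(best)
    return selected


def memory_pool(scores: Sequence[float], window: int = 3, alpha: float = 0.5) -> float:
    """Blend short-term and long-term views of per-key-frame quality scores.

    Hypothetical, score-level stand-in for the paper's feature-level module.
    """
    if not scores:
        raise ValueError("need at least one key-frame score")
    # Short-term memory: each position is dominated by the worst score among the
    # current frame and the `window` frames before it, so recent drops linger.
    short_term = [min(scores[max(0, t - window):t + 1]) for t in range(len(scores))]
    short = sum(short_term) / len(short_term)
    # Long-term memory: a global average over all key frames in the clip.
    long_term = sum(scores) / len(scores)
    return alpha * short + (1 - alpha) * long_term
```

Choosing the best frame per segment, rather than the global top-`k` by entropy, keeps the selected key frames spread across the whole video, which matters for the temporal stage. In `memory_pool`, an `alpha` close to 1 lets the short-term term dominate so brief quality drops are penalized more heavily, while an `alpha` close to 0 approaches a plain temporal average.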

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This research work was supported in part by the National Natural Science Foundation of China (U1903213) and the Natural Science Foundation of Sichuan Province (2022NSFSC0966).

Author information

Contributions

Yuan Zhang contributed to the conception of the study; Mingchuan Yang and Zhiwei Huang performed the experiment; Lijun He and Zijun Wu contributed significantly to analysis and manuscript preparation; Yuan Zhang, Mingchuan Yang and Zhiwei Huang performed the data analyses and wrote the manuscript.

Corresponding author

Correspondence to Yuan Zhang.

Ethics declarations

Ethics approval

All authors contributed to the conception and design of the study. All authors read and approved the final manuscript.

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Yang, M., Huang, Z. et al. Frame importance and temporal memory effect-based fast video quality assessment for user-generated content. Appl Intell 53, 21517–21531 (2023). https://doi.org/10.1007/s10489-023-04624-2

  • DOI: https://doi.org/10.1007/s10489-023-04624-2
