A video structural similarity quality metric based on a joint spatial-temporal visual attention model

Journal of Zhejiang University-SCIENCE A

Abstract

Objective video quality assessment plays an important role in multimedia signal processing. Several existing extensions of the structural similarity (SSIM) index to video do not predict the quality of video sequences effectively. In this paper we propose a structural similarity quality metric for video based on a joint spatial-temporal visual attention model. The model extracts a motion-attended region and a distortion-attended region by computing motion features and distortion contrast. It mimics the shifting of visual attention between these two attended regions and accounts for bursts of errors by introducing non-linear weighting functions that assign much higher weights to severely damaged frames. The proposed metric uses this model to produce a final objective quality rating for the whole video sequence, and it is validated on the 50 Hz video sequences of the Video Quality Experts Group (VQEG) Phase I test database.
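The abstract does not state the paper's formulas, so the following Python sketch only illustrates the general shape of such a pipeline under stated assumptions. `ssim_window` implements the standard SSIM index of Wang et al. (2004a) on a single window (the usual sliding Gaussian window is omitted for brevity). `frame_score` and `sequence_score`, and their weighting rules, are hypothetical stand-ins for the attention-weighted spatial pooling and the non-linear temporal weighting functions; they are not the authors' definitions.

```python
import math
from typing import Sequence

# SSIM stabilizing constants from Wang et al. (2004a) for 8-bit data.
L = 255.0
C1 = (0.01 * L) ** 2
C2 = (0.03 * L) ** 2


def ssim_window(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard SSIM index of two equally sized pixel windows (single window)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("windows must have equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / (
        (mx * mx + my * my + C1) * (vx + vy + C2)
    )


def frame_score(local_ssim: Sequence[float], attention: Sequence[float]) -> float:
    """HYPOTHETICAL spatial pooling: attention-weighted mean of local SSIM values.

    `attention` stands in for weights derived from the motion-attended and
    distortion-attended regions; falls back to a plain mean if all weights are 0.
    """
    if len(local_ssim) != len(attention) or not local_ssim:
        raise ValueError("need one non-negative weight per local SSIM value")
    total = sum(attention)
    if total <= 0:
        return sum(local_ssim) / len(local_ssim)
    return sum(s * w for s, w in zip(local_ssim, attention)) / total


def sequence_score(frame_scores: Sequence[float], alpha: float = 10.0) -> float:
    """HYPOTHETICAL temporal pooling: exponential weights favour damaged frames.

    A frame with quality q gets weight exp(alpha * (1 - q)), so low-quality
    frames dominate the result; alpha = 0 reduces to the plain mean.
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    weights = [math.exp(alpha * (1.0 - q)) for q in frame_scores]
    return sum(q * w for q, w in zip(frame_scores, weights)) / sum(weights)
```

With these assumed weights, a sequence of perfect frames interrupted by one badly damaged frame pools to a score far below the arithmetic mean, which is the qualitative "burst of error" behaviour the abstract describes; `alpha` is an illustrative parameter controlling how strongly the worst frames dominate.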

References

  • Aziz, M.Z., Mertsching, B., 2008. Fast and robust generation of feature maps for region-based visual attention. IEEE Trans. Image Process., 17(5):633–644. [doi:10.1109/TIP.2008.919365]

  • Brooks, A.C., Zhao, X.N., Pappas, T.N., 2008. Structural similarity quality metrics in a coding context: exploring the space of realistic distortions. IEEE Trans. Image Process., 17(8):1261–1273. [doi:10.1109/TIP.2008.926161]

  • Chen, Q.Q., Chen, Z.B., Gu, X.D., Wang, C., 2007. Attention-based adaptive intra refresh for error-prone video transmission. IEEE Commun. Mag., 45(1):52–60. [doi:10.1109/MCOM.2007.284538]

  • Grill-Spector, K., Malach, R., 2004. The human visual cortex. Ann. Rev. Neurosci., 27:649–677. [doi:10.1146/annurev.neuro.27.070203.144220]

  • Itti, L., 2005. Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Vis. Cogn., 12(6):1093–1123.

  • Itti, L., Baldi, P., 2005. A Principled Approach to Detecting Surprising Events in Video. IEEE Int. Conf. on Computer Vision and Pattern Recognition, p.631–637. [doi:10.1109/CVPR.2005.40]

  • Lu, Z.K., Liu, W.S., Yang, X.K., Ong, E.P., Yao, S.S., 2005. Modeling visual attention’s modulatory aftereffects on visual sensitivity and quality evaluation. IEEE Trans. Image Process., 14(11):1928–1942. [doi:10.1109/TIP.2005.854478]

  • Martinez-Rach, M., Lopez, O., Pinol, P., Malumbres, M.P., Oliver, J., 2006. A Study of Objective Quality Assessment Metrics for Video Codec Design and Evaluation. Eighth IEEE Int. Symp. on Multimedia, p.517–524. [doi:10.1109/ISM.2006.15]

  • Seshadrinathan, K., Bovik, A.C., 2007. A Structural Similarity Metric for Video Based on Motion Models. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p.I-869–I-872. [doi:10.1109/ICASSP.2007.366046]

  • Sheikh, H.R., Sabir, M.F., Bovik, A.C., 2006. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process., 15(11):3440–3451. [doi:10.1109/TIP.2006.881959]

  • Tang, C.W., 2007. Spatiotemporal visual considerations for video coding. IEEE Trans. Multim., 9(2):231–238. [doi:10.1109/TMM.2006.886328]

  • VQEG, 2000. Final Report from the Video Quality Expert Group on the Validation of Objective Models of Video Quality Assessment. Video Quality Expert Group. Available from http://www.vqeg.org [Accessed on Aug. 22, 2008].

  • Wang, Z., Li, Q., 2007. Video quality assessment using a statistical model of human visual speed perception. J. Opt. Soc. Am. A, 24:B61–B69. [doi:10.1364/JOSAA.24.000B61]

  • Wang, Z., Simoncelli, E.P., 2005. Translation Insensitive Image Similarity in Complex Wavelet Domain. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p.573–576.

  • Wang, Z., Sheikh, H.R., Bovik, A.C., 2003. Objective Video Quality Assessment. In: Furht, B., Marques, O. (Eds.), The Handbook of Video Databases: Design and Applications. CRC Press, Florida, USA, p.1041–1078.

  • Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., 2004a. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612. [doi:10.1109/TIP.2003.819861]

  • Wang, Z., Lu, L., Bovik, A.C., 2004b. Video quality assessment based on structural distortion measurement. Signal Process.: Image Commun., 19(2):121–132.

  • Zheng, Y.Y., 2008. Research on H.264 Region-of-Interest Coding Based on Visual Perception. PhD Thesis, Zhejiang University, Hangzhou, China (in Chinese).

Author information

Correspondence to Yao-wu Chen.

About this article

Cite this article

Zhang, H., Tian, X. & Chen, Yw. A video structural similarity quality metric based on a joint spatial-temporal visual attention model. J. Zhejiang Univ. Sci. A 10, 1696–1704 (2009). https://doi.org/10.1631/jzus.A0920035

  • DOI: https://doi.org/10.1631/jzus.A0920035
