A video structural similarity quality metric based on a joint spatial-temporal visual attention model

Zhang, Hua; Tian, Xiang; Chen, Yao-wu

doi:10.1631/jzus.A0920035

Published: 01 December 2009

Volume 10, pages 1696–1704 (2009)

Journal of Zhejiang University-SCIENCE A

59 Accesses
4 Citations

Abstract

Objective video quality assessment plays an important role in multimedia signal processing. Several existing extensions of the structural similarity (SSIM) index to video do not predict the quality of video sequences effectively. In this paper we propose a structural similarity quality metric for video based on a joint spatial-temporal visual attention model. The model locates a motion-attended region and a distortion-attended region by computing motion features and distortion contrast, respectively. It mimics the shifting of visual attention between the two attended regions and accounts for bursts of errors through non-linear weighting functions that assign much higher weights to severely damaged frames. The proposed metric derives the final objective quality rating of the whole video sequence from this model, and it is validated on the 50 Hz video sequences of the Video Quality Experts Group (VQEG) Phase I test database.
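The abstract does not state the paper's formulas, so the following is only a minimal Python/NumPy sketch of the kind of pipeline it describes, not the authors' method. Every modelling choice here is an assumption: SSIM is computed with an 8×8 box window in the spirit of Wang et al. (2004a; 2004b); the absolute difference between consecutive reference frames stands in for the paper's motion features; each attended region is simply the top `frac` share of windows; the attention shift is a contrast-based blend of the two regions; and `exp(beta * (1 - q))` is one possible non-linear weighting that emphasizes badly damaged frames. All function names and the parameters `frac`, `beta` and `k` are hypothetical.

```python
import numpy as np

# SSIM stabilising constants for 8-bit data, as in Wang et al. (2004a).
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2


def box_mean(img, k=8):
    """Mean over every k x k window ('valid' positions) via an integral image."""
    s = np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)


def ssim_map(x, y, k=8):
    """Local SSIM using a k x k box window (not the Gaussian window of 2004a)."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    mx, my = box_mean(x, k), box_mean(y, k)
    vx = box_mean(x * x, k) - mx * mx
    vy = box_mean(y * y, k) - my * my
    cxy = box_mean(x * y, k) - mx * my
    return ((2 * mx * my + C1) * (2 * cxy + C2)) / (
        (mx * mx + my * my + C1) * (vx + vy + C2))


def top_fraction_mask(values, frac):
    """Boolean mask selecting roughly the largest `frac` share of entries."""
    return values >= np.quantile(values, 1.0 - frac)


def frame_score(ref_prev, ref, dist, frac=0.2, k=8):
    """Attention-weighted SSIM of one frame (hypothetical attention model)."""
    s = ssim_map(ref, dist, k)
    d = np.clip(1.0 - s, 0.0, None)  # local dissimilarity
    # ASSUMPTION: motion feature = local mean absolute difference between
    # consecutive reference frames (a stand-in for real motion estimation).
    motion = box_mean(np.abs(ref.astype(np.float64) - ref_prev.astype(np.float64)), k)
    motion_region = top_fraction_mask(motion, frac)
    distortion_region = top_fraction_mask(d, frac)
    # ASSUMPTION: distortion contrast = how far the distortion-attended
    # region's mean dissimilarity exceeds the frame average.
    contrast = d[distortion_region].mean() - d.mean()
    # ASSUMPTION: attention shifts toward the distortion region in proportion
    # to its contrast and otherwise stays on the motion region.
    w = contrast / (contrast + d[motion_region].mean() + 1e-12)
    w = float(np.clip(w, 0.0, 1.0))
    return w * s[distortion_region].mean() + (1.0 - w) * s[motion_region].mean()


def pool_frames(q, beta=4.0):
    """Non-linear temporal pooling: weight exp(beta * (1 - q)) grows with damage."""
    q = np.asarray(q, dtype=np.float64)
    w = np.exp(beta * (1.0 - q))
    return float((w * q).sum() / w.sum())


def video_quality(ref_frames, dist_frames, frac=0.2, beta=4.0, k=8):
    """Pooled sequence score; frame 0 only serves as frame 1's motion reference."""
    scores = [frame_score(ref_frames[t - 1], ref_frames[t], dist_frames[t], frac, k)
              for t in range(1, len(ref_frames))]
    return pool_frames(scores, beta)
```

Calling `video_quality(ref, dist)` on two equally sized stacks of luma frames (shape `(T, H, W)`, `T >= 2`) returns 1.0 for an undistorted copy. With `beta = 0` the pooling reduces to a plain average of the frame scores; a larger `beta` lets a short burst of severely damaged frames pull the sequence score down, which is the qualitative behaviour the abstract attributes to its non-linear weighting functions.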

Similar content being viewed by others

Video super-resolution based on deep learning: a comprehensive survey

Article 01 April 2022

Fragrant: frequency-auxiliary guided relational attention network for low-light action recognition

Article 14 May 2024

Image Inpainting: A Review

Article 06 December 2019

References

Aziz, M.Z., Mertsching, B., 2008. Fast and robust generation of feature maps for region-based visual attention. IEEE Trans. Image Process., 17(5):633–644. [doi:10.1109/TIP.2008.919365]
Brooks, A.C., Zhao, X.N., Pappas, T.N., 2008. Structural similarity quality metrics in a coding context: exploring the space of realistic distortions. IEEE Trans. Image Process., 17(8):1261–1273. [doi:10.1109/TIP.2008.926161]
Chen, Q.Q., Chen, Z.B., Gu, X.D., Wang, C., 2007. Attention-based adaptive intra refresh for error-prone video transmission. IEEE Commun. Mag., 45(1):52–60. [doi:10.1109/MCOM.2007.284538]
Grill-Spector, K., Malach, R., 2004. The human visual cortex. Ann. Rev. Neurosci., 27:649–677. [doi:10.1146/annurev.neuro.27.070203.144220]
Itti, L., 2005. Quantifying the contribution of low-level saliency to human eye movements in dynamic scenes. Vis. Cogn., 12(6):1093–1123.
Itti, L., Baldi, P., 2005. A Principled Approach to Detecting Surprising Events in Video. IEEE Int. Conf. on Computer Vision and Pattern Recognition, p.631–637. [doi:10.1109/CVPR.2005.40]
Lu, Z.K., Liu, W.S., Yang, X.K., Ong, E.P., Yao, S.S., 2005. Modeling visual attention’s modulatory aftereffects on visual sensitivity and quality evaluation. IEEE Trans. Image Process., 14(11):1928–1942. [doi:10.1109/TIP.2005.854478]
Martinez-Rach, M., Lopez, O., Pinol, P., Malumbres, M.P., Oliver, J., 2006. A Study of Objective Quality Assessment Metrics for Video Codec Design and Evaluation. Eighth IEEE Int. Symp. on Multimedia, p.517–524. [doi:10.1109/ISM.2006.15]
Seshadrinathan, K., Bovik, A.C., 2007. A Structural Similarity Metric for Video Based on Motion Models. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p.I-869–I-872. [doi:10.1109/ICASSP.2007.366046]
Sheikh, H.R., Sabir, M.F., Bovik, A.C., 2006. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process., 15(11):3440–3451. [doi:10.1109/TIP.2006.881959]
Tang, C.W., 2007. Spatiotemporal visual considerations for video coding. IEEE Trans. Multim., 9(2):231–238. [doi:10.1109/TMM.2006.886328]
VQEG, 2000. Final Report from the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment. Video Quality Experts Group. Available from http://www.vqeg.org [Accessed on Aug. 22, 2008].
Wang, Z., Li, Q., 2007. Video quality assessment using a statistical model of human visual speed perception. J. Opt. Soc. Am. A, 24:B61–B69. [doi:10.1364/JOSAA.24.000B61]
Wang, Z., Simoncelli, E.P., 2005. Translation Insensitive Image Similarity in Complex Wavelet Domain. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, p.573–576.
Wang, Z., Sheikh, H.R., Bovik, A.C., 2003. Objective Video Quality Assessment. In: Furht, B., Marques, O. (Eds.), The Handbook of Video Databases: Design and Applications. CRC Press, Florida, USA, p.1041–1078.
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., 2004a. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process., 13(4):600–612. [doi:10.1109/TIP.2003.819861]
Wang, Z., Lu, L., Bovik, A.C., 2004b. Video quality assessment based on structural distortion measurement. Signal Process.: Image Commun., 19(2):121–132.
Zheng, Y.Y., 2008. Research on H.264 Region-of-Interest Coding Based on Visual Perception. PhD Thesis, Zhejiang University, Hangzhou, China (in Chinese).


Author information

Authors and Affiliations

Institute of Advanced Digital Technology and Instrumentation, Zhejiang University, Hangzhou, 310027, China
Hua Zhang, Xiang Tian & Yao-wu Chen

Corresponding author

Correspondence to Yao-wu Chen.

About this article

Cite this article

Zhang, H., Tian, X. & Chen, Yw. A video structural similarity quality metric based on a joint spatial-temporal visual attention model. J. Zhejiang Univ. Sci. A 10, 1696–1704 (2009). https://doi.org/10.1631/jzus.A0920035

Received: 14 January 2009
Accepted: 23 April 2009
Published: 01 December 2009
Issue Date: December 2009
DOI: https://doi.org/10.1631/jzus.A0920035

Key words

CLC number

TN919.8
