Abstract
Objective video quality assessment plays an important role in multimedia signal processing. Several extensions of the structural similarity (SSIM) index have been unable to predict the quality of video sequences effectively. In this paper, we propose a structural similarity quality metric for videos based on a spatial-temporal visual attention model. The model identifies the motion attended region and the distortion attended region by computing motion features and distortion contrast. It mimics the shifting of visual attention between the two attended regions and accounts for bursts of error by introducing non-linear weighting functions that give a much higher weighting factor to extremely damaged frames. The proposed metric, based on this model, yields the final objective quality rating of the whole video sequence and is validated using the 50 Hz video sequences of the Video Quality Experts Group (VQEG) Phase I test database.
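To illustrate the idea of non-linear temporal weighting, the following is a minimal sketch, not the weighting function used in the paper, which the abstract does not specify. It assumes per-frame quality scores in [0, 1] (for example, frame-level SSIM values) and uses a hypothetical exponential weight `exp(alpha * (1 - q))`, so that heavily damaged frames dominate the pooled score. The function name `pool_frame_scores` and the parameter `alpha` are illustrative assumptions.

```python
import math


def pool_frame_scores(frame_scores, alpha=4.0):
    """Pool per-frame quality scores into one sequence-level score.

    Assumption (not from the paper): each frame with quality q gets the
    weight exp(alpha * (1 - q)), so low-quality frames receive
    exponentially larger weights. alpha = 0 reduces to the plain mean.
    """
    scores = [float(q) for q in frame_scores]
    if not scores:
        raise ValueError("frame_scores must contain at least one score")

    # Hypothetical non-linear weight: grows as frame quality drops.
    weights = [math.exp(alpha * (1.0 - q)) for q in scores]

    # Weighted average of the frame scores.
    weighted_sum = sum(w * q for w, q in zip(weights, scores))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Three good frames and one badly damaged frame.
    scores = [0.9, 0.9, 0.9, 0.2]
    print("plain mean:  ", sum(scores) / len(scores))
    print("pooled score:", pool_frame_scores(scores))
```

With this kind of weighting, a single badly damaged frame pulls the pooled score well below the plain average, which is the qualitative behavior the abstract describes for bursts of error.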
Cite this article
Zhang, H., Tian, X. & Chen, Yw. A video structural similarity quality metric based on a joint spatial-temporal visual attention model. J. Zhejiang Univ. Sci. A 10, 1696–1704 (2009). https://doi.org/10.1631/jzus.A0920035
DOI: https://doi.org/10.1631/jzus.A0920035