Abstract
We survey recent developments in multimedia signal quality assessment, covering image, audio, video, and combined signals. Such an overview is timely given the recent explosion of all-digital sensory entertainment and communication devices in the consumer space. Owing to the sensory nature of these signals, perceptual models lie at the heart of multimedia signal quality assessment algorithms. We survey these models and recent competitive algorithms, and discuss comparison studies that others have conducted. In this context we also describe existing signal quality assessment databases. We envision that the reader will gain a firmer understanding of the broad topic of multimedia quality assessment, of the various sub-disciplines corresponding to different signal types, of how these signal types relate to one another in producing an overall user experience, and of what directions of research remain to be pursued.
Cite this article
Seshadrinathan, K., Bovik, A.C. Automatic prediction of perceptual quality of multimedia signals—a survey. Multimed Tools Appl 51, 163–186 (2011). https://doi.org/10.1007/s11042-010-0625-9