Toward a Model of Auditory-Visual Speech Intelligibility

  • Ken W. Grant
  • Joshua G. W. Bernstein
Part of the Springer Handbook of Auditory Research book series (SHAR, volume 68)

Abstract

A significant proportion of speech communication occurs when speakers and listeners are within face-to-face proximity of one another. In noisy and reverberant environments with multiple sound sources, auditory-visual (AV) speech communication takes on increased importance because it offers the best chance for successful communication. This chapter reviews AV processing for speech understanding by normal-hearing individuals. Auditory, visual, and AV factors that influence intelligibility are each discussed: the speech spectral regions that are most important for AV speech recognition, complementary and redundant auditory and visual speech information, AV integration efficiency, the time window for auditory (across-spectrum) and AV (cross-modality) integration, and the modulation coherence between auditory and visual speech signals. The knowledge gained from understanding the benefits and limitations of visual speech information as it applies to AV speech perception is used to propose a signal-based model of AV speech intelligibility. It is hoped that the development and refinement of quantitative models of AV speech intelligibility will increase our understanding of the multimodal processes that function every day to aid speech communication, as well as guide advances in future generations of hearing aids and cochlear implants for individuals with sensorineural hearing loss.
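The signal-based modeling approach the abstract describes builds on articulation-index-style indices, in which intelligibility is predicted as an importance-weighted sum of per-band audibility. The sketch below illustrates that core computation in the style of the Speech Intelligibility Index (ANSI S3.5-1997); the band SNRs and importance weights are illustrative values, not the standard's band-importance functions, and the audiovisual extensions discussed in the chapter are not modeled here.

```python
# Minimal sketch of an SII-style intelligibility index (after ANSI S3.5-1997).
# Band SNRs and importance weights below are illustrative, not the standard's.

def band_audibility(snr_db: float) -> float:
    """Map a band signal-to-noise ratio (dB) to audibility in [0, 1].

    The SII uses (SNR + 15) / 30, clipped, so -15 dB SNR contributes
    nothing and +15 dB SNR contributes full audibility.
    """
    return min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)

def intelligibility_index(snrs_db: list[float], weights: list[float]) -> float:
    """Importance-weighted sum of band audibilities (weights sum to 1)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "importance weights must sum to 1"
    return sum(w * band_audibility(snr) for snr, w in zip(snrs_db, weights))

# Four illustrative bands: low bands heavily masked, high bands clear.
snrs = [-20.0, 0.0, 15.0, 30.0]
weights = [0.2, 0.3, 0.3, 0.2]
print(intelligibility_index(snrs, weights))  # 0.65
```

A chapter-style AV extension would add a visual term on top of such an index, e.g., redistributing band-importance weights toward frequency regions that complement speechreading cues.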

Keywords

Articulation index · Auditory-visual coherence · Hearing loss · Modeling · Place of articulation · Spectrotemporal modulation index · Speech envelope · Speech intelligibility index · Speechreading · Speech transmission index · Temporal asynchrony · Temporal window of integration · Voicing

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, USA