Informational Masking in Speech Recognition
Solving the “cocktail party problem” depends on segregating, selecting, and comprehending the message of one specific talker among competing talkers. This chapter reviews the history of research on speech-on-speech (SOS) masking, highlighting the major ideas that have shaped the theories proposed to account for it. Much of the early work focused on the spectrotemporal overlap of sounds, and the concomitant competition for representation in the auditory nervous system, as the primary cause of masking (termed energetic masking). However, there were early indications, confirmed and extended in later studies, of the critical role played by central factors such as attention, memory, and linguistic processing. The difficulties related to these factors are grouped together and referred to as informational masking. Methodological issues, in particular the need for a means of designating the target source in SOS masking experiments, are emphasized as contributing to the discrepancies in findings and conclusions that recur throughout the history of this topic. Although models of informational masking in SOS masking have yet to be developed to any great extent, a long history of modeling binaural release from energetic masking has led to the application and adaptation of binaural models to the cocktail party problem. These models can predict some, but not all, of the factors that contribute to solving this problem. Some of these models, and their inherent limitations, are reviewed briefly here.
Keywords: Adverse listening conditions; Auditory masking; Auditory scene analysis; Binaural models; Cocktail party problem; Energetic masking; Informational masking; Speech comprehension; Speech in noise; Speech perception
The authors are indebted to Christine Mason for her comments on this chapter and for her assistance with its preparation. Thanks also to Elin Roverud and Jing Mi for providing comments on an earlier version and to the members of the Psychoacoustics Laboratory, Sargent College graduate seminar SLH 810, and Binaural Group for many insightful discussions of these topics. We are also grateful to those authors who generously allowed their figures to be reprinted here and acknowledge the support of the National Institutes of Health/National Institute on Deafness and Other Communication Disorders and Air Force Office of Scientific Research for portions of the research described here.
Compliance with Ethics Requirements
Gerald Kidd, Jr. declares that he has no conflict of interest.
H. Steven Colburn declares that he has no conflict of interest.