
Informational Masking in Speech Recognition

  • Gerald Kidd, Jr.
  • H. Steven Colburn
Chapter
Part of the Springer Handbook of Auditory Research book series (SHAR, volume 60)

Abstract

Solving the “cocktail party problem” depends on segregating, selecting, and comprehending the message of one specific talker among competing talkers. This chapter reviews the history of study of speech-on-speech (SOS) masking, highlighting the major ideas influencing the development of theories that have been proposed to account for SOS masking. Much of the early work focused on the role of spectrotemporal overlap of sounds, and the concomitant competition for representation in the auditory nervous system, as the primary cause of masking (termed energetic masking). However, there were some early indications—confirmed and extended in later studies—of the critical role played by central factors such as attention, memory, and linguistic processing. The difficulties related to these factors are grouped together and referred to as informational masking. The influence of methodological issues—in particular the need for a means of designating the target source in SOS masking experiments—is emphasized as contributing to the discrepancies in the findings and conclusions that frequent the history of study of this topic. Although the modeling of informational masking for the case of SOS masking has yet to be developed to any great extent, a long history of modeling binaural release from energetic masking has led to the application/adaptation of binaural models to the cocktail party problem. These models can predict some, but not all, of the factors that contribute to solving this problem. Some of these models, and their inherent limitations, are reviewed briefly here.

Keywords

Adverse listening conditions; Auditory masking; Auditory scene analysis; Binaural models; Cocktail party problem; Energetic masking; Informational masking; Speech comprehension; Speech in noise; Speech perception

Notes

Acknowledgements

The authors are indebted to Christine Mason for her comments on this chapter and for her assistance with its preparation. Thanks also to Elin Roverud and Jing Mi for providing comments on an earlier version and to the members of the Psychoacoustics Laboratory, Sargent College graduate seminar SLH 810, and Binaural Group for many insightful discussions of these topics. We are also grateful to those authors who generously allowed their figures to be reprinted here and acknowledge the support of the National Institutes of Health/National Institute on Deafness and Other Communication Disorders and Air Force Office of Scientific Research for portions of the research described here.

Compliance with Ethics Requirements

Gerald Kidd, Jr. declares that he has no conflict of interest.

H. Steven Colburn declares that he has no conflict of interest.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Speech, Language and Hearing Sciences, Hearing Research Center, Boston University, Boston, USA
  2. Department of Biomedical Engineering, Hearing Research Center, Boston University, Boston, USA
