Factors Affecting Speech Reception in Background Noise with a Vocoder Implementation of the FAST Algorithm

  • Shaikat Hossain
  • Raymond L. Goldsworthy
Research Article


Speech segregation in background noise remains a difficult task for individuals with hearing loss. Several signal processing strategies have been developed to improve the efficacy of hearing assistive technologies in complex listening environments. The present study measured speech reception thresholds in normal-hearing listeners attending to a vocoder based on the Fundamental Asynchronous Stimulus Timing algorithm (FAST; Smith et al. 2014), which triggers pulses based on the amplitudes of channel magnitudes in order to preserve envelope timing cues. Two reconstruction bandwidths (narrowband and broadband) were used to control the degree of spectrotemporal resolution. Five types of background noise were used (the same male talker, a female talker, a time-reversed male talker, a time-reversed female talker, and speech-shaped noise) to probe the contributions of different types of speech segregation cues and to elucidate how degradation affects speech reception across these conditions. Maskers were spatialized using head-related transfer functions to create co-located and spatially separated conditions. Results indicate that the benefits arising from voicing and spatial cues can be preserved with the FAST algorithm but diminish as spectral resolution is reduced.
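The idea of triggering pulses from channel amplitudes can be illustrated with a minimal sketch. It assumes, as a simplification, that a pulse is emitted at each strict local maximum of a single channel's envelope, with the pulse amplitude set to the envelope value at that peak. The function names, the threshold parameter, and the synthetic test envelope are hypothetical and are not taken from the study or from Smith et al. (2014).

```python
import math


def envelope_peak_pulses(envelope, threshold=0.0):
    """Return (sample_index, amplitude) pairs at strict local maxima.

    Assumption for illustration only: one pulse per envelope peak that
    exceeds `threshold`; this is not the authors' implementation.
    """
    pulses = []
    for i in range(1, len(envelope) - 1):
        prev, cur, nxt = envelope[i - 1], envelope[i], envelope[i + 1]
        # A strict local maximum is larger than both neighbours.
        if cur > threshold and cur > prev and cur > nxt:
            pulses.append((i, cur))
    return pulses


def demo_envelope(n_samples=400, fs=16000.0, mod_hz=100.0):
    """Hypothetical test envelope: a raised cosine modulated at `mod_hz`."""
    return [
        0.5 * (1.0 - math.cos(2.0 * math.pi * mod_hz * n / fs))
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    for index, amplitude in envelope_peak_pulses(demo_envelope()):
        print(index, round(amplitude, 3))
```

With the default synthetic envelope (100 Hz modulation sampled at 16 kHz), the pulses fall on the envelope maxima, one per 10 ms modulation period, so the pulse timing carries the envelope periodicity.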


Keywords: speech comprehension, vocoder, cocktail party, cochlear implants, FAST


  1. Arbogast TL, Mason CR, Kidd G (2002) The effect of spatial separation on informational and energetic masking of speech. J Acoust Soc Am 112:2086–2098
  2. Balakrishnan U, Freyman RL (2008) Speech detection in spatial and non-spatial speech maskers. J Acoust Soc Am 123:2680–2691
  3. Başkent D, Gaudrain E (2016) Musician advantage for speech-on-speech perception. J Acoust Soc Am 139:EL51–EL56
  4. Blauert J (1997) Spatial hearing: the psychophysics of human sound localization. MIT Press, Cambridge
  5. Bolia RS et al (2000) A speech corpus for multitalker communications research. J Acoust Soc Am 107:1065–1066
  6. Brokx JPL, Nooteboom SG (1982) Intonation and the perceptual separation of simultaneous voices. J Phon 10:23–36
  7. Bronkhorst AW (2000) The cocktail party phenomenon: a review of research on speech intelligibility in multiple-talker conditions. Acta Acust United Acust 86(1):117–128
  8. Brungart DS (2001a) Evaluation of speech intelligibility with the coordinate response measure. J Acoust Soc Am 109:2276–2279
  9. Brungart DS (2001b) Informational and energetic masking effects in the perception of two simultaneous talkers. J Acoust Soc Am 109:1101–1109
  10. Brungart D, Simpson B (2002) Within-ear and across-ear interference in a cocktail-party listening task. J Acoust Soc Am 112:2985–2995
  11. Carlile S, Corkhill C (2015) Selective spatial attention modulates bottom-up informational masking of speech. Sci Rep 5(8662):1–7
  12. Cherry EC (1953) Some experiments on the recognition of speech, with one and with two ears. J Acoust Soc Am 25(5):975–979
  13. Churchill T et al (2014) Spatial hearing benefits demonstrated with presentation of acoustic temporal fine structure cues in bilateral cochlear implant listeners. J Acoust Soc Am 136:1246–1256
  14. Cooke M (2006) A glimpsing model of speech perception in noise. J Acoust Soc Am 119(3):1562–1573
  15. Darwin C, Hukin R (2000) Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. J Acoust Soc Am 107:970–977
  16. Darwin C, Brungart D, Simpson B (2003) Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers. J Acoust Soc Am 114:2913–2922
  17. Dorman MF et al (1998) The recognition of sentences in noise by normal-hearing listeners using simulations of cochlear-implant signal processors with 6–20 channels. J Acoust Soc Am 104:3583–3585
  18. Dubbelboer F, Houtgast T (2008) The concept of signal-to-noise ratio in the modulation domain and speech intelligibility. J Acoust Soc Am 124:3937–3946
  19. Durlach NI, Mason CR, Kidd G Jr, Arbogast TL, Colburn HS, Shinn-Cunningham B (2003) Note on informational masking. J Acoust Soc Am in press
  20. Fitch WT, Giedd J (1999) Morphology and development of the human vocal tract: a study using magnetic resonance imaging. J Acoust Soc Am 106:1511–1522
  21. Freyman RL et al (1999) The role of perceived spatial separation in the unmasking of speech. J Acoust Soc Am 106(6):3578–3588
  22. Freyman R, Balakrishnan U, Helfer K (2001) Spatial release from informational masking in speech recognition. J Acoust Soc Am 109:2112–2122
  23. Freyman RL, Balakrishnan U, Helfer KS (2008) Spatial release from masking with noise-vocoded speech. J Acoust Soc Am 124:1627–1637
  24. Fuller CD et al (2014) Gender categorization is abnormal in cochlear implant users. J Assoc Res Otolaryngol 15:1037–1048
  25. Gallun FJ, Mason CR, Kidd G (2005) Binaural release from informational masking in a speech recognition task. J Acoust Soc Am 118:1614–1625
  26. Gaudrain E, Başkent D (2015) Factors limiting vocal-tract length discrimination in cochlear implant simulations. J Acoust Soc Am 137:1298–1308
  27. Goldsworthy R (2015) Correlations between pitch and phoneme perception in cochlear implant users and their normal hearing peers. J Assoc Res Otolaryngol 16(6):797–809
  28. Hillenbrand JM, Clark MJ (2009) The role of F0 and formant frequencies in distinguishing the voices of men and women. Atten Percept Psychophys 71(5), pp. 16
  29. Hirsh IJ (1948) The influence of interaural phase on interaural summation and inhibition. J Acoust Soc Am 20:536–544
  30. Hirsh IJ (1950) The relation between localization and intelligibility. J Acoust Soc Am 22:196–200
  31. van Hoesel RJ, Tyler RS (2003) Speech perception, localization, and lateralization with bilateral cochlear implants. J Acoust Soc Am 113:1617–1630
  32. Jørgensen S, Ewert SD, Dau T (2013) A multi-resolution envelope-power based model for speech intelligibility. J Acoust Soc Am 134(1):436–446
  33. Kan A, Litovsky R (2015) Binaural hearing with electrical stimulation. Hear Res 322:127–137
  34. Kates JM (2011) Spectro-temporal envelope changes caused by temporal fine structure modification. J Acoust Soc Am 129(6):3981–3990
  35. Kidd G Jr et al (2007) Informational masking. In: Yost W (ed) Springer handbook of auditory research 29: auditory perception of sound sources. Springer, New York, pp. 143–190
  36. Kidd G Jr et al (1998) Release from masking due to spatial separation of sources in the identification of nonspeech auditory patterns. J Acoust Soc Am 104:422–431
  37. Kidd G Jr, Mason C, Gallun F (2005) Combining energetic and informational masking for speech identification. J Acoust Soc Am 118:982–992
  38. Leek M, Brown ME, Dorman MF (1991) Informational masking and auditory attention. Percept Psychophys 50:205–214
  39. Li T, Fu QJ (2011) Voice gender discrimination provides a measure of more than pitch-related perception in cochlear implant users. Int J Audiol 50:498–502
  40. Marrone N, Mason CR, Kidd G Jr (2008) Tuning in the spatial dimension: evidence from a masked speech identification task. J Acoust Soc Am 124:1146–1158
  41. Moon IJ, Won J-H, Park M-H, Ives DT, Nie K, Heinz MG, Lorenzi C, Rubinstein JT (2014) Optimal combination of neural temporal envelope and fine structure cues to explain speech identification in background noise. J Neurosci 34:12145–12154
  42. Moore BCJ (2012) An introduction to the psychology of hearing, 6th edn. Brill, The Netherlands
  43. Oxenham AJ, Kreft HA (2014) Speech perception in tones and noise via cochlear implants reveals influence of spectral resolution on temporal processing. Trends Hear 18:1–14
  44. Ping L et al (2017) Implementation and preliminary evaluation of ‘C-tone’: a novel algorithm to improve lexical tone recognition in Mandarin-speaking cochlear implant users. Cochlear Implants Int 18(5):240–249
  45. Poissant SF, Whitmal NA III, Freyman RL (2006) Effects of reverberation and masking on speech intelligibility in cochlear implant simulations. J Acoust Soc Am 119:1606–1615
  46. Pollack I (1975) Auditory informational masking. J Acoust Soc Am 57:S5
  47. Qin MK, Oxenham AJ (2003) Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. J Acoust Soc Am 114:446–454
  48. Shannon R et al (1995) Speech recognition with primarily temporal cues. Science 270:303–304
  49. Skuk VG, Schweinberger SR (2014) Influences of fundamental frequency, formant frequencies, aperiodicity and spectrum level on the perception of voice gender. J Speech Lang Hear Res 57(1):285–296
  50. Smith DR, Patterson RD (2005) The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age. J Acoust Soc Am 118:3177–3186
  51. Smith ZM et al (2014) Hearing better with interaural time differences and bilateral cochlear implants. J Acoust Soc Am 135(4):2190–2191
  52. Stickney G et al (2004) Cochlear implant speech recognition with speech maskers. J Acoust Soc Am 116(2):1081–1091
  53. Stone MA, Moore BCJ (2014) On the near non-existence of “pure” energetic masking release for speech. J Acoust Soc Am 135(4):1967–1977
  54. Stone MA et al (2011) The importance for speech intelligibility of random fluctuations in “steady” background noise. J Acoust Soc Am 130(5):2874–2881
  55. Stone MA, Füllgrabe C, Moore BCJ (2012) Notionally steady background noise acts primarily as a modulation masker of speech. J Acoust Soc Am 132(1):317–326
  56. Swaminathan J et al (2016) Role of binaural temporal fine structure and envelope cues in cocktail-party listening. J Neurosci 36(31):8250–8257
  57. Vandali AE et al (2005) Pitch ranking ability of cochlear implant recipients: a comparison of sound-processing strategies. J Acoust Soc Am 117(5):3126–3138
  58. Vandali AE, Dawson PW, Arora K (2016) Results using the OPAL strategy in Mandarin speaking cochlear implant recipients. Int J Audiol Jun 22, pp. 1–12
  59. Watson CS (2005) Some comments on informational masking. Acta Acoust 91:502–512
  60. Yost B (2006) Informational masking: what is it? Paper presented at the 2006 Computational and Systems Neuroscience (Cosyne) meeting
  61. Zirn S et al (2016) Perception of interaural phase differences with envelope and fine structure coding strategies in bilateral cochlear implant users. Trends Hear 20:2331216516665608
  62. Zurek PM (1993) Binaural advantages and directional effects in speech intelligibility. In: Studebaker GA, Hochberg I (eds) Acoustical factors affecting hearing aid performance, pp. 255–275

Copyright information

© Association for Research in Otolaryngology 2018

Authors and Affiliations

  1. Department of Otolaryngology, University of Southern California, Los Angeles, USA
