International Journal of Speech Technology

Volume 21, Issue 4, pp 773–781

Enhancing speech intelligibility in reverberant spaces by a speech features distributions dependent pre-processing

  • Yosra Mzah
  • Sandra Chaoui
  • Mériem Jaidane
Article

Abstract

This paper presents a pre-processing method based on speech envelope modulation that enhances intelligibility in large, reverberant, enclosed public spaces. In such conditions, reverberation blurs speech and impairs its perception: the reverberant tails of vowels mask the consonants that follow them. The effect is particularly pronounced for elderly listeners with presbycusis. The proposed pre-processing is inspired by the steady-state suppression technique, which detects the steady-state portions of speech and multiplies their waveforms by an attenuation coefficient to reduce their masking effect. Whereas steady-state suppression is performed in the frequency domain, the pre-processing described here is performed in the time domain. Its key novelty is the detection of voiced speech segments using a priori knowledge of the distributions of the powers and durations of voiced and unvoiced phonemes. Its performance is evaluated with an objective criterion and with subjective listening tests involving normal-hearing listeners, using a set of nonsense Vowel–Consonant–Vowel syllables and railway station vocal announcements.
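
The general idea of time-domain attenuation of detected segments can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the detection rule, so the sketch assumes a plain power threshold and a minimum run duration as hypothetical stand-ins for the distribution-based detector. The function names and the parameters `frame_len`, `power_threshold`, `min_frames`, and `gain` are all illustrative assumptions.

```python
import math

def frame_powers(signal, frame_len):
    """Mean power of each consecutive non-overlapping frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def attenuate_high_power_runs(signal, frame_len, power_threshold, min_frames, gain):
    """Scale down runs of at least min_frames consecutive high-power frames.

    Hypothetical stand-in for the paper's detector: a frame counts as
    'voiced-like' when its mean power exceeds power_threshold, and a run is
    attenuated only if it lasts at least min_frames frames.
    """
    powers = frame_powers(signal, frame_len)
    flags = [p > power_threshold for p in powers]
    out = list(signal)
    start = None
    for k, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = k  # a high-power run begins at frame k
        elif not flag and start is not None:
            if k - start >= min_frames:
                # Multiply the waveform of the whole run by the attenuation gain.
                for n in range(start * frame_len, k * frame_len):
                    out[n] = gain * signal[n]
            start = None
    return out

# Toy input: a loud vowel-like tone followed by a quiet consonant-like tail.
loud = [math.sin(2 * math.pi * 0.05 * n) for n in range(400)]
quiet = [0.1 * math.sin(2 * math.pi * 0.05 * n) for n in range(100)]
toy = loud + quiet
processed = attenuate_high_power_runs(toy, 100, 0.1, 2, 0.4)
```

In this toy case the loud portion is scaled by the assumed gain of 0.4 and the quiet tail is left unchanged. A practical system would presumably smooth the gain at segment boundaries to avoid audible discontinuities, which this sketch omits.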

Keywords

Intelligibility, Reverberation, Pre-processing, Envelope, Overlap masking

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. CEA-LinkLab Laboratory, Telnet, Tunisia
  2. Ecole Nationale d’Ingénieurs de Tunis, Université de Tunis El Manar, Tunis, Tunisia
