Enhancing speech intelligibility in reverberant spaces by a speech features distributions dependent pre-processing
In this paper, we present a pre-processing technique based on speech envelope modulation for enhancing intelligibility in large, reverberant, enclosed public spaces. In such conditions, the blurring effect of reverberation degrades speech perception. This phenomenon results from the masking of consonants by the reverberant tails of the preceding vowels, and it is particularly pronounced for elderly persons suffering from presbycusis. The proposed pre-processing is inspired by the steady-state suppression technique, which consists of detecting the steady-state portions of speech and multiplying their waveforms by an attenuation coefficient in order to reduce their masking effect. While the steady-state suppression technique is performed in the frequency domain, the pre-processing described in this paper is performed in the time domain. Its key novelty lies in detecting the voiced segments of speech using a priori knowledge about the distributions of the powers and durations of voiced and unvoiced phonemes. The performance of this pre-processing is evaluated with an objective criterion and with subjective listening tests involving normal-hearing persons, using a set of nonsense Vowel–Consonant–Vowel syllables and railway station vocal announcements.
Keywords: Intelligibility, Reverberation, Pre-processing, Envelope, Overlap masking
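The time-domain idea summarized in the abstract (detect voiced, steady-state segments, then scale their waveform by an attenuation coefficient) can be illustrated with a minimal sketch. It is not the authors' algorithm: the paper bases detection on a priori distributions of phoneme powers and durations, whereas this sketch substitutes two fixed thresholds as a stand-in. The function name `attenuate_voiced` and the parameters `power_thresh_db`, `min_dur_ms`, and `gain`, together with their default values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def attenuate_voiced(signal, fs, frame_ms=10.0, power_thresh_db=-20.0,
                     min_dur_ms=40.0, gain=0.4):
    """Attenuate frames that look voiced, using fixed power/duration thresholds.

    Assumption: the thresholds stand in for the a priori power and duration
    distributions mentioned in the abstract; they are not taken from the paper.
    Returns the processed signal and a per-frame boolean "voiced" mask.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = max(1, int(round(fs * frame_ms / 1000.0)))
    n_frames = len(signal) // frame_len
    out = signal.copy()
    if n_frames == 0:
        return out, np.zeros(0, dtype=bool)

    # Short-time power per non-overlapping frame, in dB relative to the peak frame.
    n_used = n_frames * frame_len
    frames = signal[:n_used].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)
    ref = power.max()
    if ref <= 0.0:
        return out, np.zeros(n_frames, dtype=bool)
    power_db = 10.0 * np.log10(np.maximum(power, 1e-12 * ref) / ref)

    # Power criterion: frames above the threshold are candidates.
    candidate = power_db >= power_thresh_db

    # Duration criterion: keep only candidate runs that last long enough.
    min_frames = max(1, int(np.ceil(min_dur_ms / frame_ms)))
    voiced = np.zeros(n_frames, dtype=bool)
    start = None
    for i in range(n_frames + 1):
        is_on = i < n_frames and candidate[i]
        if is_on and start is None:
            start = i
        elif not is_on and start is not None:
            if i - start >= min_frames:
                voiced[start:i] = True
            start = None

    # Time-domain attenuation: multiply the waveform of voiced frames by `gain`.
    gains = np.where(np.repeat(voiced, frame_len), gain, 1.0)
    out[:n_used] *= gains
    return out, voiced


if __name__ == "__main__":
    fs = 8000
    t = np.arange(800) / fs
    loud = np.sin(2 * np.pi * 200 * t)          # 100 ms vowel-like segment
    quiet = 0.01 * np.sin(2 * np.pi * 200 * t)  # 100 ms low-power segment
    x = np.concatenate([loud, quiet])
    y, voiced = attenuate_voiced(x, fs)
    print(int(voiced.sum()), "of", len(voiced), "frames attenuated")
```

A practical implementation would likely smooth the gain at segment boundaries to avoid audible discontinuities; the hard switching here is kept only for brevity.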