Deep convolutional neural network-based speech enhancement to improve speech intelligibility and quality for hearing-impaired listeners

  • P. F. Khaleelur Rahiman
  • V. S. Jayanthi
  • A. N. Jayanthi
ORIGINAL ARTICLE

Abstract

In this paper, we propose a deep learning-based speech enhancement (DLSE) method to improve speech intelligibility for hearing-impaired listeners. The algorithm decomposes the noisy speech signal into frames (as features) and feeds them to deep convolutional neural networks (DCNNs), which estimate which frequency channels carry the more perceptually important information (i.e. a higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated cochlear implant (CI) channels and retain speech-dominated ones for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 12 CI users with two types of background noise: fan and music sounds. The architecture and low processing delay of the DLSE algorithm make it suitable for use in hearing devices. Although DLSE was evaluated with a noise-specific approach, several aspects of generalisation to unseen acoustic conditions were addressed, most importantly performance with a speaker not used during training. The proposed DCNN-based method yielded the largest improvements in both speech intelligibility and quality, and the results indicate that DCNN-based methods are more promising than existing approaches.
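The n-of-m channel-selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the 22-channel example, and the attenuation value are all hypothetical; in the actual system the per-channel SNR estimates would come from the trained DCNN rather than random numbers.

```python
import numpy as np

def n_of_m_selection(snr_estimates, n=8, attenuation_db=-20.0):
    """Retain the n channels with the highest estimated SNR and
    attenuate the rest (hypothetical helper for illustration only)."""
    snr = np.asarray(snr_estimates, dtype=float)
    # Start with every channel attenuated (noise-dominated by default).
    gains = np.full(snr.shape, 10.0 ** (attenuation_db / 20.0))
    # Indices of the n channels with the highest estimated SNR.
    top = np.argsort(snr)[-n:]
    # Pass speech-dominated channels through unchanged.
    gains[top] = 1.0
    return gains

# Example: one analysis frame of a 22-channel CI with random SNR estimates.
rng = np.random.default_rng(0)
gains = n_of_m_selection(rng.normal(0.0, 5.0, 22), n=8)
```

The returned per-channel gains would then be applied to the channel envelopes before electrical stimulation, so that only the n speech-dominated channels contribute at full level in each frame.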

Keywords

Deep convolutional neural networks · Cochlear implant · Speech intelligibility · Signal-to-noise ratio

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.


Copyright information

© International Federation for Medical and Biological Engineering 2018

Authors and Affiliations

  • P. F. Khaleelur Rahiman (1)
  • V. S. Jayanthi (2)
  • A. N. Jayanthi (3)
  1. Electronics and Communication Engineering, Hindusthan College of Engineering and Technology, Coimbatore, India
  2. Electronics and Communication Engineering, Rajagiri School of Engineering and Technology, Cochin, India
  3. Electronics and Communication Engineering, Sri Ramakrishna Institute of Technology, Coimbatore, India
