Improving Speech Intelligibility in Monaural Segregation System by Fusing Voiced and Unvoiced Speech Segments

  • S. Shoba
  • R. Rajavel


Improving speech intelligibility remains a challenging problem in digital hearing aids. This research work proposes a new speech segregation algorithm that improves speech intelligibility by fusing the voiced and unvoiced segments of the speech signal using a genetic algorithm. The voiced speech segments are obtained using perceptual speech cues such as autocorrelation, cross-channel correlation and pitch. The unvoiced speech segments are obtained using another perceptual cue, speech onset/offset. Onset/offset-based segmentation produces segments for both voiced and unvoiced speech, so the unvoiced segments are obtained by subtracting the voiced segments from the onset/offset segments. These unvoiced segments may still contain interference. This research work proposes a scheme to remove this interference from the unvoiced segments and to fuse the voiced and unvoiced segments using the genetic algorithm. The performance of the proposed algorithm is evaluated using intelligibility measures such as the coherence speech intelligibility index (CSII), the normalized covariance metric (NCM) and short-time objective intelligibility (STOI). The experimental results show that, compared with other existing systems, the proposed algorithm significantly improves speech intelligibility, with average improvements of 0.23 in CSII, 0.20 in NCM and 0.16 in STOI.
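Two of the steps named in the abstract can be illustrated with a minimal NumPy sketch. It assumes that the signal has already passed through an auditory filterbank (for example, a gammatone filterbank), so each channel is available as a short frame, and that segments are represented as binary time-frequency masks. The synthetic sinusoid and noise "channels", the frame length, the lag range and all function names are hypothetical and chosen only for illustration. The cross-channel correlation shown here is a common CASA formulation (the Pearson correlation between the normalized autocorrelations of adjacent channels), and the paper's exact definition may differ. The sketch also forms the unvoiced candidates by removing voiced units from onset/offset units, as the abstract describes. It does not reproduce the paper's interference removal or its genetic-algorithm fusion.

```python
import numpy as np


def normalized_autocorrelation(frame: np.ndarray, max_lag: int) -> np.ndarray:
    """Autocorrelation of one channel's frame for lags 0..max_lag-1, scaled by the lag-0 energy."""
    n = len(frame)
    ac = np.array([np.dot(frame[: n - lag], frame[lag:]) for lag in range(max_lag)])
    return ac / ac[0] if ac[0] > 0 else ac


def cross_channel_correlation(ac_a: np.ndarray, ac_b: np.ndarray) -> float:
    """Pearson correlation between the autocorrelations of two adjacent channels."""
    a = (ac_a - ac_a.mean()) / (ac_a.std() + 1e-12)  # zero mean, unit variance
    b = (ac_b - ac_b.mean()) / (ac_b.std() + 1e-12)
    return float(np.mean(a * b))


def unvoiced_candidates(onset_offset_mask: np.ndarray, voiced_mask: np.ndarray) -> np.ndarray:
    """T-F units covered by onset/offset segments but not by voiced segments."""
    return onset_offset_mask & ~voiced_mask


if __name__ == "__main__":
    # Hypothetical settings: 16 kHz sampling, 20 ms frame, lags up to 10 ms
    fs, f0, n, max_lag = 16000, 200.0, 320, 160
    t = np.arange(n) / fs
    rng = np.random.default_rng(0)

    # Two adjacent channels driven by the same 200 Hz harmonic (different gain and phase)
    ch1 = np.sin(2 * np.pi * f0 * t)
    ch2 = 0.6 * np.sin(2 * np.pi * f0 * t + 0.4)
    ch3 = rng.standard_normal(n)  # a channel dominated by noise

    ac1, ac2, ac3 = (normalized_autocorrelation(c, max_lag) for c in (ch1, ch2, ch3))
    print("periodic/periodic:", round(cross_channel_correlation(ac1, ac2), 3))
    print("periodic/noise:   ", round(cross_channel_correlation(ac1, ac3), 3))

    # Toy 2 x 4 masks (channels x frames)
    onset_offset = np.array([[1, 1, 1, 0], [0, 1, 1, 1]], dtype=bool)
    voiced = np.array([[1, 1, 0, 0], [0, 1, 0, 0]], dtype=bool)
    print(unvoiced_candidates(onset_offset, voiced).astype(int))  # → [[0 0 1 0] [0 0 1 1]]
```

Adjacent channels that respond to the same harmonic give a correlation close to 1, while a noise-dominated channel gives a low value. A voiced segment would typically be grown from units whose cross-channel correlation exceeds a threshold. No threshold is set here because the abstract does not state one.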


Keywords: Speech segregation · CASA · Segments fusion · Segmentation · Speech intelligibility



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. SSN College of Engineering, Chennai, India
