
Image Processing Techniques for Segments Grouping in Monaural Speech Separation

Published in: Circuits, Systems, and Signal Processing

Abstract

Monaural speech separation is the process of extracting the target speech from a noisy speech mixture recorded with a single microphone. It is a challenging problem in speech signal processing, and computational auditory scene analysis (CASA) has recently offered a reasonable approach to solving it. This work proposes an image-analysis-based algorithm that enhances the binary time–frequency (T–F) mask obtained in the initial segmentation stage of CASA-based monaural speech separation systems, thereby improving speech quality. The proposed algorithm consists of labeling the initial segmentation mask, boundary extraction, active-pixel detection and, finally, elimination of the noisy non-active pixels. In the labeling step, the T–F mask obtained from the initial segmentation is labeled as a periodicity pixel matrix and a non-periodicity pixel matrix. Boundaries are then created by connecting all nearby periodicity and non-periodicity pixel matrices into speech boundaries. A speech boundary may enclose noisy T–F units as holes; the active-pixel detection step classifies these holes as speech-dominant or noise-dominant T–F units, and the noisy T–F units are finally eliminated. The performance of the proposed algorithm is evaluated on the TIMIT speech database. Experimental results show that the algorithm improves the quality of the separated speech, increasing the signal-to-noise ratio by an average of 9.64 dB and reducing the noise residue by 25.55% compared with the noisy speech mixture.
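The pipeline sketched in the abstract (labeling connected regions of the binary T–F mask, treating holes enclosed by a speech boundary, and discarding noise-dominant units) maps onto standard binary image operations. The sketch below is illustrative only and is not the paper's implementation: it represents the mask as a list of 0/1 rows, and the function names and the `min_size` threshold are hypothetical choices standing in for the paper's periodicity-based classification.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask):
    """BFS connected-component labeling of a binary T-F mask.

    Returns (labels, n): a label matrix (0 = background) and the
    number of components found.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    n = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                n += 1
                labels[r][c] = n
                q = deque([(r, c)])
                while q:
                    i, j = q.popleft()
                    for di, dj in NEIGHBORS:
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and mask[ni][nj] and labels[ni][nj] == 0):
                            labels[ni][nj] = n
                            q.append((ni, nj))
    return labels, n


def fill_holes(mask):
    """Mark 0-regions not reachable from the mask border as holes (set to 1).

    This treats every hole as speech-dominant; the paper instead classifies
    each hole before deciding.
    """
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    q = deque()
    # Seed the flood fill with all background pixels on the border.
    for r in range(rows):
        for c in range(cols):
            if (r in (0, rows - 1) or c in (0, cols - 1)) and not mask[r][c]:
                outside[r][c] = True
                q.append((r, c))
    while q:
        i, j = q.popleft()
        for di, dj in NEIGHBORS:
            ni, nj = i + di, j + dj
            if (0 <= ni < rows and 0 <= nj < cols
                    and not mask[ni][nj] and not outside[ni][nj]):
                outside[ni][nj] = True
                q.append((ni, nj))
    # Anything that is neither mask nor "outside" is an enclosed hole.
    return [[1 if mask[r][c] or not outside[r][c] else 0
             for c in range(cols)] for r in range(rows)]


def remove_small(mask, min_size=3):
    """Discard components smaller than min_size as noise-dominant units."""
    labels, n = label_components(mask)
    sizes = [0] * (n + 1)
    for row in labels:
        for v in row:
            sizes[v] += 1
    return [[1 if v and sizes[v] >= min_size else 0 for v in row]
            for row in labels]
```

As a usage sketch, a ring-shaped component with an enclosed hole keeps the hole after `fill_holes`, while an isolated single pixel is dropped by `remove_small` as a noise-dominant unit.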




Author information


Correspondence to S. Shoba.

About this article

Cite this article

Shoba, S., Rajavel, R. Image Processing Techniques for Segments Grouping in Monaural Speech Separation. Circuits Syst Signal Process 37, 3651–3670 (2018). https://doi.org/10.1007/s00034-017-0728-x

