Abstract
Monaural speech separation is the process of extracting target speech from a noisy speech mixture recorded with a single microphone. It is a challenging problem in speech signal processing, and computational auditory scene analysis (CASA) has recently emerged as a promising approach to solving it. This work proposes an image analysis-based algorithm that enhances the binary time–frequency (T–F) mask obtained in the initial segmentation stage of CASA-based monaural speech separation systems in order to improve speech quality. The proposed algorithm consists of four steps: labeling the initial segmentation mask, boundary extraction, active pixel detection and, finally, elimination of noisy non-active pixels. In the labeling step, the T–F mask obtained from the initial segmentation is partitioned into a periodicity pixel matrix and a non-periodicity pixel matrix. Boundaries are then created by connecting all nearby periodicity and non-periodicity pixels into speech boundaries. Some speech boundaries may enclose noisy T–F units as holes; the proposed algorithm treats these holes in the active pixel detection step so that they can be properly classified as speech-dominant or noise-dominant T–F units. Finally, the noisy T–F units are eliminated. The performance of the proposed algorithm is evaluated on the TIMIT speech database. The experimental results show that the proposed algorithm improves the quality of the separated speech, increasing the signal-to-noise ratio by an average of 9.64 dB and reducing the noise residue by 25.55% compared to the noisy speech mixture.
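The mask-refinement pipeline described above (label connected regions, extract their boundaries, treat enclosed holes, discard noisy segments) can be sketched with standard image-processing operations. The following is a minimal illustration only, not the authors' exact method; it assumes a boolean T–F mask and uses SciPy's connected-component labeling and hole filling as stand-ins for the labeling and hole-treatment steps, with a hypothetical `min_region_size` threshold for eliminating small noise-dominant segments:

```python
import numpy as np
from scipy import ndimage

def refine_tf_mask(mask, min_region_size=4):
    """Refine a binary T-F mask: fill holes enclosed by speech regions
    and discard small isolated regions assumed to be noise-dominant."""
    mask = np.asarray(mask, dtype=bool)
    # Labeling: group connected active T-F units into segments.
    labeled, n_regions = ndimage.label(mask)
    refined = np.zeros_like(mask)
    for region_id in range(1, n_regions + 1):
        region = labeled == region_id
        if region.sum() < min_region_size:
            continue  # eliminate small noisy segments
        # Holes enclosed by the speech boundary are reclassified
        # as speech-dominant by filling them in.
        refined |= ndimage.binary_fill_holes(region)
    return refined
```

Applied to a mask containing a large speech region with an interior hole plus an isolated stray pixel, the sketch fills the hole and drops the stray pixel, mirroring the active pixel detection and elimination steps at a high level.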
Cite this article
Shoba, S., Rajavel, R. Image Processing Techniques for Segments Grouping in Monaural Speech Separation. Circuits Syst Signal Process 37, 3651–3670 (2018). https://doi.org/10.1007/s00034-017-0728-x