Abstract
Monaural speech separation is the process of extracting target speech from a noisy speech mixture recorded with a single microphone. It is a challenging problem in speech signal processing, and computational auditory scene analysis (CASA) has recently emerged as a promising approach to solving it. This work proposes an image analysis-based algorithm that enhances the binary time–frequency (T–F) mask obtained in the initial segmentation stage of CASA-based monaural speech separation systems in order to improve speech quality. The proposed algorithm consists of four steps: labeling the initial segmentation mask, boundary extraction, active pixel detection and, finally, elimination of noisy non-active pixels. In the labeling step, the T–F mask obtained from the initial segmentation is partitioned into a periodicity pixel matrix and a non-periodicity pixel matrix. Boundaries are then created by connecting all nearby periodicity and non-periodicity pixels into speech boundaries. Some speech boundaries may enclose noisy T–F units as holes; the proposed algorithm treats these holes in the active pixel detection step so that they can be properly classified as speech-dominant or noise-dominant T–F units. Finally, the noisy T–F units are eliminated. The performance of the proposed algorithm is evaluated on the TIMIT speech database. The experimental results show that the proposed algorithm improves the quality of the separated speech, increasing the signal-to-noise ratio by an average of 9.64 dB and reducing the noise residue by 25.55% compared to the noisy speech mixture.
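The mask-refinement pipeline described above (label connected regions, extract their boundaries, treat enclosed holes, discard noisy segments) can be sketched with standard image-processing operations. The following is a minimal illustration only, not the authors' exact method; it assumes a boolean T–F mask and uses SciPy's connected-component labeling and hole filling as stand-ins for the labeling and hole-treatment steps, with a hypothetical `min_region_size` threshold for eliminating small noise-dominant segments:

```python
import numpy as np
from scipy import ndimage

def refine_tf_mask(mask, min_region_size=4):
    """Refine a binary T-F mask: fill holes enclosed by speech regions
    and discard small isolated regions assumed to be noise-dominant."""
    mask = np.asarray(mask, dtype=bool)
    # Labeling: group connected active T-F units into segments.
    labeled, n_regions = ndimage.label(mask)
    refined = np.zeros_like(mask)
    for region_id in range(1, n_regions + 1):
        region = labeled == region_id
        if region.sum() < min_region_size:
            continue  # eliminate small noisy segments
        # Holes enclosed by the speech boundary are reclassified
        # as speech-dominant by filling them in.
        refined |= ndimage.binary_fill_holes(region)
    return refined
```

Applied to a mask containing a large speech region with an interior hole plus an isolated stray pixel, the sketch fills the hole and drops the stray pixel, mirroring the active pixel detection and elimination steps at a high level.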
Cite this article
Shoba, S., Rajavel, R. Image Processing Techniques for Segments Grouping in Monaural Speech Separation. Circuits Syst Signal Process 37, 3651–3670 (2018). https://doi.org/10.1007/s00034-017-0728-x