Binary mask based method for enhancement of mixed noise speech of low SNR input

Singh, Sachin; Tripathy, Manoj; Anand, R. S.

doi:10.1007/s10772-015-9305-5

Published: 14 September 2015

International Journal of Speech Technology, Volume 18, pages 609–617 (2015)

Abstract

This paper presents a noise reduction method based on a binary mask thresholding function for enhancing single-channel speech corrupted by mixed, highly non-stationary noises at low (negative) input SNR. For this purpose, a database of speech with mixed, highly non-stationary noise is generated from the AURORA noise database and the INDIC clean speech database. Results are compared with widely used methods, namely Daubechies13, Daubechies40, Symlet13, Coiflet5, Wiener, Spectral Subtraction, and log-MMSE, in terms of SNR, PESQ, and cepstrum distance. Compared with these methods, the proposed single-channel speech enhancement method shows satisfactory results and a significant improvement in speech quality and intelligibility.
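
As a rough illustration of the general idea only, the sketch below builds a textbook ideal binary mask over time-frequency magnitudes and computes an output SNR in dB. It is not the authors' thresholding function, which the abstract does not specify. The names `ideal_binary_mask`, `snr_db`, and `threshold_db`, and the 0 dB default, are assumptions introduced here, and the mask presumes oracle access to the separate speech and noise magnitudes.

```python
import numpy as np


def snr_db(clean, estimate):
    """SNR in dB: energy of the clean signal over energy of the residual error."""
    clean = np.asarray(clean, dtype=float)
    residual = clean - np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))


def ideal_binary_mask(speech_mag, noise_mag, threshold_db=0.0):
    """Return 1.0 where the local SNR exceeds threshold_db, else 0.0.

    speech_mag and noise_mag are magnitude spectrograms of equal shape.
    Knowing them separately is an oracle assumption made for illustration.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    speech_mag = np.asarray(speech_mag, dtype=float)
    noise_mag = np.asarray(noise_mag, dtype=float)
    local_snr = 20.0 * np.log10((speech_mag + eps) / (noise_mag + eps))
    return (local_snr > threshold_db).astype(float)


# Toy 2x2 "spectrogram": keep only the cells where speech dominates the noise.
speech = np.array([[2.0, 0.1], [0.5, 3.0]])
noise = np.ones((2, 2))
mask = ideal_binary_mask(speech, noise)
enhanced_mag = mask * (speech + noise)  # crude stand-in for the noisy magnitude
```

Raising `threshold_db` discards more noise-dominated cells at the cost of removing more speech energy. How that trade-off should be set at negative input SNR is the question the paper addresses, and this sketch does not attempt to answer it.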

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, 247 667, India
Sachin Singh, Manoj Tripathy & R. S. Anand

Corresponding author

Correspondence to Sachin Singh.

About this article

Cite this article

Singh, S., Tripathy, M. & Anand, R.S. Binary mask based method for enhancement of mixed noise speech of low SNR input. Int J Speech Technol 18, 609–617 (2015). https://doi.org/10.1007/s10772-015-9305-5

Received: 23 November 2014
Accepted: 30 August 2015
Published: 14 September 2015
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10772-015-9305-5
