Binary mask based method for enhancement of mixed noise speech of low SNR input

International Journal of Speech Technology

Abstract

This paper presents a noise reduction method based on a binary mask thresholding function for enhancing single-channel speech corrupted by mixed, highly non-stationary noise at low (negative) input SNR. For this purpose, a database of speech corrupted by mixed, highly non-stationary noise is generated from the noise recordings of the AURORA database and the clean speech of the INDIC speech database. For performance evaluation, the results are compared with those of widely used methods (Daubechies13, Daubechies40, Symlet13, Coiflet5, Wiener, spectral subtraction, and log-MMSE) in terms of SNR, PESQ, and cepstrum distance. Compared with these methods, the proposed single-channel speech enhancement method gives satisfactory results and a significant improvement in speech quality and intelligibility.
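
The two ideas named in the abstract, mixing noise into clean speech at a negative input SNR and suppressing noise-dominated time-frequency bins with a binary mask, can be illustrated with a minimal sketch. This is not the authors' thresholding function, which is not given in this preview. It is an oracle ("ideal") binary mask that assumes separate access to the clean and noise signals, and all function names, the frame length, and the 0 dB threshold are hypothetical choices made for illustration.

```python
import numpy as np


def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that clean + noise has the requested SNR (in dB).

    Assumes `noise` is at least as long as `clean`.
    Returns the noisy mixture and the scaled noise.
    """
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled = gain * noise
    return clean + scaled, scaled


def snr_db(reference, estimate):
    """Global SNR of `estimate` with respect to `reference`, in dB."""
    err = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))


def frame_spectra(x, frame_len):
    """Split x into non-overlapping rectangular frames and take the real FFT."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)


def binary_mask_enhance(noisy, clean, noise, frame_len=256, threshold_db=0.0):
    """Oracle binary mask (illustrative assumption, not the paper's method).

    Keeps a time-frequency bin of the noisy signal only when the local
    speech-to-noise ratio in that bin exceeds `threshold_db`.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    S = frame_spectra(clean, frame_len)
    N = frame_spectra(noise, frame_len)
    Y = frame_spectra(noisy, frame_len)
    local_snr = 10.0 * np.log10((np.abs(S) ** 2 + eps) / (np.abs(N) ** 2 + eps))
    mask = local_snr > threshold_db
    return np.fft.irfft(Y * mask, n=frame_len, axis=1).ravel()


if __name__ == "__main__":
    # Synthetic stand-ins: a tone for "speech" and white noise for "noise".
    rng = np.random.default_rng(0)
    t = np.arange(8192) / 8000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    noisy, scaled = mix_at_snr(clean, rng.standard_normal(8192), -5.0)
    enhanced = binary_mask_enhance(noisy, clean, scaled)
    print("input SNR (dB): ", snr_db(clean, noisy))
    print("output SNR (dB):", snr_db(clean, enhanced))
```

A practical single-channel method has no access to the clean signal, so it must estimate the mask from the noisy input alone. The sketch only shows the mixing step and the masking operation that the evaluation metrics (SNR here; PESQ and cepstrum distance in the paper) would then score.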


Author information

Corresponding author

Correspondence to Sachin Singh.

Cite this article

Singh, S., Tripathy, M. & Anand, R.S. Binary mask based method for enhancement of mixed noise speech of low SNR input. Int J Speech Technol 18, 609–617 (2015). https://doi.org/10.1007/s10772-015-9305-5

  • DOI: https://doi.org/10.1007/s10772-015-9305-5
