Blind Signal-to-Noise Ratio Estimation of Speech Based on Vector Quantizer Classifiers and Decision Level Fusion

Journal of Signal Processing Systems

Abstract

A blind approach is proposed for estimating the signal-to-noise ratio (SNR) of a speech signal corrupted by additive noise. The method follows a pattern recognition paradigm. Several features based on linear prediction (LP) are extracted, a vector quantizer (VQ) classifier produces an SNR estimate from each feature, and the individual estimates are combined (decision-level fusion). Blind SNR estimation is useful in speaker identification systems that report a confidence metric along with the speaker identity. This confidence metric is partly based on the mismatch between the training and testing conditions of the speaker identification system, and the SNR estimate is an important indicator of the degree of that mismatch. The aim is to estimate SNR values accurately from 0 to 30 dB, a range that is both practical and crucial for speaker identification systems. The experiments consider (1) artificially generated additive white Gaussian noise, pink noise and bandpass noise and (2) fifteen noise types from the NOISEX database. The best results are obtained by combining four features. The average SNR estimation error depends on the noise type: it is relatively low for pink noise and jet cockpit noise and high for destroyer operations room noise and military vehicle noise. For both the artificially generated noise and the NOISEX data, the error is lower than that of the improved minima controlled recursive averaging (IMCRA) method, which uses SNR estimation for speech enhancement. Combining the four features with IMCRA lowers the error for 8 of the 15 NOISEX noise types.

References

  1. Quatieri, T.F. (2002). Discrete-time speech signal processing: Principles and practice. Prentice Hall PTR.

  2. Campbell, J.P. (1997). Speaker recognition: a tutorial. Proceedings of the IEEE, 85, 1437–1462.

  3. Togneri, R., & Pullella, D. (2011). An overview of speaker identification: Accuracy and robustness issues. IEEE Circuits and Systems Magazine, 23–61.

  4. Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: From features to supervectors. Speech Communication, 52, 12–40.

  5. Huggins, M.C., & Grieco, J.J. (2002). Confidence metrics for speaker identification. Int. Conf. on Spoken Language Processing, Denver, Colorado.

  6. Huggins, M.C., & Grieco, J.J. (2004). Speaker identification confidence metrics for heterogeneous model spaces. Proc. of the 8th World Multiconference on Systemics, Cybernetics and Informatics, Orlando, Florida, 440–443.

  7. Cohen, I., & Berdugo, B. (2001). Speech enhancement for non-stationary noise environments. Signal Processing, 81, 2403–2418.

  8. Cohen, I., & Berdugo, B. (2002). Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Processing Letters, 9, 12–15.

  9. Cohen, I. (2003). Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging. IEEE Transactions on Speech and Audio Processing, 11, 466–475.

  10. Rangachari, S., & Loizou, P.C. (2006). A noise-estimation algorithm for highly non-stationary environments. Speech Communication, 48, 220–231.

  11. Tchorz, J., & Kollmeier, B. (2003). SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing, 11, 184–192.

  12. Assaleh, K.T., & Mammone, R.J. (1994). New LP-derived features for speaker identification. IEEE Transactions on Speech and Audio Processing, 2, 630–638.

  13. Zilovic, M.S., Ramachandran, R.P., & Mammone, R.J. (1998). Speaker identification based on the use of robust cepstral features obtained from pole-zero transfer functions. IEEE Transactions on Speech and Audio Processing, 6, 260–267.

  14. Marbach, M., Ondusko, R., Ramachandran, R.P., & Head, L.M. (2009). Neural network classifiers and principal component analysis for blind signal to noise ratio estimation of speech signals. IEEE Int. Symp. on Circuits and Systems, Taipei, Taiwan, 97–100.

  15. Ondusko, R., Marbach, M., McClellan, A., Ramachandran, R.P., Head, L.M., Huggins, M.C., & Smolenski, B.Y. (2006). Blind determination of the signal to noise ratio of speech signals based on estimation combination of multiple features. IEEE Asia Pacific Conf. on Circuits and Systems, Singapore, 1897–1900.

  16. NOISEX Database Examples, Digital Signal Processing Group of Rice University, http://spib.rice.edu/spib/select_noise.html.

  17. Ding, L., Radwan, A., El-Hennawey, M.S., & Goubran, R.A. (2006). Measurement of the effects of temporal clipping on speech quality. IEEE Transactions on Instrumentation and Measurement, 55, 1197–1203.

  18. Cai, L., Tu, R., Zhao, J., & Mao, Y. (2007). Speech quality evaluation: a new application of digital watermarking. IEEE Transactions on Instrumentation and Measurement, 56, 45–55.

  19. Paglierani, P., & Petri, D. (2009). Uncertainty evaluation of objective speech quality measurement in VoIP systems. IEEE Transactions on Instrumentation and Measurement, 58, 46–51.

  20. Pan, S.-T., & Li, X.-Y. (2012). An FPGA-based embedded robust speech recognition system designed by combining empirical mode decomposition and a genetic algorithm. IEEE Transactions on Instrumentation and Measurement, 61, 2560–2572.

  21. Kabal, P., & Ramachandran, R.P. (1986). The computation of line spectral frequencies using Chebyshev polynomials. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34, 1419–1426.

  22. Linde, Y., Buzo, A., & Gray, R.M. (1980). An algorithm for vector quantizer design. IEEE Transactions on Communications, COM-28, 84–95.

  23. Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21–45.

  24. Kupin, J. (1993). A wireless simulator (software). CCR-P.

  25. Loizou, P.C. (2013). Speech enhancement: Theory and practice. CRC Press.

Acknowledgments

This work was supported by the U.S. Air Force Research Laboratory, Rome NY, under contracts FA8750-05-C-0029 and F30602-03-C-0067 and by NSF under grant DUE 1610911.

Author information

Correspondence to Ravi P. Ramachandran.

About this article

Cite this article

Ondusko, R., Marbach, M., Ramachandran, R.P. et al. Blind Signal-to-Noise Ratio Estimation of Speech Based on Vector Quantizer Classifiers and Decision Level Fusion. J Sign Process Syst 89, 335–345 (2017). https://doi.org/10.1007/s11265-016-1200-z

  • DOI: https://doi.org/10.1007/s11265-016-1200-z
