Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments

International Journal of Speech Technology

Abstract

This paper investigates speaker verification in real-world noisy environments, comparing a novel feature extraction process designed to suppress time-varying noise with a fine-tuned spectral subtraction method. The proposed process approximates the spectral magnitudes of clean speech and of noise with mixtures of Gaussian probability density functions (pdfs), fitted with the Expectation-Maximization (EM) algorithm. A Bayesian inference framework is then applied to the degraded spectral coefficients, and Minimum Mean Square Error (MMSE) estimation yields a closed-form solution for the spectral magnitude estimate. The estimated spectral magnitude is finally incorporated into the Mel-Frequency Cepstral Coefficient (MFCC) front-end of a baseline text-independent speaker verification system based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology, USA) Speaker Recognition Evaluation. A comparative study on real-world noise types demonstrates a significant performance gain over both the baseline speech features and the spectral subtraction enhancement method. In a passing-by aircraft scenario, the absolute speaker verification performance improved by more than 27% at 0 dB signal-to-noise ratio (SNR) relative to the MFCCs, and by more than 13% at −5 dB SNR relative to the spectral subtraction version.
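To make the modelling step in the abstract more concrete, the following is a minimal sketch of fitting a one-dimensional mixture of Gaussians with EM, as one might do for the magnitude values of a single spectral bin. It is an illustration under stated assumptions, not the authors' implementation: the function name `fit_gmm_1d`, the quantile-based initialisation, the number of components, and the fixed iteration count are all hypothetical choices, and the MMSE estimation stage is not shown.

```python
import numpy as np


def fit_gmm_1d(x, n_components=2, n_iter=100):
    """Fit a 1-D Gaussian mixture to samples x with plain EM.

    Hypothetical helper for illustration; returns (weights, means, variances).
    """
    x = np.asarray(x, dtype=float)

    # Assumed initialisation: means at evenly spaced quantiles of the data,
    # a shared variance equal to the sample variance, and uniform weights.
    quantiles = (np.arange(n_components) + 0.5) / n_components
    means = np.quantile(x, quantiles)
    variances = np.full(n_components, x.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component
        # for each sample, computed in the log domain for stability.
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances) + diff**2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_joint -= log_joint.max(axis=1, keepdims=True)
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from the
        # responsibility-weighted samples.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    return weights, means, variances
```

In a setting like the one described, such a fit would presumably be run separately on clean-speech and noise training data, and the resulting mixture parameters would then serve as priors in the Bayesian estimation step; the small constants added to the variances and counts are only numerical safeguards.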

Cite this article

Ganchev, T., Potamitis, I., Fakotakis, N. et al. Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments. International Journal of Speech Technology 7, 281–292 (2004). https://doi.org/10.1023/B:IJST.0000037072.36778.9e

  • DOI: https://doi.org/10.1023/B:IJST.0000037072.36778.9e
