Abstract
This work investigates speaker verification in real-world noisy environments, comparing a novel feature extraction process suited to suppressing time-varying noise with a fine-tuned spectral subtraction method. The proposed process approximates the clean-speech and noise spectral magnitudes with a mixture of Gaussian probability density functions (pdfs), using the Expectation-Maximization (EM) algorithm. The Bayesian inference framework is then applied to the degraded spectral coefficients, and Minimum Mean Square Error (MMSE) estimation yields a closed-form solution for the spectral magnitude estimation task. The estimated spectral magnitude is finally incorporated into the Mel-Frequency Cepstral Coefficients (MFCC) front-end of a baseline text-independent speaker verification system based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology, USA) Speaker Recognition Evaluation. A comparative study on real-world noise types demonstrates a significant performance gain over both the baseline speech features and the spectral subtraction enhancement method. In a passing-by aircraft scenario, absolute speaker verification performance improved by more than 27% at 0 dB signal-to-noise ratio (SNR) compared to the MFCCs, and by more than 13% at −5 dB SNR compared to the spectral subtraction version.
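As a rough illustration of the first step named in the abstract, fitting a mixture of Gaussians with EM, the sketch below fits a one-dimensional mixture to a set of scalar samples. It is not the authors' implementation: the function name `fit_gmm_1d`, the quantile-based initialisation, the fixed iteration count, and the use of NumPy are all assumptions made for this example, and the MMSE magnitude estimator and MFCC front-end are not covered.

```python
import numpy as np

def fit_gmm_1d(x, n_components=2, n_iter=100):
    """Fit a 1-D Gaussian mixture to samples x with plain EM (illustrative only)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialisation: means at evenly spaced quantiles,
    # a shared variance, and uniform mixture weights.
    q = (np.arange(n_components) + 0.5) / n_components
    means = np.quantile(x, q)
    variances = np.full(n_components, x.var())
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample,
        # computed in the log domain for numerical stability.
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances) + diff**2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_joint -= log_joint.max(axis=1, keepdims=True)
        resp = np.exp(log_joint)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        diff = x[:, None] - means[None, :]
        variances = (resp * diff**2).sum(axis=0) / nk + 1e-8

    # Return components sorted by mean for a stable ordering.
    order = np.argsort(means)
    return weights[order], means[order], variances[order]
```

In a setting like the one described, `x` would hold the magnitudes observed in one frequency bin across many frames of clean speech (or of noise), so one mixture would be fitted per bin and per source. That mapping is an interpretation of the abstract, not a detail it states.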
Cite this article
Ganchev, T., Potamitis, I., Fakotakis, N. et al. Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments. International Journal of Speech Technology 7, 281–292 (2004). https://doi.org/10.1023/B:IJST.0000037072.36778.9e
DOI: https://doi.org/10.1023/B:IJST.0000037072.36778.9e