Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments

Ganchev, Todor; Potamitis, Ilyas; Fakotakis, Nikos; Kokkinakis, George

doi:10.1023/B:IJST.0000037072.36778.9e

Published: October 2004

Volume 7, pages 281–292, (2004)

International Journal of Speech Technology

Abstract

This article investigates speaker verification in real-world noisy environments, comparing a novel feature extraction process suited to suppressing time-varying noise with a fine-tuned spectral subtraction method. The proposed feature extraction process approximates the spectral magnitudes of the clean speech and of the noise with mixtures of Gaussian probability density functions (pdfs), fitted with the Expectation-Maximization (EM) algorithm. The Bayesian inference framework is then applied to the degraded spectral coefficients, and Minimum Mean Square Error (MMSE) estimation yields a closed-form solution for the spectral magnitude estimation task. The estimated spectral magnitude is finally incorporated into the Mel-Frequency Cepstral Coefficients (MFCC) front-end of a baseline text-independent speaker verification system based on Probabilistic Neural Networks, which participated successfully in the 2002 NIST (National Institute of Standards and Technology, USA) Speaker Recognition Evaluation. A comparative study on real-world noise types demonstrates a significant performance gain over both the baseline speech features and the spectral subtraction enhancement method. In a passing-by aircraft scenario, absolute speaker verification performance improved by more than 27% at 0 dB signal-to-noise ratio (SNR) compared to the MFCCs, and by more than 13% at −5 dB SNR compared to the spectral subtraction version.
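
For orientation, the spectral subtraction method used as the comparison point can be sketched as below. This is a generic textbook form (power subtraction with an over-subtraction factor and a spectral floor), not the fine-tuned variant evaluated in the article. The function names `estimate_noise` and `spectral_subtract` and the defaults for `alpha` and `beta` are illustrative assumptions, and the inputs are assumed to be magnitude spectra that have already been computed frame by frame.

```python
def estimate_noise(frames):
    """Average magnitude per frequency bin over frames assumed to be noise only."""
    count = len(frames)
    return [sum(bin_values) / count for bin_values in zip(*frames)]


def spectral_subtract(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
    """Apply power spectral subtraction to one frame of magnitude spectra.

    noisy_mag, noise_mag: equal-length lists of non-negative magnitudes.
    alpha: over-subtraction factor (assumed default, not from the article).
    beta: spectral floor as a fraction of the noise power (assumed default).
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = y * y - alpha * n * n  # subtract the scaled noise power
        floor = beta * n * n           # keep a small residual instead of going negative
        cleaned.append(max(power, floor) ** 0.5)
    return cleaned


# Usage: estimate the noise from two noise-only frames, then clean one noisy frame.
noise = estimate_noise([[1.0, 2.0], [3.0, 2.0]])
enhanced = spectral_subtract([5.0, 1.0], noise, alpha=1.0, beta=0.01)
```

In this sketch the noise estimate is computed once and then held fixed, so it cannot follow noise whose spectrum changes from frame to frame. That limitation is one plausible reason a model-based estimator would be preferred for fast-varying noise.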


Cite this article

Ganchev, T., Potamitis, I., Fakotakis, N. et al. Text-Independent Speaker Verification for Real Fast-Varying Noisy Environments. International Journal of Speech Technology 7, 281–292 (2004). https://doi.org/10.1023/B:IJST.0000037072.36778.9e
