Text-Independent Speaker Verification: State of the Art and Challenges

Petrovska-Delacrétaz, Dijana; El Hannani, Asmaa; Chollet, Gérard

doi:10.1007/978-3-540-71505-4_9

Dijana Petrovska-Delacrétaz¹,
Asmaa El Hannani¹,² &
Gérard Chollet³

Chapter

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

Speech is often the only modality available for recognizing a person's identity (over the telephone, on the radio, in the dark, and so on). Automatic speaker recognition has been studied for several decades. This chapter reviews the current state of text-independent speaker verification research. The basic principles of speaker recognition are summarized first. The choice of speech features and speaker models is mostly related to the individual characteristics (variability) of the speakers' voices. Besides speaker variability, other factors, such as microphone or transmission channel variability, degrade the performance of speaker verification algorithms. Some of these issues are illustrated on the recent NIST 2005 and 2006 speaker recognition evaluation campaigns.

The field of speaker verification is also reviewed in relation to speech recognition, focusing on how this new source of information can be used. This relationship is an important issue in the development of new services based on speaker and speech recognition. An overview of recent results in this field is given. In particular, examples are reported of combining baseline Gaussian Mixture Models (GMMs) with high-level information extracted by data-driven speech segmentation.

Author information

Authors and Affiliations

Institut National des Télécommunications, 91011 Evry, France
Dijana Petrovska-Delacrétaz & Asmaa El Hannani
DIVA Group, Informatics Dept., University of Fribourg, Switzerland
Asmaa El Hannani
TSI Department, CNRS-LTCI ENST, Paris, France
Gérard Chollet

Editor information

Yannis Stylianou, Marcos Faundez-Zanuy, Anna Esposito

About this chapter

Cite this chapter

Petrovska-Delacrétaz, D., El Hannani, A., Chollet, G. (2007). Text-Independent Speaker Verification: State of the Art and Challenges. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_9

DOI: https://doi.org/10.1007/978-3-540-71505-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science (R0)
