Abstract
Speech is often the only available modality for recognizing a person's identity (over the telephone, on the radio, in the dark, etc.). Automatic speaker recognition has been studied for several decades. This chapter reviews the current state of text-independent speaker verification research. The basic principles of speaker recognition are summarized first. The choice of speech features and speaker models is mostly related to the individual characteristics (variability) of the speakers’ voices. Besides speaker variability, other factors, such as microphone or transmission channel variability, degrade the performance of speaker verification algorithms. Some of these issues are illustrated on the recent NIST 2005 and 2006 speaker recognition evaluation campaigns.
The field of speaker verification is also reviewed in relation to speech recognition, focusing on the use of this additional source of information. This relationship is an important issue in the development of new services based on speaker and speech recognition. An overview of recent results in this field is given. In particular, examples are reported of combining baseline Gaussian Mixture Models (GMM) with high-level information extracted through data-driven speech segmentation.
Copyright information
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Petrovska-Delacrétaz, D., El Hannani, A., Chollet, G. (2007). Text-Independent Speaker Verification: State of the Art and Challenges. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_9
DOI: https://doi.org/10.1007/978-3-540-71505-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71503-0
Online ISBN: 978-3-540-71505-4
eBook Packages: Computer Science, Computer Science (R0)