Text-Independent Speaker Verification: State of the Art and Challenges

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4391)

Abstract

Speech is often the only available modality for recognizing a person's identity (over the telephone, on the radio, in the dark, and so on). Automatic speaker recognition has been studied for several decades. This chapter reviews the current state of text-independent speaker verification research. The basic principles of speaker recognition are summarized first. The choice of speech features and speaker models is driven mostly by the individual characteristics (variability) of speakers' voices. Besides speaker variability, other factors, such as microphone and transmission-channel variability, degrade the performance of speaker verification algorithms. Some of these issues are illustrated using the recent NIST 2005 and 2006 speaker recognition evaluation campaigns.

The field of speaker verification is also reviewed in relation to speech recognition, focusing on the use of this new source of information. This relationship is an important issue in the development of new services based on speaker and speech recognition. An overview of recent results in this field is given. In particular, examples of combining baseline Gaussian Mixture Models (GMM) with high-level information extracted with data-driven speech segmentation are reported.
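To make the idea of combining a baseline GMM score with a second, high-level score concrete, here is a minimal sketch in Python. It is not the chapter's method: the log-likelihood-ratio scoring against a background model is the standard GMM verification recipe, while the weighted-sum fusion, the function names, and the weight `alpha` are illustrative assumptions.

```python
import math

def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: list of feature vectors; weights: one weight per component;
    means, variances: one vector per component.
    """
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                ll += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp.append(ll)
        # log-sum-exp over components, for numerical stability
        peak = max(comp)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)

def llr_score(frames, speaker, background):
    """Log-likelihood ratio between a claimed-speaker GMM and a background GMM."""
    return gmm_avg_loglik(frames, *speaker) - gmm_avg_loglik(frames, *background)

def fuse(gmm_score, high_level_score, alpha=0.7):
    """Hypothetical weighted-sum fusion; alpha is an assumed tuning weight."""
    return alpha * gmm_score + (1.0 - alpha) * high_level_score

# Toy one-dimensional, single-component models (assumed values).
speaker = ([1.0], [[2.0]], [[1.0]])
background = ([1.0], [[0.0]], [[1.0]])
score = llr_score([[2.0]], speaker, background)
decision = fuse(score, 0.0, alpha=0.5) > 0.0  # accept if above an assumed threshold of 0
```

In practice both scores would typically be normalized to comparable ranges before fusion, and the weight and decision threshold would be tuned on held-out development data.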

Editor information

Yannis Stylianou Marcos Faundez-Zanuy Anna Esposito

Copyright information

© 2007 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Petrovska-Delacrétaz, D., El Hannani, A., Chollet, G. (2007). Text-Independent Speaker Verification: State of the Art and Challenges. In: Stylianou, Y., Faundez-Zanuy, M., Esposito, A. (eds) Progress in Nonlinear Speech Processing. Lecture Notes in Computer Science, vol 4391. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71505-4_9

  • DOI: https://doi.org/10.1007/978-3-540-71505-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71503-0

  • Online ISBN: 978-3-540-71505-4

  • eBook Packages: Computer Science, Computer Science (R0)
