Text-Independent Speaker Recognition

Springer Handbook of Speech Processing

Part of the book series: Springer Handbooks (SHB)

Abstract

In this chapter, we focus on the area of text-independent speaker verification, with an emphasis on unconstrained telephone conversational speech. We begin by providing a general likelihood ratio detection task framework to describe the various components in modern text-independent speaker verification systems. We next describe the general hierarchy of speaker information conveyed in the speech signal and the issues involved in reliably exploiting these levels of information for practical speaker verification systems. We then describe specific implementations of state-of-the-art text-independent speaker verification systems utilizing low-level spectral information and high-level token sequence information with generative and discriminative modeling techniques. Finally, we provide a performance assessment of these systems using the National Institute of Standards and Technology (NIST) speaker recognition evaluation telephone corpora.
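
The likelihood ratio detection framework mentioned above can be illustrated with a minimal sketch. It assumes, for illustration only, that both the claimed-speaker model and the background model are diagonal-covariance Gaussian mixture models given as `(weights, means, variances)` tuples, and that the score is the difference of average per-frame log-likelihoods. The function names and the default threshold are hypothetical and are not taken from the chapter.

```python
import numpy as np


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (M,); means, variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Log-sum-exp over mixture components, shifted by the max for stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.mean())


def llr_score(frames, target, background):
    """Log-likelihood ratio: target-speaker model versus background model."""
    return gmm_avg_loglik(frames, *target) - gmm_avg_loglik(frames, *background)


def accept(frames, target, background, threshold=0.0):
    """Accept the identity claim when the score exceeds the threshold (assumed value)."""
    return llr_score(frames, target, background) > threshold
```

In practice the threshold would be tuned on held-out trials to reach a desired operating point, for example the equal error rate.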

Abbreviations

BBN: Bolt, Beranek and Newman
CMS: cepstral mean subtraction
DET: detection error tradeoff
EER: equal error rate
EM: estimate-maximize; expectation maximization
FFT: fast Fourier transform
GLDS: generalized linear discriminant sequence
GMM: Gaussian mixture model
GSV: GMM supervector
HMM: hidden Markov model
KL: Kullback-Leibler
LDC: Linguistic Data Consortium
LLR: (log) likelihood ratio
LPC: linear predictive coding
MAP: maximum a posteriori
MLLR: maximum-likelihood linear regression
RASTA: relative spectra
SMS: speaker model synthesis
SVM: support vector machine
TFIDF: term frequency inverse document frequency
TFLLR: term frequency log-likelihood ratio
TFLOG: term frequency logarithmic
UBM: universal background model

Author information

Correspondence to Dr. Douglas A. Reynolds or Dr. William M. Campbell.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Reynolds, D.A., Campbell, W.M. (2008). Text-Independent Speaker Recognition. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_38

  • DOI: https://doi.org/10.1007/978-3-540-49127-9_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49125-5

  • Online ISBN: 978-3-540-49127-9

  • eBook Packages: Engineering, Engineering (R0)
