Text-Independent Speaker Recognition

Reynolds, Douglas A.; Campbell, William M.

doi:10.1007/978-3-540-49127-9_38

Part of the book series: Springer Handbooks (SHB)

Abstract

In this chapter, we focus on the area of text-independent speaker verification, with an emphasis on unconstrained telephone conversational speech. We begin by providing a general likelihood ratio detection task framework to describe the various components in modern text-independent speaker verification systems. We next describe the general hierarchy of speaker information conveyed in the speech signal and the issues involved in reliably exploiting these levels of information for practical speaker verification systems. We then describe specific implementations of state-of-the-art text-independent speaker verification systems utilizing low-level spectral information and high-level token sequence information with generative and discriminative modeling techniques. Finally, we provide a performance assessment of these systems using the National Institute of Standards and Technology (NIST) speaker recognition evaluation telephone corpora.

Abbreviations

BBN:: Bolt, Beranek and Newman
CMS:: cepstral mean subtraction
DET:: detection error tradeoff
EER:: equal error rate
EM:: expectation maximization (also estimate-maximize)
FFT:: fast Fourier transform
GLDS:: generalized linear discriminant sequence
GMM:: Gaussian mixture model
GSV:: GMM supervector
HMM:: hidden Markov models
KL:: Kullback-Leibler
LDC:: Linguistic Data Consortium
LLR:: (log) likelihood ratio
LPC:: linear predictive coding
MAP:: maximum a posteriori
MLLR:: maximum-likelihood linear regression
RASTA:: relative spectra
SMS:: speaker model synthesis
SVM:: support vector machines
TFIDF:: term frequency inverse document frequency
TFLLR:: term frequency log-likelihood ratio
TFLOG:: term frequency logarithmic
UBM:: universal background model

Author information

Authors and Affiliations

Lincoln Laboratory, Information Systems Technology Group, Massachusetts Institute of Technology, 244 Wood Street, 02420-9108, Lexington, MA, USA
Douglas A. Reynolds Dr.
Information Systems Technology Group, MIT Lincoln Laboratory, 244 Wood Street, 02420-9108, Lexington, MA, USA
William M. Campbell Dr.

Corresponding authors

Correspondence to Douglas A. Reynolds Dr. or William M. Campbell Dr.

Editor information

Editors and Affiliations

INRS-EMT, University of Quebec, 800 de la Gauchetiere Ouest, H5A 1K6, Montreal, Quebec, Canada
Jacob Benesty Dr.
Avayalabs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
M. Mohan Sondhi Ph.D.
Alcatel-Lucent, Bell Laboratories, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Yiteng Arden Huang Dr.

About this chapter

Cite this chapter

Reynolds, D.A., Campbell, W.M. (2008). Text-Independent Speaker Recognition. In: Benesty, J., Sondhi, M.M., Huang, Y.A. (eds) Springer Handbook of Speech Processing. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49127-9_38

DOI: https://doi.org/10.1007/978-3-540-49127-9_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49125-5
Online ISBN: 978-3-540-49127-9
eBook Packages: Engineering, Engineering (R0)
